Bulk RNA-seq preprocessing: STAR-RSEM pipeline

  • Assumes a Unix server
  • Mouse (Mus musculus) data

pipeline

Prerequisite

01.FastQC

🔗 Babraham Bioinformatics FastQC page
🔗 FastQC GitHub page

🚨 Requires a Java Runtime Environment.
On a remote server, connect with ssh -X (X11 forwarding) if you want to open the FastQC GUI.

installation

wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.1.zip
unzip fastqc_v0.12.1.zip
chmod +x FastQC/fastqc # make sure the wrapper script is executable

# set PATH
vi ~/.bash_profile # or .bashrc
## add the line below
export PATH=/location/of/FastQC:$PATH
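
Once FastQC is on the PATH, it can be run non-interactively on the raw FASTQ files. The following is a minimal sketch, not the author's command: `run_fastqc` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, while `--outdir` and `--threads` are standard FastQC options.

```shell
# Run FastQC on one or more FASTQ files and write the reports to an output directory.
# Usage: run_fastqc <outdir> <threads> <fastq> [<fastq> ...]
# Set DRY_RUN=1 to print the command instead of running it (hypothetical switch).
run_fastqc() {
    local outdir threads
    outdir=$1
    threads=$2
    shift 2
    mkdir -p "$outdir"    # create the report directory if it does not exist yet
    ${DRY_RUN:+echo} fastqc --outdir "$outdir" --threads "$threads" "$@"
}

# Example (paths are placeholders):
# run_fastqc 00.FastQC 4 sample_R1_001.fastq.gz sample_R2_001.fastq.gz
```

With `DRY_RUN=1` the function only prints the command, which is a cheap way to check paths before starting a long run.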

Reference genome download

As noted above, this guide uses mouse data: the mouse reference genome, mm10 (GRCm38).

The following files are required:

  • fasta file
  • gtf file

🔗 For human data, download from GENCODE Human here:

  • GTF = Comprehensive gene annotation (ALL)
  • FASTA = Genome sequence (GRCh38.p14)
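# genome FASTA (Ensembl release 98) and gene annotation GTF (GENCODE M23)
wget http://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.primary_assembly.annotation.gtf.gz

# Ensembl names chromosomes "1", "2", ... while GENCODE uses "chr1", "chr2", ...;
# confirm that the FASTA and GTF sequence names match before building an index
gzip -d *.gz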
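
Because the FASTA and the GTF here come from different providers, it is worth checking that they use the same sequence names before indexing. A minimal sketch of such a check follows; `check_seqnames` is a hypothetical helper name, and it assumes uncompressed files.

```shell
# Print sequence names that occur in the GTF but not in the FASTA.
# Usage: check_seqnames <genome.fa> <annotation.gtf>
check_seqnames() {
    local fasta gtf fa_names gtf_names
    fasta=$1
    gtf=$2
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA: header lines start with ">", the name is the first word
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fa_names"
    # GTF: skip comment lines, the sequence name is column 1
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$gtf_names"
    # comm -13 keeps lines that appear only in the second file
    comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}

# Example:
# check_seqnames Mus_musculus.GRCm38.dna.primary_assembly.fa gencode.vM23.primary_assembly.annotation.gtf
```

Empty output means every sequence named in the GTF is present in the FASTA. Any printed name (for example `chr1`) points to a naming mismatch that should be resolved before building the index.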

STAR

STAR GitHub page

installation

# Get the STAR source from the releases page
wget https://github.com/alexdobin/STAR/archive/2.7.10b.tar.gz
tar -xzf 2.7.10b.tar.gz
# Compile
cd STAR-2.7.10b/source
make STAR

# set PATH
vi ~/.bash_profile # or .bashrc
## add the line below
export PATH=/location/of/STAR-2.7.10b/source:$PATH
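
Each tool in this guide gets its own PATH line, and the profile may be sourced more than once, which prepends the same directory repeatedly. A small sketch of an idempotent alternative; `path_prepend` is a hypothetical helper name, not part of STAR or the shell.

```shell
# Prepend a directory to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# In ~/.bash_profile (the directory is a placeholder, as in the text above)
path_prepend /location/of/STAR-2.7.10b/source
export PATH
```

The same function can be reused for the FastQC, RSEM, and sickle directories.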

Making reference

Not required! If you pass the --star option when building the RSEM reference, the STAR index is built at the same time.

# use absolute paths so they stay valid after cd
main_dir=$PWD/STAR_index/mm10
ref_dir=$PWD/source/ref_mm10

mkdir -p "$main_dir"
cd "$main_dir"
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir "$main_dir" --genomeFastaFiles "$ref_dir/Mus_musculus.GRCm38.dna.primary_assembly.fa" --sjdbGTFfile "$ref_dir/gencode.vM23.primary_assembly.annotation.gtf"

RSEM

RSEM GitHub page

installation

wget https://github.com/deweylab/RSEM/archive/v1.3.3.tar.gz
tar -xvzf v1.3.3.tar.gz
# Compile
cd RSEM-1.3.3
make

# set PATH
vi ~/.bash_profile # or .bashrc
## add the line below
export PATH=/location/of/RSEM-1.3.3:$PATH

Making reference

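# use absolute paths so they stay valid after cd
main_dir=$PWD/RSEM_index
ref_dir=$PWD/source/ref_mm10

mkdir -p "$main_dir"
cd "$main_dir"

# the last argument is the reference name (a file prefix), here RSEM_index/mm10
rsem-prepare-reference --gtf "$ref_dir/gencode.vM23.primary_assembly.annotation.gtf" --star --num-threads 8 "$ref_dir/Mus_musculus.GRCm38.dna.primary_assembly.fa" "$main_dir/mm10"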
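
Building the reference takes a while, so it helps to confirm that it finished before starting the quantification. The sketch below checks for a few of the files that rsem-prepare-reference writes next to the reference name (`.grp`, `.ti`, `.seq`, `.transcripts.fa`). `check_rsem_ref` is a hypothetical helper name, and the list of extensions is a subset chosen for illustration, not an exhaustive check.

```shell
# Return 0 if the core RSEM reference files exist and are non-empty for the given prefix.
# Usage: check_rsem_ref <reference_prefix>
check_rsem_ref() {
    local prefix ext status
    prefix=$1
    status=0
    for ext in grp ti seq transcripts.fa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (prefix is a placeholder):
# check_rsem_ref RSEM_index/mm10 && echo "reference looks complete"
```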

Sickle (optional)

Sickle trimmer GitHub page

installation

wget https://github.com/najoshi/sickle/archive/refs/tags/v1.33.tar.gz
tar -xvzf v1.33.tar.gz
# Compile
cd sickle-1.33
make

# set PATH
vi ~/.bash_profile # or .bashrc
## add the line below
export PATH=/location/of/sickle-1.33:$PATH

Verify the installation

source ~/.bash_profile
which fastqc STAR rsem-calculate-expression sickle
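
For use inside a script, the same check can be written so that it reports every missing tool and returns a non-zero status. A minimal sketch; `check_tools` is a hypothetical helper name.

```shell
# Report whether each named command is on PATH; return 1 if any is missing.
# Usage: check_tools <command> [<command> ...]
check_tools() {
    local tool missing
    missing=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "OK       $tool -> $(command -v "$tool")"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_tools fastqc STAR rsem-prepare-reference rsem-calculate-expression sickle || exit 1
```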

Run STAR-RSEM

source /storage/home/subin/.bash_profile

main_dir=/storage2/Project/subin/ColonMeta/RNA/02_Preprocessing
input_dir=/storage2/Project/subin/ColonMeta/RNA/01_RawFiles
STAR_dir=/storage2/Project/source/STAR-master/bin/Linux_x86_64
#STARref_dir=/storage2/Project/source/STAR_index/GRCh38
#RSEM_dir=/storage2/Project/source/RSEM-master/bin
RSEM_ref_dir=/storage2/Project/source/RSEM_index

# sample (output name), raw (FASTQ file prefix), and thread (number of threads)
# are not set in this script; the caller must supply them, e.g. as environment variables
#sample=P01_N01_RNA
#raw
R1=$input_dir/${raw}_R1_001.fastq.gz
R2=$input_dir/${raw}_R2_001.fastq.gz

sickle_out=$main_dir/01.Sickle
rsem_out=$main_dir/02.RSEM

# make sure the output directories exist
mkdir -p "$sickle_out" "$rsem_out"
cd "$main_dir"

echo "!!!!!sickle!!!!!"
# -q 20: quality threshold; -l 101: discard reads shorter than 101 bp after trimming
sickle pe -f "$R1" -r "$R2" -q 20 -t sanger -l 101 -o "$sickle_out/${sample}_trimmed_1.fq" -p "$sickle_out/${sample}_trimmed_2.fq" -s "$sickle_out/${sample}_sum.fastq"

echo "!!!!!STAR-RSEM!!!!!"
# $RSEM_ref_dir/GRCh38 is the RSEM reference prefix used in this run;
# for the mouse reference built above, point this at the mm10 prefix instead
rsem-calculate-expression --paired-end --output-genome-bam --estimate-rspd --keep-intermediate-files --num-threads "$thread" --star --star-path "$STAR_dir" "$sickle_out/${sample}_trimmed_1.fq" "$sickle_out/${sample}_trimmed_2.fq" "$RSEM_ref_dir/GRCh38" "$rsem_out/$sample"
#--star-gzipped-read-file # for .fastq.gz input; place it after --star and before $R1 $R2, otherwise it does not work correctly
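
The script above processes one sample per invocation and expects the sample name from outside. One way to drive it over a whole directory is to derive the prefixes from the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming used above. This is a sketch under that assumption; `list_samples` and the script name `run_star_rsem.sh` are hypothetical.

```shell
# Print the prefix of every *_R1_001.fastq.gz in a directory that has a matching R2 file.
# Usage: list_samples <input_dir>
list_samples() {
    local dir r1 prefix
    dir=$1
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                 # the glob matched nothing
        prefix=$(basename "$r1" _R1_001.fastq.gz)
        if [ -e "$dir/${prefix}_R2_001.fastq.gz" ]; then
            echo "$prefix"
        else
            echo "warning: no R2 file for $prefix" >&2
        fi
    done
}

# Example driver loop (hypothetical script name; 8 threads is an arbitrary choice):
# for raw in $(list_samples "$input_dir"); do
#     sample=$raw raw=$raw thread=8 bash run_star_rsem.sh
# done
```

Samples with a missing mate are skipped with a warning rather than passed to sickle, which needs both files in paired-end mode.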
This post is licensed under CC BY 4.0 by the author.

© Subin Cho. Some rights reserved.
