Bulk RNA-seq preprocessing: STAR-RSEM pipeline
- Assumes a Unix server
- Mouse (Mus musculus) data
pipeline
Prerequisite
01.FastQC
- 🚨 Requires a Java Runtime Environment
- On a server, connect with the -X option (X11 forwarding, e.g. ssh -X) to use the FastQC GUI.
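Both prerequisites can be checked before installing anything. The following is a minimal sketch, assuming that X11 forwarding shows up as a non-empty DISPLAY variable; check_fastqc_prereqs is a hypothetical helper name, not part of FastQC.

```shell
# Report whether Java is on PATH and whether an X11 display is available.
# check_fastqc_prereqs is a hypothetical helper, not a FastQC command.
check_fastqc_prereqs() {
    if command -v java >/dev/null 2>&1; then
        echo "java: found"
    else
        echo "java: missing"
    fi
    # ssh -X normally sets DISPLAY; an empty value suggests no X11 forwarding.
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 display: $DISPLAY"
    else
        echo "X11 display: not set"
    fi
}

check_fastqc_prereqs
```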
installation
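wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.1.zip
unzip fastqc_v0.12.1.zip
# Make the launcher script executable (it may not be after unzip)
chmod +x FastQC/fastqc
# Set up PATH
vi ~/.bash_profile # or .bashrc
## Add the line below
PATH=/location/of/FastQC:$PATH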
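Once fastqc is on PATH, it can also be run without the GUI by passing FASTQ files on the command line. The sketch below wraps that call. run_fastqc, DRY_RUN, and THREADS are hypothetical names introduced here, and the default of 4 threads is an assumption; --outdir and --threads are FastQC options.

```shell
# Run FastQC non-interactively on the given FASTQ files.
# run_fastqc, DRY_RUN and THREADS are hypothetical names for this sketch.
run_fastqc() {
    local outdir=$1
    shift
    mkdir -p "$outdir"
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} fastqc --outdir "$outdir" --threads "${THREADS:-4}" "$@"
}

# Example (file names are placeholders):
# DRY_RUN=1 run_fastqc 00.FastQC sample_R1_001.fastq.gz sample_R2_001.fastq.gz
```

Setting DRY_RUN=1 is a cheap way to confirm the paths before starting a long run.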
Reference genome download
As mentioned above, this guide assumes mouse data, so the mouse reference (mm10, i.e. GRCm38) is used.
The following files are required:
- fasta file
- gtf file
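# Note: the Ensembl FASTA names chromosomes "1", "2", ... while the GENCODE GTF uses "chr1", "chr2", ...;
# sequence names must match between the two files before an index is built.
wget http://ftp.ensembl.org/pub/release-98/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.primary_assembly.annotation.gtf.gz
gzip -d *.gz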
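Because the FASTA and GTF above come from different sources (Ensembl and GENCODE), it is worth confirming that they share sequence names before building any index. The sketch below counts the sequence names that appear in both files; count_shared_chroms is a hypothetical helper name. A result of 0 means the naming schemes do not match.

```shell
# Count sequence names present in both a FASTA file and a GTF file.
# count_shared_chroms is a hypothetical helper for this sketch.
count_shared_chroms() {
    awk '
        FNR == NR {
            # First file (FASTA): keep the first word of each header line.
            if (/^>/) {
                sub(/^>/, "")
                fa[$1] = 1
            }
            next
        }
        /^#/ { next }                       # skip GTF comment lines
        ($1 in fa) && !seen[$1]++ { n++ }   # count each shared name once
        END { print n + 0 }
    ' "$1" "$2"
}

# Example:
# count_shared_chroms Mus_musculus.GRCm38.dna.primary_assembly.fa gencode.vM23.primary_assembly.annotation.gtf
```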
STAR
installation
# Get latest STAR source from releases
wget https://github.com/alexdobin/STAR/archive/2.7.10b.tar.gz
tar -xzf 2.7.10b.tar.gz
# Compile
cd STAR-2.7.10b/source
make STAR
# Set up PATH
vi ~/.bash_profile # or .bashrc
## Add the line below
PATH=/location/of/STAR-2.7.10b/source:$PATH
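The PATH step is the same for every tool in this post, so it can be scripted instead of editing the profile by hand. The following is a minimal sketch; add_to_path is a hypothetical helper name, and writing to ~/.bash_profile by default is an assumption that follows the steps above.

```shell
# Append "PATH=<dir>:$PATH" to a profile file unless that exact line exists.
# add_to_path is a hypothetical helper; the default profile is ~/.bash_profile.
add_to_path() {
    local dir=$1
    local profile=${2:-$HOME/.bash_profile}
    local line="PATH=$dir:\$PATH"
    # -x matches whole lines, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Example:
# add_to_path /location/of/STAR-2.7.10b/source
```

Calling it twice with the same directory leaves a single line in the profile.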
Making reference
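!!!!! This step can be skipped !!!!! (rsem-prepare-reference with --STAR, shown in the RSEM section below, builds the STAR index.)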
main_dir=STAR_index/mm10
ref_dir=source/ref_mm10
# Both paths are relative to the current directory, so create the index directory instead of cd-ing into it
mkdir -p "$main_dir"
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir "$main_dir" \
  --genomeFastaFiles "$ref_dir/Mus_musculus.GRCm38.dna.primary_assembly.fa" \
  --sjdbGTFfile "$ref_dir/gencode.vM23.primary_assembly.annotation.gtf"
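If you do build the index by hand, one parameter worth checking is --sjdbOverhang, which the STAR manual suggests setting to the read length minus 1 (the default is 100). The command above leaves it at the default. The sketch below derives a value from the longest read seen on standard input; sjdb_overhang is a hypothetical helper name, and it assumes standard 4-line FASTQ records.

```shell
# Print (longest read length - 1) for FASTQ records read from stdin.
# sjdb_overhang is a hypothetical helper; assumes 4-line FASTQ records.
sjdb_overhang() {
    awk 'NR % 4 == 2 && length($0) > max { max = length($0) }
         END { print max - 1 }'
}

# Example: inspect the first 10,000 reads of a gzipped file (placeholder name).
# zcat sample_R1_001.fastq.gz | head -n 40000 | sjdb_overhang
```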
RSEM
installation
wget https://github.com/deweylab/RSEM/archive/v1.3.3.tar.gz
tar -xvzf v1.3.3.tar.gz
# Compile
cd RSEM-1.3.3
make
# Set up PATH
vi ~/.bash_profile # or .bashrc
## Add the line below
PATH=/location/of/RSEM-1.3.3:$PATH
Making reference
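main_dir=RSEM_index/mm10 # reference name (prefix): files are written as RSEM_index/mm10.*
ref_dir=source/ref_mm10
# Both paths are relative to the current directory, so do not cd before running
mkdir -p "$(dirname "$main_dir")"
rsem-prepare-reference --gtf "$ref_dir/gencode.vM23.primary_assembly.annotation.gtf" --STAR --num-threads 8 \
  "$ref_dir/Mus_musculus.GRCm38.dna.primary_assembly.fa" "$main_dir"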
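After rsem-prepare-reference finishes, a quick check that the reference files exist can save a failed quantification run later. The sketch below looks for a few of the files RSEM writes next to the reference name (.grp, .ti, .seq, .transcripts.fa). The list is deliberately partial, and check_rsem_ref is a hypothetical helper name.

```shell
# Check that core RSEM reference files exist and are non-empty.
# check_rsem_ref is a hypothetical helper; the extension list is a partial
# selection of what rsem-prepare-reference writes, not an exhaustive one.
check_rsem_ref() {
    local prefix=$1 status=0 ext
    for ext in grp ti seq transcripts.fa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext"
            status=1
        fi
    done
    return $status
}

# Example:
# check_rsem_ref RSEM_index/mm10 && echo "reference looks complete"
```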
Sickle (optional)
installation
wget https://github.com/najoshi/sickle/archive/refs/tags/v1.33.tar.gz
tar -xvzf v1.33.tar.gz
# Compile
cd sickle-1.33
make
# Set up PATH
vi ~/.bash_profile # or .bashrc
## Add the line below
PATH=/location/of/sickle-1.33:$PATH
Verify the installation
source ~/.bash_profile
which fastqc STAR sickle
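A slightly more explicit check reports each tool by name and can also cover RSEM's rsem-calculate-expression, which the command above does not include. The following is a minimal sketch; check_tools is a hypothetical helper name.

```shell
# Report whether each named command is on PATH; return non-zero if any is missing.
# check_tools is a hypothetical helper for this sketch.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Example:
# check_tools fastqc STAR rsem-calculate-expression sickle
```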
Run STAR-RSEM
source /storage/home/subin/.bash_profile
main_dir=/storage2/Project/subin/ColonMeta/RNA/02_Preprocessing
input_dir=/storage2/Project/subin/ColonMeta/RNA/01_RawFiles
STAR_dir=/storage2/Project/source/STAR-master/bin/Linux_x86_64
#STARref_dir=/storage2/Project/source/STAR_index/GRCh38
#RSEM_dir=/storage2/Project/source/RSEM-master/bin
RSEM_ref_dir=/storage2/Project/source/RSEM_index
# sample, raw, and thread must be set by the caller (e.g. exported before running this script)
: "${sample:?set sample, e.g. P01_N01_RNA}"
: "${raw:?set raw to the raw FASTQ file prefix}"
: "${thread:?set thread to the number of threads}"
R1=$input_dir/$raw'_R1_001.fastq.gz'
R2=$input_dir/$raw'_R2_001.fastq.gz'
sickle_out=$main_dir/01.Sickle
rsem_out=$main_dir/02.RSEM
mkdir -p "$sickle_out" "$rsem_out"
cd "$main_dir"
echo "!!!!!sickle!!!!!"
# -q: quality threshold, -l: minimum read length kept after trimming
sickle pe -f "$R1" -r "$R2" -q 20 -t sanger -l 101 \
  -o "$sickle_out/${sample}_trimmed_1.fq" -p "$sickle_out/${sample}_trimmed_2.fq" \
  -s "$sickle_out/${sample}_sum.fastq"
echo "!!!!!STAR-RSEM!!!!!"
# Last two arguments: the reference name passed to rsem-prepare-reference
# (GRCh38 in this run; use the mouse reference name for mouse data) and the output prefix
rsem-calculate-expression --paired-end --output-genome-bam --estimate-rspd --keep-intermediate-files \
  --num-threads "$thread" --star --star-path "$STAR_dir" \
  "$sickle_out/${sample}_trimmed_1.fq" "$sickle_out/${sample}_trimmed_2.fq" \
  "$RSEM_ref_dir/GRCh38" "$rsem_out/$sample"
#--star-gzipped-read-file # if you use .fastq.gz files
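The script above processes one sample at a time, so running a whole project needs a loop over the raw files. The sketch below lists the raw file prefixes in a directory, assuming the _R1_001.fastq.gz naming used above. list_raw_prefixes and run_star_rsem.sh are hypothetical names, and using the raw prefix as the sample name is an assumption.

```shell
# Print the prefix of every *_R1_001.fastq.gz file in a directory.
# list_raw_prefixes is a hypothetical helper for this sketch.
list_raw_prefixes() {
    local dir=$1 f
    for f in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
        basename "$f" _R1_001.fastq.gz
    done
}

# Example loop (run_star_rsem.sh is a hypothetical file holding the script above;
# the sample name is assumed to equal the raw prefix, and 8 threads is arbitrary):
# for raw in $(list_raw_prefixes "$input_dir"); do
#     sample=$raw raw=$raw thread=8 bash run_star_rsem.sh
# done
```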
This post is licensed under CC BY 4.0 by the author.