Download dataset
Navigate to https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102116
Copy the GSM sample list from that page.
Paste it into a file from the terminal:
cd yourdir
vim gsm.lst
Right-click to paste:
Save and quit:
:wq
Run this command to generate the list of SRR download entries:
zhoujj 15:55:27 ~/project/06ChenProject/data_GSE102116
$perl /home/zhoujj/github/jjUtil/dl/get_srr_from_gsm.pl gsm.lst > srr.lst
Inspect the generated list:
zhoujj 16:00:09 ~/project/06ChenProject/data_GSE102116
$cat srr.lst
GSM2724132 WT_rep1_Day0 SRR5886648
GSM2724133 WT_rep2_Day0 SRR5886652
Download the SRR files:
zhoujj 16:01:24 ~/project/06ChenProject/data_GSE102116
$cut -f 3 srr.lst | while read line; do echo prefetch $line;done > prefetch.sh;
zhoujj 16:01:33 ~/project/06ChenProject/data_GSE102116
$sh prefetch.sh
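The one-liner above writes a prefetch command for every line of srr.lst, including blank or malformed ones. A minimal sketch of a guarded variant follows, assuming srr.lst is tab-separated with the run accession in column 3 (as `cut -f 3` implies). The temporary directory and demo file are illustrative only.

```shell
# Sketch: build prefetch.sh from srr.lst, skipping lines whose third
# column does not look like an SRR accession. Demo data is written to a
# temporary directory so the snippet can be tried without touching real files.
workdir=$(mktemp -d)
printf 'GSM2724132\tWT_rep1_Day0\tSRR5886648\n' >  "$workdir/srr.lst"
printf 'GSM2724133\tWT_rep2_Day0\tSRR5886652\n' >> "$workdir/srr.lst"
printf '\n'                                     >> "$workdir/srr.lst"

# Column 3 holds the run accession; keep only values of the form SRR<digits>.
awk -F '\t' '$3 ~ /^SRR[0-9]+$/ { print "prefetch " $3 }' \
    "$workdir/srr.lst" > "$workdir/prefetch.sh"

cat "$workdir/prefetch.sh"
```

Only the two valid lines end up in the script; the blank line is dropped.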
Once the download finishes, locate the downloaded files:
zhoujj 16:01:33 ~/project/06ChenProject/data_GSE102116
$ls ~/ncbi/public/sra/
zhoujj 16:01:33 ~/project/06ChenProject/data_GSE102116
$cut -f 3 srr.lst | while read line; do mv ~/ncbi/public/sra/$line.sra .;done;
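If one download failed, the mv loop above prints an error for that file but gives no summary. A hedged sketch of a variant that counts missing runs is shown here. The directories and the accessions SRR0000001/SRR0000002 are placeholders for the demo, not real data; in practice `sra_dir` would be ~/ncbi/public/sra.

```shell
# Sketch: move each downloaded .sra file into a destination directory and
# count the runs that were not found. All paths and accessions are demo stand-ins.
demo=$(mktemp -d)
sra_dir="$demo/sra"        # stands in for ~/ncbi/public/sra
dest_dir="$demo/data"      # stands in for the project data directory
mkdir -p "$sra_dir" "$dest_dir"
printf 'GSM1\tsampleA\tSRR0000001\nGSM2\tsampleB\tSRR0000002\n' > "$demo/srr.lst"
: > "$sra_dir/SRR0000001.sra"   # pretend only the first run was downloaded

missing=0
# A for loop (not a piped while loop) keeps $missing in the current shell.
for run in $(cut -f 3 "$demo/srr.lst"); do
    if [ -f "$sra_dir/$run.sra" ]; then
        mv "$sra_dir/$run.sra" "$dest_dir/"
    else
        echo "missing: $run.sra" >&2
        missing=$((missing + 1))
    fi
done
echo "runs not found: $missing"
```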
The SRR files are now downloaded.
Convert the SRA files to FASTQ:
zhoujj 16:07:15 ~/project/06ChenProject/data_GSE102116
$ls
GSM2724134.html gsm.lst prefetch.sh SRR6880514.sra srr.lst SRX3052556.html work.sh
zhoujj 16:07:17 ~/project/06ChenProject/data_GSE102116
$fastq-dump --split-files ./SRR6880514.sra
Read 14772104 spots for ./SRR6880514.sra
Written 14772104 spots for ./SRR6880514.sra
$ls
GSM2724134.html gsm.lst prefetch.sh SRR6880514_1.fastq SRR6880514_2.fastq SRR6880514.sra srr.lst SRX3052556.html work.sh
SRR6880514_1.fastq is read1
SRR6880514_2.fastq is read2
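For paired-end data, read1 and read2 should hold the same number of records. A quick sanity check can be sketched as below, assuming plain four-line FASTQ records (which is what fastq-dump writes). The tiny demo files are made up for illustration.

```shell
# Sketch: confirm that read1 and read2 contain the same number of records.
# A FASTQ record is four lines, so records = lines / 4.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo/demo_1.fastq"
printf '@r1\nTGCA\n+\nIIII\n@r2\nCCAT\n+\nIIII\n' > "$demo/demo_2.fastq"

n1=$(( $(wc -l < "$demo/demo_1.fastq") / 4 ))
n2=$(( $(wc -l < "$demo/demo_2.fastq") / 4 ))
if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
else
    echo "MISMATCH: read1=$n1 read2=$n2" >&2
fi
```

On the real files, replace the demo paths with SRR6880514_1.fastq and SRR6880514_2.fastq.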
Run RNA-seq (omitted here)
- Preparation:
zhoujj 16:10:52 ~/project/06ChenProject/data_GSE102116
$mkdir rnaseq
zhoujj 16:11:18 ~/project/06ChenProject/data_GSE102116
$cd rnaseq/
zhoujj 16:12:32 ~/project/06ChenProject/data_GSE102116/rnaseq
$cp /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/config.txt .
zhoujj 16:12:58 ~/project/06ChenProject/data_GSE102116/rnaseq
$vim config.txt
Check the read length:
zhoujj 16:16:30 ~/project/06ChenProject/data_GSE102116/rnaseq
$head ../SRR6880514_2.fastq
@SRR6880514.1 R0209720:515:C7WURACXX:1:1101:2358:2187 length=51
TGGTGAATTTCTCTGATCTAGCATGATAAGTAGAAACATTAAACTGTGATA
+SRR6880514.1 R0209720:515:C7WURACXX:1:1101:2358:2187 length=51
@CCDFFFFHHHHHJJJJJJJJJJJJHIJJJJJJIJJJJJJJJJJJJIIGIG
@SRR6880514.2 R0209720:515:C7WURACXX:1:1101:3400:2240 length=51
TCTCCAGGGCATGTCAGAGATGTTTGCGGCAGCCCCTCCCATCACAGGCCT
+SRR6880514.2 R0209720:515:C7WURACXX:1:1101:3400:2240 length=51
C@CFFFFFFGHDHHIIIGAFCHEEDHI<FHHH1DDFEGFHI<FHIEIIFI<
@SRR6880514.3 R0209720:515:C7WURACXX:1:1101:4539:2113 length=51
TCTTTTTACTTAGGATTGTCTTGGCTATATGGCTCTTTTTTGGTTTCATAT
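Reading the length off the first few headers works when all reads are the same length. As a rough cross-check, the lengths can be tallied over many records. This is a sketch on a made-up demo file; the limit of 1000 records is an arbitrary choice.

```shell
# Sketch: tally read lengths over the first 1000 records of a FASTQ file.
# Sequence lines are line 2 of every 4-line record (NR % 4 == 2).
demo=$(mktemp -d)
printf '@r1\nACGTA\n+\nIIIII\n@r2\nTTGAC\n+\nIIIII\n@r3\nGGC\n+\nIII\n' > "$demo/demo.fastq"

lengths=$(head -n 4000 "$demo/demo.fastq" \
    | awk 'NR % 4 == 2 { count[length($0)]++ }
           END { for (len in count) print len, count[len] }' \
    | sort -n)
echo "$lengths"
```

Each output line is a read length followed by the number of reads with that length.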
The read length is 51, so check the corresponding parameters in config.txt:
OUTDIR ./
SAMPLE ./samples.lst
# parameter
READLEN 51 # check this parameters
MINLEN 32 # check this parameters, >= 32
THREAD 24
#STANDTYPE FR/FF/RF/RR
# pro
BIN /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/
FASTQC /home/zhoujj/software/FastQC/fastqc
STAR /home/zhoujj/software/STAR/bin/Linux_x86_64_static/STAR
CUFFLINKS /home/zhoujj/software/cufflinks-2.2.1.Linux_x86_64/cufflinks
SAMTOOLS /usr/bin/samtools
HOMER /home/zhoujj/software/homer/bin
# for STAR
GTF /home/zhoujj/data/hg19/hg19/refGene.gtf
SPE human
INDEX /home/zhoujj/data/hg19/star_index
CHROMSIZE /home/zhoujj/data/hg19/hg19.chrom.sizes
- Prepare samples.lst
Find read files:
zhoujj 16:18:22 ~/project/06ChenProject/data_GSE102116/rnaseq
$ll /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq
-rw-rw-r-- 1 zhoujj zhoujj 3655968420 Jul 31 16:09 /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq
zhoujj 16:18:32 ~/project/06ChenProject/data_GSE102116/rnaseq
$ll /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
-rw-rw-r-- 1 zhoujj zhoujj 3655968420 Jul 31 16:09 /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
samples.lst
WT_rep1_Day0 /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
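With many samples, typing samples.lst by hand gets tedious. One possible way to derive it from srr.lst is sketched below. It assumes both files are tab-separated and that the `_1.fastq`/`_2.fastq` files from `fastq-dump --split-files` all sit in one directory; `fastq_dir` is a hypothetical variable for that directory.

```shell
# Sketch: derive samples.lst (name, read1 path, read2 path; tab-separated)
# from srr.lst, assuming the split FASTQ files sit in $fastq_dir.
demo=$(mktemp -d)
fastq_dir="$demo"          # stands in for the directory holding *_1.fastq / *_2.fastq
printf 'GSM2724132\tWT_rep1_Day0\tSRR5886648\n' > "$demo/srr.lst"

awk -F '\t' -v dir="$fastq_dir" 'BEGIN { OFS = "\t" }
    NF >= 3 { print $2, dir "/" $3 "_1.fastq", dir "/" $3 "_2.fastq" }' \
    "$demo/srr.lst" > "$demo/samples.lst"

cat "$demo/samples.lst"
```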
Recheck files:
zhoujj 16:20:52 ~/project/06ChenProject/data_GSE102116/rnaseq
$ll
total 16
drwxrwxr-x 2 zhoujj zhoujj 4096 Jul 31 16:20 ./
drwxrwxr-x 3 zhoujj zhoujj 4096 Jul 31 16:16 ../
-rw-rw-r-- 1 zhoujj zhoujj 989 Jul 31 16:12 config.txt
-rw-rw-r-- 1 zhoujj zhoujj 151 Jul 31 16:20 samples.lst
- Create makefile and run RNA-seq pipeline
Create makefile:
zhoujj 16:20:54 ~/project/06ChenProject/data_GSE102116/rnaseq
$perl /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/rnaseq.pl config.txt
OUTDIR ./
SAMPLE ./samples.lst
READLEN 51
MINLEN 32
THREAD 24
BIN /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/
FASTQC /home/zhoujj/software/FastQC/fastqc
STAR /home/zhoujj/software/STAR/bin/Linux_x86_64_static/STAR
CUFFLINKS /home/zhoujj/software/cufflinks-2.2.1.Linux_x86_64/cufflinks
SAMTOOLS /usr/bin/samtools
HOMER /home/zhoujj/software/homer/bin
GTF /home/zhoujj/data/hg19/hg19/refGene.gtf
SPE human
INDEX /home/zhoujj/data/hg19/star_index
CHROMSIZE /home/zhoujj/data/hg19/hg19.chrom.sizes
Run RNA-seq pipeline:
zhoujj 16:23:58 ~/project/06ChenProject/data_GSE102116/rnaseq
$cut -f 1 samples.lst | while read line; do echo "cd $line && make && cd -";done > run.sh;
zhoujj 16:25:29 ~/project/06ChenProject/data_GSE102116/rnaseq
$sh run.sh
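In the run.sh generated above, a failed `cd` or `make` leaves the shell in an unexpected directory for the next sample. A hedged alternative is to wrap each sample in a subshell and stop at the first failure. The sketch only generates the script from a demo samples.lst and does not run make.

```shell
# Sketch: generate a run.sh that runs make per sample inside a subshell
# and aborts on the first failure. Demo paths are placeholders.
demo=$(mktemp -d)
printf 'WT_rep1_Day0\t/path/to/r1.fastq\t/path/to/r2.fastq\n' > "$demo/samples.lst"

cut -f 1 "$demo/samples.lst" | while read -r sample; do
    # The parentheses run cd and make in a subshell, so no "cd -" is needed.
    printf '(cd "%s" && make) || { echo "failed: %s" >&2; exit 1; }\n' "$sample" "$sample"
done > "$demo/run.sh"

cat "$demo/run.sh"
```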
Check the results
Check statistics:
zhoujj 16:27:02 ~/project/06ChenProject/data_GSE102116/rnaseq
$cat samples.lst
WT_rep1_Day0 /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
zhoujj 16:27:09 ~/project/06ChenProject/data_GSE102116/rnaseq
$perl /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/getMatrics.pl WT_rep1_Day0 > stat.txt
Combine expression profile from multiple samples:
zhoujj 16:27:09 ~/project/06ChenProject/data_GSE102116/rnaseq
$python /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/combine_cuff_expr.py WT_rep1_Day0/02quantification/genes.fpkm_tracking:WT_rep1_Day0 WT_rep1_Day3/02quantification/genes.fpkm_tracking:WT_rep1_Day3 > gene.expr
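The `path:label` arguments in the command above follow a fixed pattern, so they can be built from samples.lst rather than typed out. A minimal sketch, assuming every sample directory contains 02quantification/genes.fpkm_tracking as in the command above. The demo samples.lst is illustrative.

```shell
# Sketch: build the "path:label" argument list for the combine step
# from the first column of samples.lst.
demo=$(mktemp -d)
printf 'WT_rep1_Day0\tr1.fastq\tr2.fastq\nWT_rep1_Day3\tr1.fastq\tr2.fastq\n' > "$demo/samples.lst"

args=$(cut -f 1 "$demo/samples.lst" \
    | awk '{ printf "%s%s/02quantification/genes.fpkm_tracking:%s", sep, $1, $1; sep = " " }')
echo "$args"
```

The resulting string can then be passed unquoted to the combine script in place of the hand-written arguments.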
Finished.