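SMART-SEQ2 in brief: the number of genes detected per cell is decent, but the number of cells is not large.
Jiaao here! Let's get started!
We start by downloading the data from the paper. The SRA link gives the raw sequencing data; the Supplementary file is the expression matrix.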
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229
The course runs the upstream analysis on the following runs only:
SRR6791441
SRR6791442
SRR6791443
SRR6791444
SRR6791445
SRR6791446
SRR6791447
SRR6791448
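Because the eight accessions are consecutive, the srr.list file used in the download step can be generated rather than typed by hand. A minimal sketch using a plain POSIX loop (the approach is a suggestion, not part of the course material):

```shell
# Write SRR6791441 ... SRR6791448 to srr.list, one accession per line.
n=6791441
while [ "$n" -le 6791448 ]; do
  echo "SRR$n"
  n=$((n + 1))
done > srr.list
```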
##Start by creating the environment
conda create -n scrna
conda activate scrna
##Add sratoolkit to PATH
export PATH="$PATH:/home/kaoku/biosoft/sratoolkit/sratoolkit.3.0.0-ubuntu64/bin"
##Create srr.list and download (download directory: ncbi/sra)
$ cat srr.list
SRR6791441
SRR6791442
SRR6791443
SRR6791444
SRR6791445
SRR6791446
SRR6791447
SRR6791448
cat srr.list | while read id; do ( prefetch $id & ); done
##Download complete
SRR6791441.sra SRR6791442.sra SRR6791443.sra SRR6791444.sra SRR6791445.sra SRR6791446.sra SRR6791447.sra SRR6791448.sra srr.list
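Since the prefetch jobs run in the background, it may be worth confirming that every accession in srr.list actually produced a .sra file before moving on. The sketch below is one way to do that. The function name `missing_runs` is hypothetical, and it assumes the .sra files sit flat in one directory as in the listing above; if your sratoolkit setup puts each run in its own subdirectory, the path test would need adjusting.

```shell
# Print every accession listed in file $1 that has no non-empty .sra file in directory $2.
missing_runs() {
  list=$1
  dir=$2
  while read -r id; do
    [ -z "$id" ] && continue               # skip blank lines
    [ -s "$dir/$id.sra" ] || echo "$id"    # -s: file exists and is not empty
  done < "$list"
}

# Example (hypothetical usage): check the current directory against srr.list
# missing_runs srr.list .
```

An empty result means nothing is missing; any accession printed can be passed to prefetch again.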
##Install software (samtools is included because the sorting and indexing steps need it)
conda install -y -c bioconda fastqc multiqc trim-galore subread hisat2 samtools
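##Convert SRA to fastq (single-end sequencing), writing into a new raw_fq directory
mkdir -p raw_fq
ls /home/kaoku/project/scRNA/srr/*.sra |
while read id
do
fastq-dump -O raw_fq --gzip --split-3 $id &
done
wait   # block until all background fastq-dump jobs have finished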
##Align with hisat2 (mm10 reference genome)
index=/home/jianmingzeng/reference/index/hisat/mm10/genome
ls raw_fq/*.gz |
while read id
do
hisat2 -p 10 -x $index -U $id -S ${id%%.*}.hisat.sam
done
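The output name in the loop above relies on the shell expansion `${id%%.*}`, which deletes everything from the first dot onward. A small sketch of what that does to one path (the file name is only an illustration built from the accessions in this post):

```shell
id=raw_fq/SRR6791441.fastq.gz
# %%.* removes the longest suffix that starts with a dot,
# so ".fastq.gz" is dropped while the directory part is kept.
sam=${id%%.*}.hisat.sam
echo "$sam"   # prints raw_fq/SRR6791441.hisat.sam
```

Note that the directory prefix is preserved, so the SAM files end up next to the fastq files in raw_fq; the later `ls *.sam` step therefore has to be run from inside that directory, or the files moved first.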
##Convert .sam to sorted .bam
ls *.sam | while read id ; do (samtools sort -O bam -@ 5 -o $(basename ${id} ".sam").bam ${id}); done
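The BAM file name comes from `basename`, whose optional second argument is a suffix to strip. The suffix has to match the end of the name exactly, otherwise nothing is removed. A short sketch with an illustrative file name:

```shell
id=SRR6791441.hisat.sam
# An exact suffix match strips ".sam" before ".bam" is appended.
bam=$(basename "$id" ".sam").bam
echo "$bam"   # prints SRR6791441.hisat.bam
```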
##Build the BAM indexes
ls *.bam | xargs -I {} samtools index {}
##Count reads per gene (requires a .gtf annotation file)
gtf=/home/jianmingzeng/reference/gtf/gencode/gencode.vM12.annotation.gtf
featureCounts -T 5 -t exon -g gene_id -a $gtf -o all.id.txt *.bam 1>counts.id.log 2>&1 &
##This produces the expression matrix all.id.txt, which is similar to the rawCounts result in the GEO database
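The featureCounts output starts with a comment line, then a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. To compare it with a plain count matrix, the annotation columns can be dropped. A minimal sketch; the function name `counts_only` and the output file name in the usage comment are hypothetical:

```shell
# Keep Geneid plus the per-sample count columns (column 7 onward),
# skipping the leading "#" comment line that featureCounts writes.
counts_only() {
  grep -v '^#' "$1" | cut -f 1,7-
}

# Example (hypothetical output name):
# counts_only all.id.txt > all.counts.txt
```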
The downstream analysis will start from interpreting the expression matrix provided by the authors.
See you in the next post!