目錄
1.Module 1 - Introduction to RNA sequencing
- Installation
- Reference Genomes
- Annotations
- Indexing
- RNA-seq Data
- Pre-Alignment QC
2.Module 2 - RNA-seq Alignment and Visualization
- Adapter Trim
- Alignment
- IGV
- Alignment Visualization
- Alignment QC
3.Module 3 - Expression and Differential Expression
- Expression
- Differential Expression
- DE Visualization
- Kallisto for Reference-Free Abundance Estimation
4.Module 4 - Isoform Discovery and Alternative Expression
- Reference Guided Transcript Assembly
- de novo Transcript Assembly
- Transcript Assembly Merge
- Differential Splicing
- Splicing Visualization
5.Module 5 - De novo transcript reconstruction
- De novo RNA-Seq Assembly and Analysis Using Trinity
6.Module 6 - Functional Annotation of Transcripts
- Functional Annotation of Assembled Transcripts Using Trinotate
4.5 Transcript Assembly Visualization (Splicing Visualization)
Visualizing Results at the Command Line
從“de_novo”模式查看合并后的GTF文件锰瘸。請記住焊夸,這個(gè)合并的GTF文件結(jié)合了UHR和HBR(每個(gè)單獨(dú)的GTF也在前面生成)。
cd denovo
head stringtie_merged.gtf
有關(guān)該文件格式的細(xì)節(jié)阁簸,請查閱以下鏈接
- https://ccb.jhu.edu/software/stringtie/gff.shtml#gffcompare
- http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/index.html
- http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/index.html#transfrag-class-codes
在“de_novo”結(jié)果中渣窜,有多少基因至少有一個(gè)由StringTie組裝的轉(zhuǎn)錄本?
cat stringtie_merged.gtf | perl -ne 'if ($_ =~ /gene_id\s+\"(\S+)\"\;/){print "$1\n"}' | sort | uniq | wc -l
565
有多少基因至少組裝了一個(gè)潛在的新轉(zhuǎn)錄本?
head gffcompare.stringtie_merged.gtf.tmap
grep "j" gffcompare.stringtie_merged.gtf.tmap
grep "j" gffcompare.stringtie_merged.gtf.tmap | cut -f 1 | sort | uniq | wc -l
174
顯示具有最高閱讀支持度的基因間區(qū)域(候選新轉(zhuǎn)錄區(qū)域)的轉(zhuǎn)錄本
cd denovo
grep -w "u" gffcompare.stringtie_merged.gtf.tmap | sort -n -k 10 | column -t
- - u MSTRG.481 MSTRG.481.1 3 0.000000 0.000000 0.000000 260 MSTRG.481.1 -
- - u MSTRG.482 MSTRG.482.1 2 0.000000 0.000000 0.000000 267 MSTRG.482.1 -
- - u MSTRG.54 MSTRG.54.1 2 0.000000 0.000000 0.000000 279 MSTRG.54.1 -
- - u MSTRG.434 MSTRG.434.1 3 0.000000 0.000000 0.000000 281 MSTRG.434.1 -
- - u MSTRG.484 MSTRG.484.1 2 0.000000 0.000000 0.000000 319 MSTRG.484.1 -
- - u MSTRG.3 MSTRG.3.1 2 0.000000 0.000000 0.000000 320 MSTRG.3.1 -
- - u MSTRG.200 MSTRG.200.1 2 0.000000 0.000000 0.000000 344 MSTRG.200.1 -
- - u MSTRG.391 MSTRG.391.1 2 0.000000 0.000000 0.000000 346 MSTRG.391.1 -
- - u MSTRG.2 MSTRG.2.1 2 0.000000 0.000000 0.000000 400 MSTRG.2.1 -
- - u MSTRG.94 MSTRG.94.1 3 0.000000 0.000000 0.000000 424 MSTRG.94.1 -
- - u MSTRG.410 MSTRG.410.1 2 0.000000 0.000000 0.000000 939 MSTRG.410.1 -
使用RegTools來注釋所有的可變剪切
RegTools用于幫助描述單個(gè)外顯子剪接事件呀枢,并幫助識別對基因表達(dá)或剪接模式有直接影響的新剪接事件霉晕。更多細(xì)節(jié)請參考RegTools手冊缕减。
使用RegTools的基本功能來提取可變剪切。每個(gè)bam的bed文件芒珠,它總結(jié)了RNA-seq數(shù)據(jù)中所表示的所有不同的外顯子-外顯子剪接事件桥狡。我們還將使用RegTools對這些連接進(jìn)行注釋,以參考我們的GTF轉(zhuǎn)錄組文件:
cd align
regtools junctions extract -s 0 HBR.bam -o HBR.junctions.bed
head HBR.junctions.bed
regtools junctions annotate HBR.junctions.bed ../chr22_with_ERCC92.fa ../chr22_with_ERCC92.gtf > HBR.junctions.anno.bed
head HBR.junctions.anno.bed
regtools junctions extract -s 0 UHR.bam -o UHR.junctions.bed
head UHR.junctions.bed
regtools junctions annotate UHR.junctions.bed ../chr22_with_ERCC92.fa ../chr22_with_ERCC92.gtf > UHR.junctions.anno.bed
head UHR.junctions.anno.bed
現(xiàn)在從樣本中找出任何可能涉及新外顯子跳躍皱卓、受體位點(diǎn)使用或供體位點(diǎn)使用的連接(相對于參考轉(zhuǎn)錄組GTF)裹芝。
grep -P -w "NDA|A|D" HBR.junctions.anno.bed | perl -ne 'chomp; @l=split("\t",$_); if ($l[4] > 3){print "$_\n"}'
grep -P -w "NDA|A|D" UHR.junctions.anno.bed | perl -ne 'chomp; @l=split("\t",$_); if ($l[4] > 3){print "$_\n"}'
轉(zhuǎn)換成GTF文件查看
為了更容易比較僅ref-only, ref-guided, de novo 的結(jié)果的輸出,我們現(xiàn)在將生成合并后的GTF文件的過濾版本娜汁,我們將刪除轉(zhuǎn)錄本嫂易,除非有證據(jù)表明它們的表達(dá)。
wget https://github.com/griffithlab/rnaseq_tutorial/blob/master/scripts/stringtie_filter_gtf.pl
perl stringtie_filter_gtf.pl --expression_metric=FPKM --result_dirs='HBR_Rep1,HBR_Rep2,HBR_Rep3,UHR_Rep1,UHR_Rep2,UHR_Rep3' --input_gtf_file='../chr22_with_ERCC92.gtf' --filtered_gtf_file='chr22_with_ERCC92.filtered.gtf' --exp_cutoff=0 --min_sample_count=2