Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown
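These notes cover expression-level analysis of transcriptome sequencing (RNA-seq) data with HISAT, StringTie and Ballgown.
PS: the article skips quality control, contaminant removal and adapter trimming, and starts directly from read alignment.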
The analysis workflow breaks down into four main parts:
(i) alignment of the reads to the genome;
(ii) assembly of the alignments into full-length transcripts;
(iii) quantification of the expression levels of each gene and transcript; and
(iv) calculation of the differences in expression for all genes among the different experimental conditions.
Roles of the three tools used in the analysis:
HISAT: aligns RNA-seq reads to a genome and discovers transcript splice sites.
StringTie: assembles the alignments into full and partial transcripts, creating multiple isoforms as necessary and estimating the expression levels of all genes and transcripts.
Ballgown: takes the transcripts and expression levels from StringTie and applies rigorous statistical methods to determine which transcripts are differentially expressed between two or more experiments.
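Detailed workflow:
* Quality control of the raw RNA-seq data with FastQC and the FASTX-Toolkit: remove contaminants, adapters and low-quality sequences
1 Align each sample's reads to the reference genome with HISAT
2 Pass the alignments to StringTie for transcript assembly
3 Merge the assembled transcripts from all samples with StringTie's merge function
(Cufflinks' cuffmerge can be used in place of StringTie's merge function)
4 Feed the merged transcripts back to StringTie to re-estimate transcript abundances
  gffcompare: determines how many of the assembled transcripts match annotated genes and how many are entirely novel
5 StringTie reports the number of reads for each transcript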
  StringTie passes three kinds of data to Ballgown:
(i) phenotype data: information about the samples being collected;
(ii) expression data: normalized and un-normalized measures of the amount of each exon, junction, transcript and gene expressed in each sample;
(iii) genomic information: coordinates giving the location of the exons, introns, transcripts and genes, as well as annotation including information such as gene names.
6 Ballgown identifies the genes that are differentially expressed across the experimental conditions
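The command-line part of the workflow (steps 1 to 5) can be sketched roughly as below. This is only an illustration, not the article's own script: the function name build_commands, the sample names, the file paths and the output layout are all made up for the example, and the sketch assumes the hisat2, samtools, stringtie and gffcompare executables with their commonly documented options. It only builds the commands and does not run anything.

```python
import shlex


def build_commands(samples, index, annotation, threads=8, outdir="out"):
    """Build the commands for steps 1-5 as argument lists.

    `samples` maps a sample name to its pair of FASTQ files. Every path and
    name here is a placeholder chosen for illustration.
    """
    t = str(threads)
    cmds = []
    for name, (reads_1, reads_2) in samples.items():
        sam = f"{outdir}/{name}.sam"
        bam = f"{outdir}/{name}.bam"
        # Step 1: spliced alignment; --dta reports alignments in a form
        # suited to downstream transcript assemblers.
        cmds.append(["hisat2", "-p", t, "--dta", "-x", index,
                     "-1", reads_1, "-2", reads_2, "-S", sam])
        # StringTie reads coordinate-sorted BAM, so sort and convert the SAM.
        cmds.append(["samtools", "sort", "-@", t, "-o", bam, sam])
        # Step 2: assemble transcripts per sample, guided by the annotation.
        cmds.append(["stringtie", bam, "-p", t, "-G", annotation,
                     "-l", name, "-o", f"{outdir}/{name}.gtf"])
    merged = f"{outdir}/merged.gtf"
    # Step 3: merge the per-sample assemblies into one transcript set.
    cmds.append(["stringtie", "--merge", "-p", t, "-G", annotation,
                 "-o", merged] + [f"{outdir}/{name}.gtf" for name in samples])
    # Compare the merged transcripts with the reference annotation.
    cmds.append(["gffcompare", "-r", annotation,
                 "-o", f"{outdir}/merged", merged])
    # Steps 4-5: re-estimate abundances against the merged transcripts;
    # -e restricts estimation to the given transcripts and -B writes the
    # tables that Ballgown reads.
    for name in samples:
        cmds.append(["stringtie", "-e", "-B", "-p", t, "-G", merged,
                     "-o", f"{outdir}/ballgown/{name}/{name}.gtf",
                     f"{outdir}/{name}.bam"])
    return cmds


if __name__ == "__main__":
    demo = {"sampleA": ("sampleA_1.fastq.gz", "sampleA_2.fastq.gz")}
    for cmd in build_commands(demo, "genome_index", "genes.gtf"):
        print(shlex.join(cmd))
```

Each returned list could be handed to subprocess.run(cmd, check=True) once the placeholder paths are replaced with real ones.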
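Ballgown analysis workflow:
A Loading the data into R.
Load the abundance data produced by StringTie, together with the phenotype data describing the samples, into R.
Key point: make sure the sample IDs in the expression data match the IDs in the phenotype data.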
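The ID check can be done before R is even opened. The sketch below assumes a layout that is not spelled out in these notes: one sub-directory per sample holding the five .ctab tables that stringtie -B writes, and a phenotype CSV whose ID column is called "ids". The function name check_sample_ids and the keys of the returned report are hypothetical.

```python
import csv
from pathlib import Path

# Tables written by `stringtie -B` into each sample directory.
EXPECTED_TABLES = {"e_data.ctab", "i_data.ctab", "t_data.ctab",
                   "e2t.ctab", "i2t.ctab"}


def check_sample_ids(pheno_csv, ballgown_dir, id_column="ids"):
    """Compare phenotype IDs with the sample directory names.

    The column name "ids" and the one-directory-per-sample layout are
    assumptions made for this sketch.
    """
    with open(pheno_csv, newline="") as handle:
        pheno_ids = [row[id_column] for row in csv.DictReader(handle)]
    sample_dirs = sorted(p for p in Path(ballgown_dir).iterdir() if p.is_dir())
    dir_ids = [p.name for p in sample_dirs]
    # Record directories that lack any of the expected tables.
    incomplete = {}
    for sample_dir in sample_dirs:
        missing = EXPECTED_TABLES - {f.name for f in sample_dir.iterdir()}
        if missing:
            incomplete[sample_dir.name] = sorted(missing)
    return {
        "missing_from_directories": sorted(set(pheno_ids) - set(dir_ids)),
        "missing_from_phenotype": sorted(set(dir_ids) - set(pheno_ids)),
        "incomplete_directories": incomplete,
        # True when the phenotype rows are listed in the same order as the
        # sorted directory names.
        "order_matches": pheno_ids == dir_ids,
    }
```

A report with two empty lists, an empty dictionary and order_matches set to True suggests that the phenotype table and the sample directories line up.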
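B Inspect the distribution of abundance estimates for the transcripts.
Key point: abundance estimates are expressed as FPKM, the number of fragments (read pairs, or single reads for unpaired data) mapped per kilobase of exon per million mapped fragments.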
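To make the FPKM definition concrete, here is a small worked sketch of the textbook formula. It is not how StringTie derives its estimates, which come out of its own assembly and quantification model; the function names and the numbers are invented for illustration. The log2(FPKM + 1) helper is a transform commonly applied before plotting the distribution, because raw FPKM values are strongly skewed.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # fragments / (length in kb) / (total in millions)
    #   = fragments * 1e9 / (length in bp * total mapped fragments)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fpkm(value):
    """log2(FPKM + 1); adding 1 keeps transcripts with FPKM 0 defined."""
    return math.log2(value + 1)


# 500 fragments on a 2,000 bp transcript, 20 million mapped fragments:
# 500 / 2 kb / 20 million = 12.5 FPKM
example = fpkm(500, 2000, 20_000_000)
print(example, log2_fpkm(example))
```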
Ballgown's stattest function lets any known confounders be specified directly (through its adjustvars argument).
C The result is a table with information on each feature tested for differential expression.
The article lists the software installation steps and the exact commands in full, so they are not repeated here. See the article for details.