差異表達(dá)分析
接著上篇文章:http://www.reibang.com/p/88511070e2dd
繼續(xù)使用來自pasilla包的示例數(shù)據(jù)
用results函數(shù)提取差異分析結(jié)果,默認(rèn)的包括p-value,padj,log2foldchange拗窃,而且默認(rèn)上述值是針對design里最后一個變量的景鼠,不過用戶也可以自己規(guī)定針對什么變量,只要往results里添加參數(shù)name或者contrasts.
dds <- DESeq(dds)
res <- results(dds)
res
# log2 fold change (MLE): condition treated vs untreated
# Wald test p-value: condition treated vs untreated
# DataFrame with 14599 rows and 6 columns
# baseMean log2FoldChange lfcSE
# <numeric> <numeric> <numeric>
# FBgn0000003 0.171568715207063 1.02601368333522 3.80551160374507
# FBgn0000008 95.1440789963134 0.00215175449141369 0.223883805414937
# FBgn0000014 1.05657219346166 -0.496735199348756 2.16026616643833
# FBgn0000015 0.846723274987709 -1.88276494877056 2.10643527029088
# FBgn0000017 4352.5928987935 -0.240025038615816 0.126024321793908
#如果要明確比較的變量坷衍,可以這樣做
res <- results(dds, name="condition_treated_vs_untreated")
res <- results(dds, contrast=c("condition","treated","untreated"))
Log fold change shrinkage for visualization and ranking
表達(dá)量越低的基因,彼此之間離散程度越大,比如0和10的表達(dá)量就會顯得方差相對很高枪向;但是10010和10000也是差10但是相對表達(dá)量離散程度就很小裙品,為了統(tǒng)計檢驗準(zhǔn)確有效不受count數(shù)太大的影響俗批,這里可以做shrink
resultsNames(dds)#查看參數(shù)
## [1] "Intercept" "condition_treated_vs_untreated"
#選擇要shrink的參數(shù)
resLFC <- lfcShrink(dds, coef="condition_treated_vs_untreated", type="apeglm")
# using 'apeglm' for LFC shrinkage. If used in published research, please cite:
# Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
# sequence count data: removing the noise and preserving large differences.
# Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
resLFC
#可以用as.data.frame方便查看res
Using parallelization
如果實驗的設(shè)計更為復(fù)雜而且又大量樣本,可以考慮并行計算市怎,使用BiocParallel包
Parallelizing DESeq, results, and lfcShrink can be easily accomplished by loading the BiocParallel package, and then setting the following arguments: parallel=TRUE and BPPARAM=MulticoreParam(4)
或者一開始就聲明好4核:
library("BiocParallel")
register(MulticoreParam(4))
#像上述聲明好后岁忘,在function里添加parallel=TRUE選項即可
p-values and adjusted p-values
resOrdered <- res[order(res$pvalue),]#按pvalue從小到大排列
summary(res)
# out of 12359 with nonzero total read count
# adjusted p-value < 0.1
# LFC > 0 (up) : 521, 4.2%
# LFC < 0 (down) : 540, 4.4%
# outliers [1] : 1, 0.0081%
# low counts [2] : 4035, 33%
# (mean count < 7)
# [1] see 'cooksCutoff' argument of ?results
# [2] see 'independentFiltering' argument of ?results
sum(res$padj < 0.1, na.rm=TRUE)
#results函數(shù)參數(shù)是很豐富的,建議?results查看区匠,這里把默認(rèn)pvalue=0.1設(shè)置成0.05
res05 <- results(dds, alpha=0.05)
summary(res05)
# out of 12359 with nonzero total read count
# adjusted p-value < 0.05
# LFC > 0 (up) : 406, 3.3%
# LFC < 0 (down) : 432, 3.5%
# outliers [1] : 1, 0.0081%
# low counts [2] : 3797, 31%
# (mean count < 5)
# [1] see 'cooksCutoff' argument of ?results
# [2] see 'independentFiltering' argument of ?results
sum(res05$padj < 0.05, na.rm=TRUE)
Independent hypothesis weighting
用IHW包對pvalue進(jìn)行過濾干像,提高檢驗功效,通過給假設(shè)賦予權(quán)重進(jìn)行多重檢驗
http://bioconductor.org/packages/release/bioc/html/IHW.html
library("IHW")
resIHW <- results(dds, filterFun=ihw)
summary(resIHW)
# out of 12359 with nonzero total read count
# adjusted p-value < 0.1
# LFC > 0 (up) : 498, 4%
# LFC < 0 (down) : 548, 4.4%
# outliers [1] : 1, 0.0081%
# [1] see 'cooksCutoff' argument of ?results
# see metadata(res)$ihwResult on hypothesis weighting
sum(resIHW$padj < 0.1, na.rm=TRUE)
metadata(resIHW)$ihwResult
# ihwResult object with 14599 hypothesis tests
# Nominal FDR control level: 0.1
# Split into 8 bins, based on an ordinal covariate
#注意驰弄,所有通過DESeq2這個包進(jìn)行的計算結(jié)果都被存儲在了DESeqDataSet或者DESeqResults對象中
#怎么獲取這些結(jié)果以后的推文會討論