limma/voom,edgeR祠够,DESeq2分析注意事項(xiàng)压汪,差異分析表達(dá)矩陣與分組信息

給粉絲朋友們帶來了很多理解上的挑戰(zhàn),所以我們開辟專欄慢慢介紹其中的一些概念性的問題古瓤,上一期: 箱線圖的生物學(xué)含義

這一講我們來說一下limma/voom止剖,edgeR,DESeq2落君,轉(zhuǎn)錄組差異分析的三大R包!

差異分析的第一步是要構(gòu)建符合不同模型的R對象穿香,主要包括兩部分的信息:表達(dá)矩陣和分組信息。 這次主要討論一下limma/voom绎速,edgeR皮获,DESeq2是轉(zhuǎn)錄組差異分析的三大R包的表達(dá)矩陣和分組矩陣構(gòu)建,主要針對二分組轉(zhuǎn)錄組數(shù)據(jù)的差異分析纹冤。

一洒宝、limma和edgeR包的表達(dá)矩陣和分組信息

1.limma和edgeR包DEGList對象的構(gòu)建

limma和edgeR包都是由一個(gè)研究團(tuán)隊(duì)開發(fā),方法之間互相繼承萌京。edgeR是專門針對轉(zhuǎn)錄組數(shù)據(jù)開發(fā)的雁歌,limma包最早是用來進(jìn)行芯片數(shù)據(jù)的差異分析,對轉(zhuǎn)錄組數(shù)據(jù)差異分析的功能是后來添加的知残,表達(dá)矩陣的構(gòu)建方法直接使用edgeR包中的DGEList函數(shù)靠瞎。

DEGList函數(shù)的參數(shù)示例:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">DGEList(counts = matrix(0, 0, 0), lib.size = colSums(counts),
norm.factors = rep(1,ncol(counts)), samples = NULL,
group = NULL, genes = NULL, remove.zeros = FALSE)</pre>

使用airway中的轉(zhuǎn)錄組表達(dá)矩陣來演示

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># BiocManager::install(c("airway", "edgeR"))

library(airway)
data(airway)

獲取基因counts矩陣

exprSet <- assay(airway)
exprSet[1:6,1:6]
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517
ENSG00000000003 679 448 873 408 1138 1047
ENSG00000000005 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799
ENSG00000000457 260 211 263 164 245 331
ENSG00000000460 60 55 40 35 78 63
ENSG00000000938 0 0 2 0 1 0
exprSet <- assay(airway)

獲取分組信息

group_list <- colData(airway)$dex
group_list
[1] untrt trt untrt trt untrt trt untrt trt
Levels: trt untrt</pre>

使用 DEGList函數(shù)構(gòu)建limma和edgeR包需要的輸入矩陣:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> dge <- edgeR::DGEList(counts=exprSet)

dge
An object of class "DGEList"
$counts
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520
ENSG00000000003 679 448 873 408 1138 1047 770
ENSG00000000005 0 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799 417
ENSG00000000457 260 211 263 164 245 331 233
ENSG00000000460 60 55 40 35 78 63 76
SRR1039521
ENSG00000000003 572
ENSG00000000005 0
ENSG00000000419 508
ENSG00000000457 229
ENSG00000000460 60
64097 more rows ...

$samples
group lib.size norm.factors
SRR1039508 1 20637971 1
SRR1039509 1 18809481 1
SRR1039512 1 25348649 1
SRR1039513 1 15163415 1
SRR1039516 1 24448408 1
SRR1039517 1 30818215 1
SRR1039520 1 19126151 1
SRR1039521 1 21164133 1</pre>

可以看到包含了counts矩陣和一些其他用于差異分析要使用的信息,還可以把分組信息添加進(jìn)來。

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> DEG <- edgeR::DGEList(counts=exprSet,group=factor(group_list))

DEG
An object of class "DGEList"
$counts
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520
ENSG00000000003 679 448 873 408 1138 1047 770
ENSG00000000005 0 0 0 0 0 0 0
ENSG00000000419 467 515 621 365 587 799 417
ENSG00000000457 260 211 263 164 245 331 233
ENSG00000000460 60 55 40 35 78 63 76
SRR1039521
ENSG00000000003 572
ENSG00000000005 0
ENSG00000000419 508
ENSG00000000457 229
ENSG00000000460 60
64097 more rows ...

$samples
group lib.size norm.factors
SRR1039508 untrt 20637971 1
SRR1039509 trt 18809481 1
SRR1039512 untrt 25348649 1
SRR1039513 trt 15163415 1
SRR1039516 untrt 24448408 1
SRR1039517 trt 30818215 1
SRR1039520 untrt 19126151 1
SRR1039521 trt 21164133 1</pre>

這些使用方法適用于絕大多數(shù)limma和edgeR包差異分析的表達(dá)矩陣構(gòu)建乏盐。

2.limma和edgeR包分組矩陣的設(shè)置

limma和edgeR的假設(shè)都是數(shù)據(jù)符合正態(tài)分布佳窑,構(gòu)建線性模型。 使用model.matrix函數(shù)構(gòu)建分組信息的矩陣父能,就是將分組信息二值化华嘹,用0和1構(gòu)成的矩陣來代表不同的分組信息

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> design <- model.matrix(~0+factor(group_list))

colnames(design) <- levels(factor(group_list))
rownames(design) <- colnames(exprSet)
design
trt untrt
SRR1039508 0 1
SRR1039509 1 0
SRR1039512 0 1
SRR1039513 1 0
SRR1039516 0 1
SRR1039517 1 0
SRR1039520 0 1
SRR1039521 1 0
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"factor(group_list)`
[1] "contr.treatment"</pre>

需要注意的一點(diǎn)是下面兩種構(gòu)建模型矩陣的區(qū)別

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> design <- model.matrix(~factor(group_list))

design
(Intercept) factor(group_list)untrt
1 1 1
2 1 0
3 1 1
4 1 0
5 1 1
6 1 0
7 1 1
8 1 0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"factor(group_list)`
[1] "contr.treatment"

design <- model.matrix(~0+factor(group_list))
design
factor(group_list)trt factor(group_list)untrt
1 0 1
2 1 0
3 0 1
4 1 0
5 0 1
6 1 0
7 0 1
8 1 0
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"factor(group_list)`
[1] "contr.treatment"</pre>

第一種方法是將第一列的分組信息作為線性模型的截距法竞,第二列開始依次與第一列比較,通過coef參數(shù)可以把差異分析結(jié)果依次提取出來强挫。

第二種方法岔霸,僅僅是分組信息而已,需要通過makeContrasts函數(shù)來制作差異比較矩陣控制俯渤。

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> # 通過makeContrasts設(shè)置需要進(jìn)行對比的分組

comp='trt-untrt'
cont.matrix <- makeContrasts(contrasts=c(comp),levels = design)
cont.matrix
Contrasts
Levels trt-untrt
trt 1
untrt -1</pre>

二呆细、DESeq2包的表達(dá)矩陣和分組信息的構(gòu)建

1.DESeq2包輸入文件:DESeqDataSet對象的制作

構(gòu)建DESeqDataSet函數(shù)的參數(shù)示例:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">DESeqDataSet(se, design, ignoreRank = FALSE)

DESeqDataSetFromMatrix(countData, colData, design, tidy = FALSE,
ignoreRank = FALSE, ...)

DESeqDataSetFromHTSeqCount(sampleTable, directory = ".", design,
ignoreRank = FALSE, ...)

DESeqDataSetFromTximport(txi, colData, design, ...)</pre>

DESeqDataSet使用示例:從SummarizedExperiment流程中產(chǎn)生的數(shù)據(jù)導(dǎo)入

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> library(airway)

data(airway)
ddsSE <- DESeq2::DESeqDataSet(airway, design = ~ cell + dex)
ddsSE
class: DESeqDataSet
dim: 64102 8
metadata(2): '' version
assays(1): counts
rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
rowData names(0):
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(9): SampleName cell ... Sample BioSample
</pre>

DESeqDataSetFromMatrix使用示例:從count矩陣中構(gòu)建DESeqDataSet:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">> colData <- data.frame(row.names=colnames(exprSet),group_list=group_list)

dds <- DESeq2::DESeqDataSetFromMatrix(countData = exprSet,colData = colData, design = ~ group_list)
dds

class: DESeqDataSet
dim: 64102 8
metadata(1): version
assays(1): counts
rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
rowData names(0):
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(1): group_list</pre>

DESeqDataSetFromHTSeqCount使用示例:從HTSeqCount中導(dǎo)入數(shù)據(jù)

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># 指定表達(dá)矩陣文件所在的文件夾

directory <- system.file("extdata", package="pasilla",

  •                      mustWork=TRUE)
    

directory
[1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/pasilla/extdata"
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleFiles
[1] "treated1fb.txt" "treated2fb.txt" "treated3fb.txt" "untreated1fb.txt"
[5] "untreated2fb.txt" "untreated3fb.txt" "untreated4fb.txt"
sampleCondition <- sub("(.treated).","\1",sampleFiles)
sampleCondition
[1] "treated" "treated" "treated" "untreated" "untreated" "untreated" "untreated"

構(gòu)建包含表達(dá)矩陣文件信息的數(shù)據(jù)框

sampleTable <- data.frame(sampleName = sampleFiles,

  •                       fileName = sampleFiles,
    
  •                       condition = sampleCondition)
    

導(dǎo)入為DESeqDataSet對象

ddsHTSeq <- DESeq2::DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,

  •                                    directory = directory,
    
  •                                    design= ~ condition)
    

ddsHTSeq
class: DESeqDataSet
dim: 70463 7
metadata(1): version
assays(1): counts
rownames(70463): FBgn0000003:001 FBgn0000008:001 ... FBgn0261575:001
FBgn0261575:002
rowData names(0):
colnames(7): treated1fb.txt treated2fb.txt ... untreated3fb.txt untreated4fb.txt
colData names(1): condition</pre>

DESeqDataSetFromTximport使用示例:通過Tximport導(dǎo)入不基于比對的基因定量矩陣,主要是以下四個(gè)常用軟件

  • Salmon (Patro et al. 2017)
  • Sailfish (Patro, Mount, and Kingsford 2014)
  • kallisto (Bray et al. 2016)
  • RSEM (Li and Dewey 2011)

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># 加載R包及示例數(shù)據(jù)

library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")

設(shè)置分組信息

group_list <- factor(rep(c("A","B"),each=3))

獲取表達(dá)矩陣所在的文件夾八匠,salmon的結(jié)果為例

files <- list.files(file.path(dir,"salmon"),pattern = "*quant.sf.gz", recursive = TRUE)
full_files_path <- file.path(dir,"salmon",files)

讀入轉(zhuǎn)錄本和基因?qū)?yīng)列表

tx2gene <- read_csv(file.path(dir, "tx2gene.gencode.v27.csv"))
Parsed with column specification:
cols(
TXNAME = col_character(),
GENEID = col_character()
)
head(tx2gene)

A tibble: 6 x 2

TXNAME GENEID
<chr> <chr>
1 ENST00000456328.2 ENSG00000223972.5
2 ENST00000450305.2 ENSG00000223972.5
3 ENST00000473358.1 ENSG00000243485.5
4 ENST00000469289.1 ENSG00000243485.5
5 ENST00000607096.1 ENSG00000284332.1
6 ENST00000606857.1 ENSG00000268020.3

txi倒入需要兩個(gè)參數(shù):表達(dá)矩陣所在路徑和基因轉(zhuǎn)錄本對應(yīng)的列表

txi <- tximport(full_files_path, type="salmon", tx2gene=tx2gene)
reading in files with read_tsv
1 2 3 4 5 6
summarizing abundance
summarizing counts
summarizing length
sampleName<- c("ERR188021", "ERR188088", "ERR188288","ERR188297", "ERR188329", "ERR188356")
colData <- cbind(sampleName, group_list)
ddsTxi <- DESeq2::DESeqDataSetFromTximport(txi,

  •                                        colData = colData,
    
  •                                        design = ~ group_list)
    

using counts and average transcript lengths from tximport

ddsTxi
class: DESeqDataSet
dim: 58288 6
metadata(1): version
assays(2): counts avgTxLength
rownames(58288): ENSG00000000003.14 ENSG00000000005.5 ... ENSG00000284747.1
ENSG00000284748.1
rowData names(0):
colnames: NULL
colData names(2): sampleName group_list</pre>

2.DESeq2分組信息的設(shè)置

DESeq2的差異分析的分組信息設(shè)置比較簡單絮爷,主要通過resuls函數(shù)實(shí)現(xiàn)

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">results(object, contrast, name, lfcThreshold = 0,
altHypothesis = c("greaterAbs", "lessAbs", "greater", "less"),
listValues = c(1, -1), cooksCutoff, independentFiltering = TRUE,
alpha = 0.1, filter, theta, pAdjustMethod = "BH", filterFun,
format = c("DataFrame", "GRanges", "GRangesList"), test,
addMLE = FALSE, tidy = FALSE, parallel = FALSE,
BPPARAM = bpparam(), minmu = 0.5)

resultsNames(object)

removeResults(object)</pre>

使用示例:

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">results(dds, contrast=c("condition","C","B"))</pre>

dds代表DESeq2得到了差異分析的結(jié)果,contrast的輸入一個(gè)長度為3的向量梨树,與上面構(gòu)建DESeqDataSet時(shí)輸入的分組信息對應(yīng)坑夯。

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;"># 如輸入的分組信息是如下的因子向量

group_list
[1] A A A B B B
Levels: A B

提取A和B差異分析結(jié)果的示例如下,A代表對照組抡四,B代表處理組柜蜈,注意先后順序,與edgeR正好相反

results(dds, contrast=c("group_list","B","A"))</pre>

三指巡、總結(jié)

limma淑履,edgeR,DESeq2三大包基本是做轉(zhuǎn)錄組差異分析的金標(biāo)準(zhǔn)藻雪,大多數(shù)轉(zhuǎn)錄組的文章都是用這三個(gè)R包進(jìn)行差異分析秘噪。 edgeR差異分析速度快,得到的基因數(shù)目比較多勉耀,假陽性高(實(shí)際不差異結(jié)果差異)指煎。 DESeq2差異分析速度慢,得到的基因數(shù)目比較少便斥,假陰性高(實(shí)際差異結(jié)果不差異)贯要。 需要注意的是制作分組信息的因子向量是,因子水平的前后順序椭住,在R的很多模型中崇渗,默認(rèn)將因子向量的第一個(gè)水平看作對照組

四、假如是多個(gè)分組呢

比如宅广,大家都知道葫掉,TCGA的乳腺癌可以分成PAM50的5類,那么差異分析就復(fù)雜了跟狱,大家可以拿我3年前的WGCNA的教程做例子俭厚,下面是分組信息啦

image

這個(gè)時(shí)候有兩個(gè)策略來做差異分析,當(dāng)然驶臊,分組比較多的時(shí)候挪挤,差異分析并不是最好的策略啦,WGCNA等其它算法更好关翎!

策略1:在分組信息里面挑選

參考我GitHub代碼扛门, https://github.com/jmzeng1314/my-R/tree/master/10-RNA-seq-3-groups

<pre class="prism-token token language-javascript" style="box-sizing: border-box; list-style: inherit; margin: 24px 0px; font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; font-style: normal; font-variant: normal; font-weight: 400; font-stretch: normal; font-size: 14px; padding: 16px; overflow: auto; line-height: 1.45; background-color: rgb(247, 247, 247); border-radius: 3px; word-wrap: normal; text-align: left; white-space: pre; word-spacing: 0px; word-break: normal; tab-size: 2; hyphens: none; color: rgb(51, 51, 51); letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">group_list
cont.matrix=makeContrasts(contrasts=c('treat_12-control','treat_2-control'),levels = design)
fit2=contrasts.fit(fit,cont.matrix)
fit2=eBayes(fit2)

tempOutput = topTable(fit2, coef='treat_12-control', n=Inf)
DEG_treat_12_limma_voom = na.omit(tempOutput)
write.csv(DEG_treat_12_limma_voom,"DEG_treat_12_limma_voom.csv",quote = F)

tempOutput = topTable(fit2, coef='treat_2-control', n=Inf)
DEG_treat_2_limma_voom = na.omit(tempOutput)
write.csv(DEG_treat_2_limma_voom,"DEG_treat_2_limma_voom.csv",quote = F)</pre>

在提取差異分析結(jié)果的時(shí)候,需要指定是哪個(gè)組和哪個(gè)組在進(jìn)行比較。

值得一提的是, 我的GitHub簡直就是寶藏杉编,我上面提到的3年前的WGCNA的教程做例子又谋,最近看到有兩個(gè)文章就拿同樣的數(shù)據(jù)代碼和圖片發(fā)了一個(gè)4分,一個(gè)5分的文章!!!

你懂得火焰!

策略2:提取子矩陣和子分組信息

這個(gè)很容易理解了,把表達(dá)矩陣根據(jù)自己想要進(jìn)行的兩兩比對來篩選即可胧沫,這樣就可以多次做差異分析啦荐健,而且保證每次都只有兩個(gè)分組。

參考資料

http://www.bio-info-trainee.com/1514.html http://www.bio-info-trainee.com/255.html http://www.bio-info-trainee.com/1533.html https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html http://www.bioconductor.org/packages/release/bioc/html/limma.html http://www.bioconductor.org/packages/release/bioc/html/edgeR.html http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html https://rdrr.io/bioc/DESeq2/man/DESeqDataSet.html https://rdrr.io/bioc/DESeq2/man/results.html https://rdrr.io/bioc/limma/man/11RNAseq.html

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末琳袄,一起剝皮案震驚了整個(gè)濱河市江场,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌窖逗,老刑警劉巖址否,帶你破解...
    沈念sama閱讀 219,427評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異碎紊,居然都是意外死亡佑附,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,551評論 3 395
  • 文/潘曉璐 我一進(jìn)店門仗考,熙熙樓的掌柜王于貴愁眉苦臉地迎上來音同,“玉大人,你說我怎么就攤上這事秃嗜∪ň” “怎么了顿膨?”我有些...
    開封第一講書人閱讀 165,747評論 0 356
  • 文/不壞的土叔 我叫張陵,是天一觀的道長叽赊。 經(jīng)常有香客問我恋沃,道長,這世上最難降的妖魔是什么必指? 我笑而不...
    開封第一講書人閱讀 58,939評論 1 295
  • 正文 為了忘掉前任囊咏,我火速辦了婚禮,結(jié)果婚禮上塔橡,老公的妹妹穿的比我還像新娘梅割。我一直安慰自己,他們只是感情好葛家,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,955評論 6 392
  • 文/花漫 我一把揭開白布户辞。 她就那樣靜靜地躺著,像睡著了一般惦银。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上末誓,一...
    開封第一講書人閱讀 51,737評論 1 305
  • 那天扯俱,我揣著相機(jī)與錄音,去河邊找鬼喇澡。 笑死迅栅,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的晴玖。 我是一名探鬼主播读存,決...
    沈念sama閱讀 40,448評論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼呕屎!你這毒婦竟也來了让簿?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,352評論 0 276
  • 序言:老撾萬榮一對情侶失蹤秀睛,失蹤者是張志新(化名)和其女友劉穎尔当,沒想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體蹂安,經(jīng)...
    沈念sama閱讀 45,834評論 1 317
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡椭迎,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,992評論 3 338
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了田盈。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片畜号。...
    茶點(diǎn)故事閱讀 40,133評論 1 351
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖允瞧,靈堂內(nèi)的尸體忽然破棺而出简软,到底是詐尸還是另有隱情蛮拔,我是刑警寧澤,帶...
    沈念sama閱讀 35,815評論 5 346
  • 正文 年R本政府宣布替饿,位于F島的核電站语泽,受9級特大地震影響,放射性物質(zhì)發(fā)生泄漏视卢。R本人自食惡果不足惜踱卵,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,477評論 3 331
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望据过。 院中可真熱鬧惋砂,春花似錦、人聲如沸绳锅。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,022評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背敬鬓。 一陣腳步聲響...
    開封第一講書人閱讀 33,147評論 1 272
  • 我被黑心中介騙來泰國打工杈抢, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留诊杆,地道東北人贷盲。 一個(gè)月前我還...
    沈念sama閱讀 48,398評論 3 373
  • 正文 我出身青樓,卻偏偏與公主長得像,于是被迫代替她去往敵國和親宁脊。 傳聞我的和親對象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 45,077評論 2 355

推薦閱讀更多精彩內(nèi)容