1.背景知識(shí)
眾所周知眉菱,limma可以做基因表達(dá)芯片和轉(zhuǎn)錄組數(shù)據(jù)的差異分析彻坛,可以方便的得處兩組之間有表達(dá)量差別的基因钞速。多分組差異分析其實(shí)也是兩兩差異分析的批量做法旁仿。
有的實(shí)驗(yàn)設(shè)計(jì)相對(duì)復(fù)雜一點(diǎn)藕夫,比如今天使用的例子:
實(shí)驗(yàn)設(shè)計(jì)包括了兩種分組,一個(gè)是siC和sip53枯冈,一個(gè)是NT與TNF毅贮。
那么常規(guī)的兩組之間的差異分析也就不能同時(shí)分析這兩種實(shí)驗(yàn)設(shè)計(jì)啦!
這時(shí)候我們就需要用到雙因素差異分析尘奏。
1.整理表達(dá)矩陣
rm(list = ls())
library(tinyarray)
library(tidyverse)
library(limma)
gse = "GSE53153"
a = geo_download(gse,destdir=tempdir(),colon_remove = T)
find_anno(a$gpl)
library(illuminaHumanv4.db);ids <- toTable(illuminaHumanv4SYMBOL)
eset = trans_array(log2(a$exp+1),ids)
eset[1:4,1:4]
## GSM1283060 GSM1283061 GSM1283062 GSM1283063
## EEF1A1 16.125688 16.097172 16.035905 15.996306
## GAPDH 15.125034 15.114759 15.188604 15.102693
## SLC35E2A 7.980711 8.234099 7.651769 8.134426
## CLXN 7.634448 7.811856 7.692790 7.792465
2.整理實(shí)驗(yàn)設(shè)計(jì)
#雙因素分析
targets = data.frame(mutate = rep(c("siC","sip53"),each = 6),
induce = rep(c("untreat", "treat"), each = 3, times = 2))
targets
## mutate induce
## 1 siC untreat
## 2 siC untreat
## 3 siC untreat
## 4 siC treat
## 5 siC treat
## 6 siC treat
## 7 sip53 untreat
## 8 sip53 untreat
## 9 sip53 untreat
## 10 sip53 treat
## 11 sip53 treat
## 12 sip53 treat
把這兩種分組合并到一起
TS <- paste(targets$mutate, targets$induce, sep=".")
TS <- factor(TS, levels=c("siC.untreat","siC.treat","sip53.untreat","sip53.treat"))
TS
## [1] siC.untreat siC.untreat siC.untreat siC.treat siC.treat
## [6] siC.treat sip53.untreat sip53.untreat sip53.untreat sip53.treat
## [11] sip53.treat sip53.treat
## Levels: siC.untreat siC.treat sip53.untreat sip53.treat
3.做差異分析
這里同時(shí)做了三次差異分析:
哪些基因?qū)iC組的treat有反應(yīng),即siC組的treat-untreat
哪些基因?qū)ip53組的treat有反應(yīng),即sip53組的treat-untreat
與siC組相比汽烦,哪些基因在sip53組中的反應(yīng)不同泻轰,即上面兩組差異分析所得的logFC再比較!
前兩次差異分析其實(shí)相當(dāng)于取子集+二分組差異分析俗孝。第三次是雙因素差異分析咯酒甸。
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(eset, design)
cont.matrix <- makeContrasts(
w = siC.treat-siC.untreat,
m = sip53.treat-sip53.untreat,
diff = (sip53.treat-sip53.untreat)-(siC.treat-siC.untreat),
levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
deg1 = topTable(fit2,number = Inf,coef = 1)
deg2 = topTable(fit2,number = Inf,coef = 2)
deg3 = topTable(fit2,number = Inf,coef = 3)
head(deg3)
## logFC AveExpr t P.Value adj.P.Val B
## MMP9 -1.889600 8.014973 -17.520398 8.191150e-12 1.712278e-07 13.259648
## CXCL10 -1.598374 9.294062 -12.550379 1.166271e-09 1.218986e-05 10.453810
## PI3 1.458585 8.562082 10.029716 2.812942e-08 1.960058e-04 8.248613
## EBI3 -1.382219 8.596838 -9.797690 3.885369e-08 2.030494e-04 8.008927
## LTB -1.412194 8.331969 -8.950493 1.327325e-07 5.549282e-04 7.072795
## TNIP1 -1.048539 12.778876 -8.828485 1.594793e-07 5.556259e-04 6.929717
4.差異基因
get_diff_gene = function(deg,logFC_t = 1,p_t = 0.05){
k1 = (deg$P.Value < p_t)&(deg$logFC < -logFC_t)
k2 = (deg$P.Value < p_t)&(deg$logFC > logFC_t)
library(dplyr)
deg = mutate(deg,change = ifelse(k1,"down",ifelse(k2,"up","stable")))
table(deg$change)
return(rownames(deg)[deg$change!="stable"])
}
cg1 = get_diff_gene(deg1);length(cg1)
## [1] 121
cg2 = get_diff_gene(deg2);length(cg2)
## [1] 109
cg3 = get_diff_gene(deg3);length(cg3)
## [1] 14
dat = list(siC = cg1,
sip53 = cg2,
diff = cg3)
draw_venn(dat,"")
ggsave("v.png")
dat = data.frame(t(eset[cg3,]),
targets,
TS,check.names = F)
library(ggplot2)
ps = lapply(cg3, function(g){
ggplot(dat,aes_string(x = "TS",
y = paste0("`",g,"`"),
fill = "induce"))+
geom_boxplot()+
theme_bw()+
theme(legend.position = "top")
})
library(patchwork)
wrap_plots(ps[1:4],ncol = 2)
5.logFC 是如何計(jì)算的
TS
## [1] siC.untreat siC.untreat siC.untreat siC.treat siC.treat
## [6] siC.treat sip53.untreat sip53.untreat sip53.untreat sip53.treat
## [11] sip53.treat sip53.treat
## Levels: siC.untreat siC.treat sip53.untreat sip53.treat
x = eset["MMP9",]
(mean(x[10:12])-mean(x[7:9]))-(mean(x[4:6])-mean(x[1:3]))
## [1] -1.8896
deg3$logFC[1]
## [1] -1.8896
所以也就是計(jì)算出了兩組之間的logFC之差。
代碼參考自limma幫助文檔第九章赋铝。