寫在前面
不知道各位最近過得怎么樣宪萄,昨天去修了腳??艺谆,感覺自己馬上就要邁入油膩中年人的行列了。??
不過說實話拜英,還是挺舒服的静汤,值得再去一次。??
接著更一下WGCNA
的教程吧居凶,還是值得大家好好學習一下的虫给。??
今天介紹的是分步法創(chuàng)建網絡與識別模塊,這種比較適合有個性化需求的人使用侠碧,如果你是懶癌晚期
抹估,可以直接看上一篇教程。??
用到的包
rm(list = ls())
library(tidyverse)
library(WGCNA)
示例數據
我們還是和之前一樣弄兜,把之前清洗好的輸入數據拿出來吧药蜻。??
load("./Consensus-dataInput.RData")
提取數據集個數
首先和之前一樣提取一下我們的數據集個數,后面會用到替饿。??
nSets <- checkSets(multiExpr)$nSets
挑選軟閾值并可視化
5.1 創(chuàng)建power
powers <- c(seq(4,10,by=1), seq(12,20, by=2))
5.2 計算power
和之前一樣语泽,我們?yōu)槊總€數據集計算一下power
,挑選軟閾值
视卢。??
powerTables = vector(mode = "list", length = nSets)
for (set in 1:nSets){
powerTables[[set]] = list(data = pickSoftThreshold(multiExpr[[set]]$data, powerVector=powers,
verbose = 2)[[2]])
collectGarbage()
}
5.3 可視化相關參數設置
設置一下繪圖的參數踱卵,方便可視化。??
colors = c("black", "red")
plotCols = c(2,5,6,7)
colNames = c("Scale Free Topology Model Fit", "Mean connectivity", "Median connectivity",
"Max connectivity")
ylim = matrix(NA, nrow = 2, ncol = 4);
for (set in 1:nSets)
{
for (col in 1:length(plotCols))
{
ylim[1, col] = min(ylim[1, col], powerTables[[set]]$data[, plotCols[col]], na.rm = TRUE);
ylim[2, col] = max(ylim[2, col], powerTables[[set]]$data[, plotCols[col]], na.rm = TRUE);
}
}
5.4 可視化一下吧
來繪一下圖吧据过。??
sizeGrWindow(8, 6)
par(mfcol = c(2,2));
par(mar = c(4.2, 4.2 , 2.2, 0.5))
cex1 = 0.7;
for (col in 1:length(plotCols)) for (set in 1:nSets)
{
if (set==1)
{
plot(powerTables[[set]]$data[,1], -sign(powerTables[[set]]$data[,3])*powerTables[[set]]$data[,2],
xlab="Soft Threshold (power)",ylab=colNames[col],type="n", ylim = ylim[, col],
main = colNames[col]);
addGrid();
}
if (col==1)
{
text(powerTables[[set]]$data[,1], -sign(powerTables[[set]]$data[,3])*powerTables[[set]]$data[,2],
labels=powers,cex=cex1,col=colors[set]);
} else
text(powerTables[[set]]$data[,1], powerTables[[set]]$data[,plotCols[col]],
labels=powers,cex=cex1,col=colors[set]);
if (col==1)
{
legend("bottomright", legend = setLabels, col = colors, pch = 20) ;
} else
legend("topright", legend = setLabels, col = colors, pch = 20) ;
}
網絡鄰接的計算
網絡構建首先需要使用軟閾值計算各個數據集中的鄰接關系惋砂,前面計算的power
是6
哦妒挎。??
softPower <- 6
adjacencies <- array(0, dim = c(nSets, nGenes, nGenes))
for (set in 1:nSets){
adjacencies[set, , ] <- abs(cor(multiExpr[[set]]$data, use = "p"))^softPower
}
拓撲重疊的計算
接著我們把鄰接矩陣
轉換成拓撲重疊矩陣
,也就是計算一下我們常說的TOM
班利。??
TOM <- array(0, dim = c(nSets, nGenes, nGenes))
for (set in 1:nSets){
TOM[set, , ] <- TOMsimilarity(adjacencies[set, , ])
}
給TOM做個scale
8.1 scale一下
不同數據集的TOM
可能具有不同的統(tǒng)計特性饥漫,可能會導致偏差。??
為了讓兩個數據集有可比性罗标,我們需要做一下scale
庸队。??
scaleP = 0.95
set.seed(12345)
nSamples = as.integer(1/(1-scaleP) * 1000)
scaleSample = sample(nGenes*(nGenes-1)/2, size = nSamples)
TOMScalingSamples = list()
scaleQuant = rep(1, nSets)
scalePowers = rep(1, nSets)
for (set in 1:nSets){
TOMScalingSamples[[set]] = as.dist(TOM[set, , ])[scaleSample]
scaleQuant[set] = quantile(TOMScalingSamples[[set]],
probs = scaleP, type = 8)
if (set>1){
scalePowers[set] = log(scaleQuant[1])/log(scaleQuant[set])
TOM[set, ,] = TOM[set, ,]^scalePowers[set]
}
}
8.2 可視化scale效果
我們做個Q-Q plot
來看下scale的效果吧。??
做完以后離參考線更近了闯割,當然這種變化比較小就是了彻消。??
scaledTOMSamples = list()
for (set in 1:nSets){
scaledTOMSamples[[set]] = TOMScalingSamples[[set]]^scalePowers[set]
}
sizeGrWindow(6,6)
qqUnscaled = qqplot(TOMScalingSamples[[1]], TOMScalingSamples[[2]], plot.it = T, cex = 0.6,
xlab = paste("TOM in", setLabels[1]), ylab = paste("TOM in", setLabels[2]),
main = "Q-Q plot of TOM", pch = 20)
qqScaled = qqplot(scaledTOMSamples[[1]], scaledTOMSamples[[2]], plot.it = FALSE)
points(qqScaled$x, qqScaled$y, col = "red", cex = 0.6, pch = 20);
abline(a=0, b=1, col = "blue")
legend("topleft", legend = c("Unscaled TOM", "Scaled TOM"), pch = 20, col = c("black", "red"))
共識拓撲重疊的計算
計算完成后,假設我們有一個基因在兩個數據集中表達宙拉,如何認為它是差異基因呢 ??
根據WGCNA
中的解釋宾尚,如果這個基因在兩個數據集中差異都很大,這個時候我們才認為它是一個變化差異大的基因
谢澈,就像達成共識一樣煌贴。??
consensusTOM = pmin(TOM[1, , ], TOM[2, , ])
聚類與模塊識別
10.1 計算模塊
consTree <- hclust(as.dist(1-consensusTOM), method = "average")
minModuleSize <- 30
unmergedLabels <- cutreeDynamic(dendro = consTree, distM = 1-consensusTOM,
deepSplit = 2, cutHeight = 0.995,
minClusterSize = minModuleSize,
pamRespectsDendro = F)
unmergedColors <- labels2colors(unmergedLabels)
table(unmergedLabels)
10.2 可視化
sizeGrWindow(8,6)
plotDendroAndColors(consTree, unmergedColors, "Dynamic Tree Cut",
dendroLabels = F, hang = 0.03,
addGuide = T, guideHang = 0.05)
10.3 計算模塊MEs
可以用于后面合并相似模塊哦。??
unmergedMEs <- multiSetMEs(multiExpr, colors = NULL, universalColors = unmergedColors)
consMEDiss <- consensusMEDissimilarity(unmergedMEs)
consMETree <- hclust(as.dist(consMEDiss), method = "average")
10.4 可視化聚類結果
sizeGrWindow(7,6)
par(mfrow = c(1,1))
plot(consMETree, main = "Consensus clustering of consensus module eigengenes",
xlab = "", sub = "")
abline(h=0.25, col = "red")
10.5 合并
merge <- mergeCloseModules(multiExpr, unmergedLabels, cutHeight = 0.25, verbose = 3)
10.6 提取重要數據
moduleLabels <- merge$colors
moduleColors <- labels2colors(moduleLabels)
consMEs <- merge$newMEs
10.7 最終可視化
sizeGrWindow(9,6)
plotDendroAndColors(consTree, cbind(unmergedColors, moduleColors),
c("Unmerged", "Merged"),
dendroLabels = F, hang = 0.03,
addGuide = T, guideHang = 0.05)
Save一下
老規(guī)矩锥忿,保存一下我們辛辛苦苦運行的數據吧牛郑。??
save(consMEs, moduleColors, moduleLabels, consTree, file = "./Consensus-NetworkConstruction-man.RData")
<center>最后祝大家早日不卷!~</center>
點個在看吧各位~ ?.???? ??? ?
<center> <b>?? 往期精彩 <b> </center>
?? <font size=1>?? WGCNA | 值得你深入學習的生信分析方法!~</font>
?? <font size=1>?? ComplexHeatmap | 顏狗寫的高顏值熱圖代碼敬鬓!</font>
?? <font size=1>?? ComplexHeatmap | 你的熱圖注釋還擠在一起看不清嗎Q团蟆?</font>
?? <font size=1>?? Google | 谷歌翻譯崩了我們怎么辦6ご稹础芍?(附完美解決方案)</font>
?? <font size=1>?? scRNA-seq | 吐血整理的單細胞入門教程</font>
?? <font size=1>?? NetworkD3 | 讓我們一起畫個動態(tài)的桑基圖吧~</font>
?? <font size=1>?? RColorBrewer | 再多的配色也能輕松搞定数尿!~</font>
?? <font size=1>?? rms | 批量完成你的線性回歸</font>
?? <font size=1>?? CMplot | 完美復刻Nature上的曼哈頓圖</font>
?? <font size=1>?? Network | 高顏值動態(tài)網絡可視化工具</font>
?? <font size=1>?? boxjitter | 完美復刻Nature上的高顏值統(tǒng)計圖</font>
?? <font size=1>?? linkET | 完美解決ggcor安裝失敗方案(附教程)</font>
?? <font size=1>......</font>
本文由mdnice多平臺發(fā)布