寫在前面
我8月26號踏著歡快的腳步回家度假,原計劃9月5號回珠海洲拇,結(jié)果返程飛機被連環(huán)取消了三次奈揍,我滯留在家5天了,已經(jīng)搭好戲臺子在家講課赋续,打算等沒課的周天再走男翰。。纽乱。
1.從一個表達矩陣開始
相關性網(wǎng)絡圖一般都是經(jīng)過若干步驟選出來幾個感興趣的基因展示一下蛾绎,下面的這個例子用我的小R包tinyarray獲取表達矩陣和做探針注釋、差異分析了迫淹,相當于對芯片常規(guī)分析的一個簡化操作秘通。
#devtools::install_github("xjsun1221/tinyarray")
library(tinyarray)
gse = "GSE42872"
geo = geo_download(gse,destdir=tempdir(),by_annopbrobe = FALSE)
Group = rep(c("control","treat"),each = 3)
Group = factor(Group)
find_anno(geo$gpl)
## [1] "`library(hugene10sttranscriptcluster.db);ids <- toTable(hugene10sttranscriptclusterSYMBOL)` and `ids <- AnnoProbe::idmap('GPL6244')` are both avaliable"
ids <- AnnoProbe::idmap(geo$gpl,destdir = tempdir())
deg = get_deg(geo$exp,Group,ids)
這個deg就是經(jīng)過整理的差異分析結(jié)果表格。
head(deg)
## logFC AveExpr t P.Value adj.P.Val B probe_id
## 1 5.780170 7.370282 82.94833 3.495205e-12 1.163798e-07 16.32898 8133876
## 2 -4.212683 9.106625 -68.40113 1.437468e-11 2.393169e-07 15.71739 7965335
## 3 5.633027 8.763220 57.61985 5.053466e-11 4.431880e-07 15.04752 7972259
## 4 -3.801663 9.726468 -57.21112 5.324059e-11 4.431880e-07 15.01709 7972217
## 5 3.263063 10.171635 50.51733 1.324638e-10 8.821294e-07 14.45166 8129573
## 6 -3.843247 9.667077 -45.87910 2.681063e-10 1.487856e-06 13.97123 8015806
## symbol change ENTREZID
## 1 CD36 up 948
## 2 DUSP6 down 1848
## 3 DCT up 1638
## 4 SPRY2 down 10253
## 5 MOXD1 up 26002
## 6 ETV4 down 2118
把表達矩陣轉(zhuǎn)換一下敛熬,并選了top10差異基因用于做后續(xù)的網(wǎng)絡圖
exp = trans_array(geo$exp,ids)
cg = deg$symbol[deg$change!="stable"]
set.seed(10086)
exp = exp[head(cg,10),]
exp[1:4,1:4]
## GSM1052615 GSM1052616 GSM1052617 GSM1052618
## CD36 4.54610 4.40210 4.49239 10.25060
## DUSP6 11.25310 11.20760 11.17820 6.98791
## DCT 5.81479 5.91209 6.11324 11.54910
## SPRY2 11.64830 11.63730 11.59630 7.90508
2.計算相關性
根據(jù)相關系數(shù)和p值做了個分組肺稀,用于后面的配色了,我用的是0.3应民,這個閾值可以調(diào)整话原。
library(corrplot)
M = cor(t(exp))
testRes = cor.mtest(t(exp), conf.level = 0.95)$p
library(tidyverse)
g = pivot_longer(rownames_to_column(as.data.frame(M),var = "from"),
cols = 2:(ncol(M)+1),
names_to = "to",
values_to = "cor")
gp = pivot_longer(rownames_to_column(as.data.frame(testRes)),
cols = 2:(ncol(M)+1),
names_to = "gene",
values_to = "p")
g$p = gp$p
g = g[g$from!=g$to,]
g$group = case_when(g$cor>0.3 & g$p<0.05 ~ "positive",
g$cor< -0.3 & g$p<0.05 ~ "negative",
T~"not" )
head(g)
## # A tibble: 6 x 5
## from to cor p group
## <chr> <chr> <dbl> <dbl> <chr>
## 1 CD36 DUSP6 -1.00 0.0000000933 negative
## 2 CD36 DCT 0.999 0.00000196 positive
## 3 CD36 SPRY2 -1.00 0.000000226 negative
## 4 CD36 MOXD1 1.00 0.0000000840 positive
## 5 CD36 ETV4 -0.999 0.00000208 negative
## 6 CD36 DTL -0.999 0.00000168 negative
上面這個g就是網(wǎng)絡圖的輸入數(shù)據(jù)了
3.畫圖
正相關和負相關分別用紅色和藍色表示咯。
library(igraph)
network = graph_from_data_frame(d=g[g$group!="not",c(1,2,3,5)], directed=F)
my_color = c("#2874C5","#f87669")[as.numeric(as.factor(E(network)$group))]
par(bg="white", mar=c(0,0,0,0))
plot(network,
vertex.size=20,
layout=layout.circle,
vertex.label.cex=0.7,
vertex.frame.color="transparent",
edge.width=abs(E(network)$cor)*3,
edge.color=my_color,edge.curved = 0.2)
4.專用的相關性網(wǎng)絡圖R包
發(fā)現(xiàn)一個哭笑不得的現(xiàn)象诲锹,如果這些基因相關性太強了繁仁,這個專用的R包畫圖還疊到一起了,不是個例归园,我換電腦試過了哈哈黄虱。
library(tidyverse)
t(exp) %>% corrr::correlate() %>%
corrr::network_plot(colours = c("#2874C5", "white", "#f87669"),repel = F,min_cor = .5)
反而是他們的相關性不特別強時,才比較好看庸诱,錯落有致了咧捻浦。
exp = trans_array(geo$exp,ids)
set.seed(100)
cg = sample(1:nrow(exp),10)
exp = exp[cg,]
t(exp) %>% corrr::correlate() %>%
corrr::network_plot(colours = c("#2874C5", "white", "#f87669"),repel = F,min_cor = .5)
奇妙的體驗晤揣。