原貼地址:http://www.bioinfo-scrounger.com/archives/639
Pathview是一個用于整合表達(dá)譜數(shù)據(jù)并用于可視化kegg通路的一個R包高镐,其會先下載kegg官網(wǎng)上的通路圖采够,然后整合輸入數(shù)據(jù)對通路圖進行再次渲染(加工?)跛梗,從而對kegg通路圖進行一定程度上的個性化處理
Pathview是一個bioconductor包吆寨,正常安裝即可
? ? source("https://bioconductor.org/biocLite.R")? ? biocLite("pathview")
以其自帶測試數(shù)據(jù)集启妹,對Pathview包的可視化功能做個整理靶擦,用法很簡單再膳,主要為圍繞著pathview這個作圖函數(shù)(參數(shù)很多挺勿,用?pathview查看下各個參數(shù)的說明),只是需要將數(shù)據(jù)整理或者整合成其需要的輸入數(shù)據(jù)格式即可
先看下demo數(shù)據(jù)是什么格式喂柒,列名是每個樣本名不瓶,行名是每個gene的entrez id
> head(gse16873.d)? ? ? ? ? ? ? DCIS_1? ? ? DCIS_2? ? ? DCIS_3? ? ? DCIS_4? ? ? DCIS_5? ? ? DCIS_610000? ? -0.30764480 -0.14722769 -0.023784808 -0.07056193 -0.001323087 -0.1502681310001? ? ? 0.41586805 -0.33477259 -0.513136907 -0.16653712? 0.111122223? 0.1340073410002? ? ? 0.19854925? 0.03789588? 0.341865341 -0.08527420? 0.767559264? 0.1582860910003? ? -0.23155297 -0.09659311 -0.104727283 -0.04801404 -0.208056443? 0.03344448100048912 -0.04490724 -0.05203146? 0.036390376? 0.04807823? 0.027205816? 0.0544473910004? ? -0.08756237 -0.05027725? 0.001821133? 0.03023835? 0.008034394 -0.06860749
先看下Pathview最常見的一種用法:將某個樣本的表達(dá)量(測試數(shù)據(jù)已經(jīng)是歸一化后的表達(dá)量);其實也可以任何列數(shù)據(jù)灾杰,不僅僅是表達(dá)量數(shù)據(jù)蚊丐,也可以是foldchange等數(shù)據(jù),人為特定的數(shù)值型數(shù)據(jù)也行艳吠;最后以color
bar的形式在kegg通路圖上的對應(yīng)節(jié)點展示麦备;如下例子所示,取第一個樣本的數(shù)值最為gene.data昭娩,通路選擇04110凛篙,物種為hsa
p <- pathview(gene.data = gse16873.d[, 1], pathway.id = "04110", species = "hsa", out.suffix = "gse16873")
圖中右上角color bar是可以通過參數(shù)limit來調(diào)整的,默認(rèn)是limit=list(gene=1, cpd=1)栏渺,即gene.data的限定范圍在(-1,1)呛梆,cpd.data的限定范圍在c(-1,1);如果有其他需要可以設(shè)置為limit=list(gene=2, cpd=1)等磕诊;如果想最大值和最小值不對稱則limit = list(gene=c(-1,2))即可
如果想將color bar分割的更加密集一些填物,則可以修改默認(rèn)參數(shù)bins = list(gene = 10, cpd = 10)為20。霎终。滞磺。只能說Pathview作者好細(xì)心。神僵。
接下來看看Pathview如何展示整合數(shù)據(jù)的雁刷,如基因和代謝組的數(shù)據(jù)在kegg通路上整合可視化
先載入代謝物相關(guān)數(shù)據(jù),如
sim.cpd.data=sim.mol.data(mol.type="cpd", nmol=3000)> head(sim.cpd.data)? ? C02787? ? ? C08521? ? ? C01043? ? ? C11496? ? ? C07111? ? ? C00031 -1.15259948? 0.46416071? 0.72893247? 0.41061745 -1.46114720 -0.01890809
然后通過cpd.data參數(shù)將代謝物數(shù)據(jù)加入到pathview函數(shù)中保礼,如:
p <- pathview(gene.data = gse16873.d[, 1], cpd.data = sim.cpd.data,
? ? ? ? ? ? ? pathway.id = "00640", species = "hsa", out.suffix = "gse16873.cpd",
? ? ? ? ? ? ? keys.align = "y", kegg.native = T, key.pos = "topright")
keys.align = "y"控制color bar對齊方式沛励,key.pos = "topright"則是對應(yīng)color bar在圖上的位置(主要有bottomleft责语,bottomright,topleft以及topright)
Pathview不僅能整合不同組學(xué)數(shù)據(jù)(上述例子是轉(zhuǎn)錄組和代謝組)目派,還可以整合多個樣本的數(shù)據(jù)坤候,比如我們在上面看到gene.data只導(dǎo)入了一列數(shù)據(jù),其實Pathview支持多列(也就是多個樣本)數(shù)據(jù)的導(dǎo)入企蹭,cpd.data也一樣
如按照說明文檔中的模擬一組多樣本的代謝物表達(dá)譜數(shù)據(jù)(也似乎是歸一化后的)
sim.cpd.data2 = matrix(sample(sim.cpd.data, 18000, replace = T), ncol = 6)rownames(sim.cpd.data2) = names(sim.cpd.data)colnames(sim.cpd.data2) = paste("exp", 1:6, sep = "")> head(sim.cpd.data2, 3)? ? ? ? ? ? exp1? ? ? exp2? ? ? exp3? ? ? exp4? ? ? exp5? ? ? exp6
C02787 -0.5425224 1.7940544 -0.2629972? 0.2729004 -0.4897083 1.05131740C08521 -1.1903358 0.4448658? 2.6074747 -0.9163451? 0.1239377 0.57827710C01043? 0.3391817 1.6855815? 1.0203767 -1.3184792? 0.4727454 0.03381888
然后還是使用pathview函數(shù)作圖白筹,相比上面例子增加了參數(shù)match.data = F,主要用于表示gene和cpd樣本數(shù)是否要匹配谅摄;multi.state = T則表示多個樣本在同一個圖上顯示
p <- pathview(gene.data = gse16873.d[, 1:3], cpd.data = sim.cpd.data2[, 1:2],
? ? ? ? ? ? ? pathway.id = "00640", species = "hsa", out.suffix = "gse16873.cpd.3-2s",
? ? ? ? ? ? ? keys.align = "y", kegg.native = T, match.data = F, multi.state = T, same.layer = T)
正如之前說的徒河,Pathview不僅能展示連續(xù)性數(shù)據(jù),還可以展示離散型數(shù)據(jù)(如:差異顯著OR不顯著送漠,上調(diào)OR下調(diào)等)顽照,我先模擬一個(0,1)離散型數(shù)據(jù)
testdata2 <- data.frame(sample1=sample(c(0,1), 10000, replace = T), stringsAsFactors = F)row.names(testdata2) <- row.names(gse16873.d)[1:10000]
然后在pathview函數(shù)中將limit = list(gene = c(0,1))和bins = list(gene = 1)設(shè)置為一致;由于默認(rèn)顏色是低(綠色)闽寡,中(灰色)代兵,高(紅色),而我們測試數(shù)據(jù)是0爷狈,1兩個數(shù)字植影,所以需要將mid設(shè)置為紅色mid = list(gene = "red"),這樣0就是綠色涎永,1就是紅色了
p <- pathview(gene.data = testdata, pathway.id = "00640", species = "hsa",
? ? ? ? ? ? ? out.suffix = "test.genes",discrete = list(gene = T), kegg.native = T,
? ? ? ? ? ? ? limit = list(gene = c(0,1)), bins = list(gene = 1), mid = list(gene = "red"))
從我們之前的測試數(shù)據(jù)中可看到思币,pathview函數(shù)默認(rèn)使用的Id是entrez id,從默認(rèn)參數(shù)gene.idtype="entrez"可得土辩;對于絕大多數(shù)真核生物來說支救,其entrez id確實等于其對應(yīng)的kegg id,但是有些物種還是特例的拷淘,比如擬南芥各墨。。启涯。當(dāng)然Pathview包給出了一些id轉(zhuǎn)化方法(但我一般不用贬堵,因為我比較喜歡自己來轉(zhuǎn)化);因此一般我們能拿到的就已經(jīng)是kegg id了结洼,比如擬南芥的ath:AT4G17260黎做,這是需要設(shè)置gene.idtype=kegg,這里需要注意的松忍,gene.data參數(shù)輸入的不是ath:AT4G17260而是AT4G17260蒸殿,下面以一個例子說明
先模擬一個數(shù)據(jù)
testid <- data.frame(expr = rnorm(5), stringsAsFactors = F)row.names(testid) <- c("AT4G17260", "AT2G14170", "AT2G20420", "AT1G06550", "AT1G36160")
然后還是以通路00640為例,物種是擬南芥
p <- pathview(gene.data = testid, pathway.id = "00640", species = "ath",
? ? ? ? ? ? ? gene.idtype = "kegg", out.suffix = "testid")
但是當(dāng)物種是kegg未收錄的物種,這是可能用的不再是kegg id還是K號宏所,如K00016酥艳,這時的通路則是KEGG ortholog pathways,物種設(shè)定為ko爬骤,即:species = "ko"
testko <- data.frame(expr = rnorm(5), stringsAsFactors = F)row.names(testko) <- c("K00016", "K01026", "K00140", "K18371", "K17741")