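Requirements
1. Quickly look up the description for a given pathway ID, and find out which ID a given pathway has.
2. Pull out all the genes in one or several pathways for a separate downstream analysis.
If what you need is KEGG enrichment analysis, clusterProfiler can handle it: https://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html
If you want to view KEGG pathway diagrams, use the R package pathview; the function help pages are enough to get going.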
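1. Mapping pathway IDs to descriptions
1.1 Searching the website
If you don't need a batch lookup, searching the website directly is simplest: https://www.genome.jp/kegg/kegg2.html
1.2 Using msigdbr
If you need the complete mapping, the msigdbr package covered in an earlier post can do it: http://www.reibang.com/p/0098baf2df46
MSigDB already includes KEGG, and the coverage is fairly complete: IDs, descriptions, and genes are all there.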
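library(msigdbr)
library(dplyr)  # provides %>% and select()
# Keep only the pathway ID, gene symbol, and description for the human KEGG sets.
# Argument names and the "CP:KEGG" label depend on the msigdbr version; msigdbr_collections() lists the valid values.
KEGG_df = msigdbr(species = "Homo sapiens",category = "C2",subcategory = "CP:KEGG") %>%
  dplyr::select(gs_exact_source,gene_symbol,gs_description)
head(KEGG_df)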
## # A tibble: 6 x 3
## gs_exact_source gene_symbol gs_description
## <chr> <chr> <chr>
## 1 hsa02010 ABCA1 ABC transporters
## 2 hsa02010 ABCA10 ABC transporters
## 3 hsa02010 ABCA12 ABC transporters
## 4 hsa02010 ABCA13 ABC transporters
## 5 hsa02010 ABCA2 ABC transporters
## 6 hsa02010 ABCA3 ABC transporters
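For requirement 1, the long table can be collapsed to one row per pathway and then used as a lookup in either direction. A minimal sketch in base R, using a small hand-made stand-in for KEGG_df: the rows and the names toy_df, id2desc, desc_by_id, and hits are illustrative assumptions, not actual msigdbr output.

```r
# Toy stand-in for KEGG_df with the same three columns (illustrative rows only)
toy_df <- data.frame(
  gs_exact_source = c("hsa02010", "hsa02010", "hsa03030", "hsa03030"),
  gene_symbol     = c("ABCA1", "ABCA10", "FEN1", "LIG1"),
  gs_description  = c("ABC transporters", "ABC transporters",
                      "DNA replication", "DNA replication"),
  stringsAsFactors = FALSE
)

# One row per pathway: drop the gene column, then remove duplicate rows
id2desc <- unique(toy_df[, c("gs_exact_source", "gs_description")])

# ID -> description: a named character vector works as a lookup table
desc_by_id <- setNames(id2desc$gs_description, id2desc$gs_exact_source)
desc_by_id[["hsa03030"]]

# Description -> ID: case-insensitive keyword search
hits <- id2desc[grepl("replication", id2desc$gs_description, ignore.case = TRUE), ]
hits$gs_exact_source
```

With the real KEGG_df, the same two steps should apply unchanged, since only the column names gs_exact_source and gs_description are used.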
# Named list: one character vector of gene symbols per pathway ID
kegg1 = split(KEGG_df$gene_symbol,KEGG_df$gs_exact_source)
lapply(kegg1[1:6],head)
## $hsa00010
## [1] "ACSS1" "ACSS2" "ADH1A" "ADH1B" "ADH1C" "ADH4"
##
## $hsa00020
## [1] "ACLY" "ACO1" "ACO2" "CS" "DLAT" "DLD"
##
## $hsa00030
## [1] "ALDOA" "ALDOB" "ALDOC" "DERA" "FBP1" "FBP2"
##
## $hsa00040
## [1] "AKR1B1" "CRYL1" "DCXR" "DHDH" "GUSB" "RPE"
##
## $hsa00051
## [1] "AKR1B1" "AKR1B10" "ALDOA" "ALDOB" "ALDOC" "FBP1"
##
## $hsa00052
## [1] "AKR1B1" "B4GALT1" "B4GALT2" "G6PC" "G6PC2" "GAA"
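For requirement 2, a list shaped like kegg1 can be subset by pathway ID and the genes pooled. A minimal sketch in base R on a toy list: toy_kegg1 holds only a few of the symbols shown above, and wanted is a hypothetical choice of pathways.

```r
# Toy stand-in for kegg1: a named list of gene-symbol vectors (illustrative subset)
toy_kegg1 <- list(
  hsa00010 = c("ACSS1", "ACSS2", "ADH1A"),
  hsa00030 = c("ALDOA", "ALDOB", "FBP1"),
  hsa00051 = c("AKR1B1", "ALDOA", "FBP1")
)

wanted <- c("hsa00030", "hsa00051")  # hypothetical pathways of interest

# Fail early on IDs that are not in the list instead of silently getting NULL
missing_ids <- setdiff(wanted, names(toy_kegg1))
if (length(missing_ids) > 0) {
  stop("Unknown pathway ID(s): ", paste(missing_ids, collapse = ", "))
}

# Pool the genes and drop duplicates; use.names = FALSE returns a plain vector
genes_union <- unique(unlist(toy_kegg1[wanted], use.names = FALSE))
genes_union
```

Genes shared between pathways (here ALDOA and FBP1) appear once in the pooled vector, which is usually what a downstream gene filter expects.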
2. Mapping pathway IDs to genes
The org.Hs.eg.db package contains this mapping:
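library(clusterProfiler)  # used further down for bitr()
library(org.Hs.eg.db)
# Pathway -> Entrez gene mapping object shipped with the annotation package
kegg <- org.Hs.egPATH2EG
# Keep only the pathway keys that have at least one gene mapped
mapped <- mappedkeys(kegg)
# Convert to a plain named list: one vector of Entrez IDs per pathway
kegg2 <- as.list(kegg[mapped])
lapply(kegg2[1:6],head)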
## $`04610`
## [1] "2" "462" "623" "624" "629" "710"
##
## $`00232`
## [1] "9" "10" "1544" "1548" "1549" "1553"
##
## $`00983`
## [1] "9" "10" "978" "1066" "1548" "1549"
##
## $`01100`
## [1] "9" "10" "15" "18" "28" "30"
##
## $`00380`
## [1] "15" "26" "38" "39" "217" "219"
##
## $`00970`
## [1] "16" "833" "1615" "2058" "2193" "2617"
Looks like a pile of secret codes? In this list, the names are pathway IDs with the hsa prefix omitted, and the contents are Entrez gene IDs.
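Because the names lack the hsa prefix, it can help to convert between the two ID styles before matching against other tables. A minimal sketch in base R on a toy list: toy_kegg2 is an illustrative stand-in that reuses a few of the Entrez IDs printed above.

```r
# Toy stand-in for kegg2: names are pathway IDs without "hsa", values are Entrez IDs
toy_kegg2 <- list(
  "04610" = c("2", "462", "623"),
  "00232" = c("9", "10", "1544")
)

# Prepend the organism code so the names match hsa-style IDs
names(toy_kegg2) <- paste0("hsa", names(toy_kegg2))
names(toy_kegg2)

# The other direction: strip the prefix from an hsa-style ID
# before using it to index the original, unprefixed list
sub("^hsa", "", "hsa03030")
```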
As an example, extract the genes in hsa03030 and convert them to symbols.
genes = unlist(kegg2["03030"])
length(genes)
## [1] 36
# To convert them to symbols, bitr() does the job directly
genes = bitr(genes,
fromType = "ENTREZID",
toType = "SYMBOL",
OrgDb = "org.Hs.eg.db")$SYMBOL
head(genes)
## [1] "DNA2" "FEN1" "LIG1" "MCM2" "MCM3" "MCM4"