R語言獲取 GO/KEGG 中某一通路的基因集

有時(shí)候我們想看某一特定通路的富集情況纹磺,這時(shí)就需要獲取該通路的基因集。

一钮惠、從 GO 中獲取

下載腳本 getGoTerm.R茅糜,獲取所有的 GO geneSet,然后保存為RData素挽,因?yàn)?get_GO_data 這一步非常耗時(shí)蔑赘。

source("getGoTerm.R")
GO_DATA <- get_GO_data("org.Hs.eg.db", "ALL", "SYMBOL")
save(GO_DATA, file = "GO_DATA.RData")

然后,我寫了兩個(gè)小函數(shù) findGOgetGO 分別用于尋找 GO ID 和獲取 GO geneSet。

findGO <- function(pattern, method = "key"){

    if(!exists("GO_DATA"))
        load("GO_DATA.RData")
    if(method == "key"){
        pathways = cbind(GO_DATA$PATHID2NAME[grep(pattern, GO_DATA$PATHID2NAME)])
    } else if(method == "gene"){
        pathways = cbind(GO_DATA$PATHID2NAME[GO_DATA$EXTID2PATHID[[pattern]]])
    }

    colnames(pathways) = "pathway"

    if(length(pathways) == 0){
        cat("No results!\n")
    } else{
        return(pathways)
    }
}

getGO <- function(ID){

    if(!exists("GO_DATA"))
        load("GO_DATA.RData")
    allNAME = names(GO_DATA$PATHID2EXTID)
    if(ID %in% allNAME){
        geneSet = GO_DATA$PATHID2EXTID[ID]
        names(geneSet) = GO_DATA$PATHID2NAME[ID]
        return(geneSet)     
    } else{
        cat("No results!\n")
    }
}

用法示例

load("GO_DATA.RData") # 載入數(shù)據(jù) GO_DATA
findGO("insulin") # 尋找 含有指定關(guān)鍵字的 pathway name 的 pathway
findGO("INS", method = "gene") # 尋找含有指定基因名的 pathway

getGO("GO:1901142") # 獲取指定 GO ID 的 gene set
$`insulin metabolic process`
[1] "CEACAM1" "CPE"     "ERN1"    "IDE"     "PCSK2"   "ERO1B"

二缩赛、從KEGG 中獲取

1. 獲取目標(biāo)通路的ID號(hào)

以人類糖尿病相關(guān)通路為例耙箍,訪問 https://www.kegg.jp/kegg/pathway.html ,在 Select prefix 中輸入人類的縮寫 hsa酥馍,Enter keywords 中輸入關(guān)鍵詞 diabetes辩昆,檢索到了8個(gè)相關(guān)通路,點(diǎn)進(jìn)第一個(gè) Type II diabetes mellitus旨袒,可以查看詳細(xì)信息汁针,ID號(hào)為 hsa04930

2. 開始獲取
# 方法一
if(!requireNamespace("BiocManager", quietly = TRUE))      
   install.packages("BiocManager") 
BiocManager::install("KEGGREST", version = "3.10")

library("KEGGREST")
gsInfo = keggGet('hsa04930')[[1]]
names(gsInfo)

geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1])
geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
names(geneSet) = gsInfo$NAME
> geneSet
$`Type II diabetes mellitus - Homo sapiens (human)`
 [1] "INS"     "INSR"    "IRS1"    "IRS2"    "IRS4"    "PIK3R1"  "PIK3R2"
 [8] "PIK3R3"  "PIK3CA"  "PIK3CD"  "PIK3CB"  "SLC2A4"  "ADIPOQ"  "MAPK1"
[15] "MAPK3"   "MTOR"    "PRKCZ"   "SOCS1"   "SOCS2"   "SOCS3"   "SOCS4"
[22] "IKBKB"   "MAPK8"   "MAPK10"  "MAPK9"   "TNF"     "PRKCD"   "PRKCE"
[29] "PDX1"    "MAFA"    "SLC2A2"  "HK3"     "HK1"     "HK2"     "HKDC1"
[36] "GCK"     "PKM"     "PKLR"    "KCNJ11"  "ABCC8"   "CACNA1C" "CACNA1D"
[43] "CACNA1A" "CACNA1B" "CACNA1E" "CACNA1G"

# 方法二
library("rjson")
download.file("http://togows.dbcls.jp/entry/pathway/hsa04930/genes.json", "hsa04930.json")
json = fromJSON(file ="hsa04930.json")
geneSet = list(as.character(sapply(json[[1]], function(x) sapply(strsplit(x[1], ";"), function(x) x[1]))))
> geneSet
[[1]]
 [1] "INS"     "INSR"    "IRS1"    "IRS2"    "IRS4"    "PIK3R1"  "PIK3R2"
 [8] "PIK3R3"  "PIK3CA"  "PIK3CD"  "PIK3CB"  "SLC2A4"  "ADIPOQ"  "MAPK1"
[15] "MAPK3"   "MTOR"    "PRKCZ"   "SOCS1"   "SOCS2"   "SOCS3"   "SOCS4"
[22] "IKBKB"   "MAPK8"   "MAPK10"  "MAPK9"   "TNF"     "PRKCD"   "PRKCE"
[29] "PDX1"    "MAFA"    "SLC2A2"  "HK3"     "HK1"     "HK2"     "HKDC1"
[36] "GCK"     "PKM"     "PKLR"    "KCNJ11"  "ABCC8"   "CACNA1C" "CACNA1D"
[43] "CACNA1A" "CACNA1B" "CACNA1E" "CACNA1G"

不過第二種方法不能獲取geneSet的 NAME砚尽,因此更推薦第一種方法施无。

同樣將以上步驟包裝為函數(shù):

getKEGG <- function(ID){

    library("KEGGREST")

    gsList = list()
    for(xID in ID){

        gsInfo = keggGet(xID)[[1]]
        if(!is.null(gsInfo$GENE)){
            geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1])   
            xgeneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])          
            NAME = sapply(strsplit(gsInfo$NAME, " - "), function(x) x[1])
            names(xgeneSet) = NAME
            gsList[NAME] = xgeneSet 
        } else{
            cat(" ", xID, "No corresponding gene set in specific database.\n")
        }
    }
    return(gsList)
}

> getKEGG('hsa04930')
$`Type II diabetes mellitus - Homo sapiens (human)`
 [1] "INS"     "INSR"    "IRS1"    "IRS2"    "IRS4"    "PIK3R1"  "PIK3R2"
 [8] "PIK3R3"  "PIK3CA"  "PIK3CD"  "PIK3CB"  "SLC2A4"  "ADIPOQ"  "MAPK1"
[15] "MAPK3"   "MTOR"    "PRKCZ"   "SOCS1"   "SOCS2"   "SOCS3"   "SOCS4"
[22] "IKBKB"   "MAPK8"   "MAPK10"  "MAPK9"   "TNF"     "PRKCD"   "PRKCE"
[29] "PDX1"    "MAFA"    "SLC2A2"  "HK3"     "HK1"     "HK2"     "HKDC1"
[36] "GCK"     "PKM"     "PKLR"    "KCNJ11"  "ABCC8"   "CACNA1C" "CACNA1D"
[43] "CACNA1A" "CACNA1B" "CACNA1E" "CACNA1G"

> getKEGG(c("hsa04930", "hsa04940", "hsa04950"))
$`Type II diabetes mellitus`
 [1] "INS"     "INSR"    "IRS1"    "IRS2"    "IRS4"    "PIK3R1"  "PIK3R2"
 [8] "PIK3R3"  "PIK3CA"  "PIK3CD"  "PIK3CB"  "SLC2A4"  "ADIPOQ"  "MAPK1"
[15] "MAPK3"   "MTOR"    "PRKCZ"   "SOCS1"   "SOCS2"   "SOCS3"   "SOCS4"
[22] "IKBKB"   "MAPK8"   "MAPK10"  "MAPK9"   "TNF"     "PRKCD"   "PRKCE"
[29] "PDX1"    "MAFA"    "SLC2A2"  "HK3"     "HK1"     "HK2"     "HKDC1"
[36] "GCK"     "PKM"     "PKLR"    "KCNJ11"  "ABCC8"   "CACNA1C" "CACNA1D"
[43] "CACNA1A" "CACNA1B" "CACNA1E" "CACNA1G"

$`Type I diabetes mellitus`
 [1] "INS"      "GAD1"     "GAD2"     "PTPRN"    "PTPRN2"   "CPE"
 [7] "HSPD1"    "ICA1"     "HLA-DMA"  "HLA-DMB"  "HLA-DOA"  "HLA-DOB"
[13] "HLA-DPA1" "HLA-DPB1" "HLA-DQA1" "HLA-DQA2" "HLA-DQB1" "HLA-DRA"
[19] "HLA-DRB1" "HLA-DRB3" "HLA-DRB4" "HLA-DRB5" "CD80"     "CD86"
[25] "CD28"     "IL12A"    "IL12B"    "IL2"      "IFNG"     "HLA-A"
[31] "HLA-B"    "HLA-C"    "HLA-F"    "HLA-G"    "HLA-E"    "FASLG"
[37] "FAS"      "PRF1"     "GZMB"     "LTA"      "TNF"      "IL1A"
[43] "IL1B"

$`Maturity onset diabetes of the young`
 [1] "HHEX"    "MNX1"    "ONECUT1" "PDX1"    "NR5A2"   "NEUROG3" "NKX2-2"
 [8] "NKX6-1"  "PAX6"    "PAX4"    "NEUROD1" "RFX6"    "HES1"    "HNF1B"
[15] "FOXA2"   "MAFA"    "HNF4A"   "HNF1A"   "HNF4G"   "FOXA3"   "PKLR"
[22] "SLC2A2"  "INS"     "IAPP"    "GCK"     "BHLHA15"

Ref

用clusterProfiler做GSEA(一)

從KEGG的pathway中提取gene symbol

R-下載某一條通路的所有基因名字(KEGG)

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市必孤,隨后出現(xiàn)的幾起案子猾骡,更是在濱河造成了極大的恐慌,老刑警劉巖敷搪,帶你破解...
    沈念sama閱讀 206,839評(píng)論 6 482
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件兴想,死亡現(xiàn)場離奇詭異,居然都是意外死亡赡勘,警方通過查閱死者的電腦和手機(jī)嫂便,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,543評(píng)論 2 382
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來狮含,“玉大人顽悼,你說我怎么就攤上這事〖钙” “怎么了蔚龙?”我有些...
    開封第一講書人閱讀 153,116評(píng)論 0 344
  • 文/不壞的土叔 我叫張陵,是天一觀的道長映胁。 經(jīng)常有香客問我木羹,道長,這世上最難降的妖魔是什么解孙? 我笑而不...
    開封第一講書人閱讀 55,371評(píng)論 1 279
  • 正文 為了忘掉前任坑填,我火速辦了婚禮,結(jié)果婚禮上弛姜,老公的妹妹穿的比我還像新娘脐瑰。我一直安慰自己,他們只是感情好廷臼,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,384評(píng)論 5 374
  • 文/花漫 我一把揭開白布苍在。 她就那樣靜靜地躺著你踩,像睡著了一般忘朝。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上朦促,一...
    開封第一講書人閱讀 49,111評(píng)論 1 285
  • 那天滑废,我揣著相機(jī)與錄音举娩,去河邊找鬼厢钧。 笑死荆责,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的牙咏。 我是一名探鬼主播臼隔,決...
    沈念sama閱讀 38,416評(píng)論 3 400
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢(mèng)啊……” “哼眠寿!你這毒婦竟也來了躬翁?” 一聲冷哼從身側(cè)響起焦蘑,我...
    開封第一講書人閱讀 37,053評(píng)論 0 259
  • 序言:老撾萬榮一對(duì)情侶失蹤盯拱,失蹤者是張志新(化名)和其女友劉穎,沒想到半個(gè)月后例嘱,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體狡逢,經(jīng)...
    沈念sama閱讀 43,558評(píng)論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 36,007評(píng)論 2 325
  • 正文 我和宋清朗相戀三年拼卵,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了奢浑。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點(diǎn)故事閱讀 38,117評(píng)論 1 334
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡腋腮,死狀恐怖雀彼,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情即寡,我是刑警寧澤徊哑,帶...
    沈念sama閱讀 33,756評(píng)論 4 324
  • 正文 年R本政府宣布,位于F島的核電站聪富,受9級(jí)特大地震影響莺丑,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜墩蔓,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,324評(píng)論 3 307
  • 文/蒙蒙 一梢莽、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧奸披,春花似錦昏名、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,315評(píng)論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽份殿。三九已至,卻和暖如春嗽交,著一層夾襖步出監(jiān)牢的瞬間卿嘲,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 31,539評(píng)論 1 262
  • 我被黑心中介騙來泰國打工夫壁, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留拾枣,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 45,578評(píng)論 2 355
  • 正文 我出身青樓盒让,卻偏偏與公主長得像梅肤,于是被迫代替她去往敵國和親。 傳聞我的和親對(duì)象是個(gè)殘疾皇子邑茄,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,877評(píng)論 2 345

推薦閱讀更多精彩內(nèi)容