clusterProfiler is one of the most widely used R packages for functional enrichment. It runs KEGG and GO enrichment on a given gene set. Installation is straightforward:
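# Install BiocManager from CRAN if missing, then clusterProfiler from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("clusterProfiler", quietly = TRUE)) BiocManager::install("clusterProfiler")
library(clusterProfiler)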
Once you have a gene set, you can run KEGG/GO enrichment. Sometimes the gene IDs need to be converted first:
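# Map gene symbols to Entrez IDs; bitr returns a data frame with SYMBOL and ENTREZID columns
library(org.Hs.eg.db)
gene <- bitr(gene,
             fromType = "SYMBOL",
             toType   = "ENTREZID",
             OrgDb    = org.Hs.eg.db)$ENTREZID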
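bitr silently drops IDs it cannot map (it only warns), and one symbol can map to more than one Entrez ID, so it is worth inspecting the mapping before enrichment. The following is a minimal sketch, assuming clusterProfiler and org.Hs.eg.db are installed; the input symbols are hypothetical, and `NOT_A_GENE` is deliberately invalid.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Hypothetical input; the last entry is not a real gene symbol
symbols <- c("HK1", "HK2", "GCK", "NOT_A_GENE")

# Keep the whole mapping table instead of extracting one column right away
id_map <- bitr(symbols,
               fromType = "SYMBOL",
               toType   = "ENTREZID",
               OrgDb    = org.Hs.eg.db)

# Symbols that were dropped because no Entrez ID was found
unmapped <- setdiff(symbols, id_map$SYMBOL)

# Symbols that map to more than one Entrez ID
multi_mapped <- unique(id_map$SYMBOL[duplicated(id_map$SYMBOL)])

# De-duplicated Entrez IDs to pass on to enrichKEGG / enrichGO
gene <- unique(id_map$ENTREZID)
```

If `unmapped` is long, check the symbols for aliases or outdated names before continuing.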
1. For human genes, run the enrichment with the following commands
KEGGenrich <- enrichKEGG(gene = gene,
organism = "hsa",
pvalueCutoff =0.05,
qvalueCutoff =0.05)
GOenrich <- enrichGO(gene = gene,
OrgDb = org.Hs.eg.db,
keyType = 'ENTREZID',
pvalueCutoff =0.05,
qvalueCutoff =0.05)
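Note that enrichGO tests one ontology per call and defaults to `ont = "MF"` (molecular function), so the call above does not cover biological process or cellular component terms. A hedged sketch of selecting the ontology explicitly follows; the four Entrez IDs are a hypothetical input (HK1, HK2, HK3, GCK), and `readable = TRUE` is optional.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Hypothetical input: Entrez IDs for HK1, HK2, HK3 and GCK
gene <- c("3098", "3099", "3101", "2645")

# ont can be "BP", "MF", "CC", or "ALL" to test all three ontologies
GO_BP <- enrichGO(gene         = gene,
                  OrgDb        = org.Hs.eg.db,
                  keyType      = "ENTREZID",
                  ont          = "BP",
                  pvalueCutoff = 0.05,
                  qvalueCutoff = 0.05,
                  readable     = TRUE)  # report gene symbols instead of Entrez IDs

# Inspect the top rows of the result table
head(as.data.frame(GO_BP))
```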
2. Enrichment for mouse genes
KEGGenrich <- enrichKEGG(gene = gene,
organism = "mmu",
pvalueCutoff =0.05,
qvalueCutoff =0.05)
GOenrich <- enrichGO(gene = gene,
OrgDb = org.Mm.eg.db,
keyType = 'ENTREZID',
pvalueCutoff =0.05,
qvalueCutoff =0.05)
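3. For other common species, first check whether an OrgDb package exists for the species. If it does, install that R package and then run the enrichment as above.
Common species and their OrgDb packages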
Organism | Species | OrgDb |
---|---|---|
Anopheles mosquito | Anopheles | org.Ag.eg.db |
Arabidopsis | Arabidopsis | org.At.tair.db |
Cattle | Bovine | org.Bt.eg.db |
Worm (nematode) | Worm | org.Ce.eg.db |
Dog | Canine | org.Cf.eg.db |
Fly | Fly | org.Dm.eg.db |
Zebrafish | Zebrafish | org.Dr.eg.db |
E. coli strain K12 | E coli strain K12 | org.EcK12.eg.db |
E. coli strain Sakai | E coli strain Sakai | org.EcSakai.eg.db |
Chicken | Chicken | org.Gg.eg.db |
Human | Human | org.Hs.eg.db |
Mouse | Mouse | org.Mm.eg.db |
Rhesus macaque | Rhesus | org.Mmu.eg.db |
Malaria parasite (Plasmodium) | Malaria | org.Pf.plasmo.db |
Chimpanzee | Chimp | org.Pt.eg.db |
Rat (brown rat) | Rat | org.Rn.eg.db |
Yeast | Yeast | org.Sc.sgd.db |
Pig | Pig | org.Ss.eg.db |
Xenopus (clawed frog) | Xenopus | org.Xl.eg.db |
Note: data source: http://www.reibang.com/p/84e70566a6c6
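The lookup in the table can be wrapped in a small helper that returns the package name for a species and reports whether it is already installed. This is only a sketch: `orgdb_packages`, `orgdb_name` and `orgdb_installed` are hypothetical names introduced here, and only a few rows of the table are included.

```r
# A few species-to-package pairs copied from the table (extend as needed)
orgdb_packages <- c(
  human     = "org.Hs.eg.db",
  mouse     = "org.Mm.eg.db",
  rat       = "org.Rn.eg.db",
  zebrafish = "org.Dr.eg.db",
  pig       = "org.Ss.eg.db",
  chicken   = "org.Gg.eg.db"
)

# Return the OrgDb package name for a species, or stop if the table has none
orgdb_name <- function(species) {
  key <- tolower(species)
  if (!key %in% names(orgdb_packages)) {
    stop("No OrgDb package listed for species: ", species)
  }
  unname(orgdb_packages[key])
}

# TRUE if the package can be loaded from the current library paths
orgdb_installed <- function(species) {
  requireNamespace(orgdb_name(species), quietly = TRUE)
}

pkg <- orgdb_name("Mouse")
# If it is missing, install it from Bioconductor:
# if (!orgdb_installed("Mouse")) BiocManager::install(pkg)
```

A species that is not in the lookup raises an error, which is the cue to use the custom-database route described in the next section.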
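4. Species without an OrgDb
Common species already have an OrgDb package. If your species does not, the standard route is to build an OrgDb yourself. What if you do not know how to build one?
You still need something to work with.
The alternative is to prepare a table that maps the genes of your species to KEGG pathways, as shown in the table below: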
Genesnames | TermNAME | TermID | dbType | TermName | curl | ko |
---|---|---|---|---|---|---|
HK2 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS | chx00010 | KEGG | Glycolysis / Gluconeogenesis | https://www.kegg.jp/pathway/chx00010 | K00844 |
HK3 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS | chx00010 | KEGG | Glycolysis / Gluconeogenesis | https://www.kegg.jp/pathway/chx00010 | K00844 |
HK1 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS | chx00010 | KEGG | Glycolysis / Gluconeogenesis | https://www.kegg.jp/pathway/chx00010 | K00844 |
HKDC1 | KEGG_GLYCOLYSIS_GLUCONEOGENESIS | chx00010 | KEGG | Glycolysis / Gluconeogenesis | https://www.kegg.jp/pathway/chx00010 | K00844 |
GCK | KEGG_GLYCOLYSIS_GLUCONEOGENESIS | chx00010 | KEGG | Glycolysis / Gluconeogenesis | https://www.kegg.jp/pathway/chx00010 | K12407 |
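Read the KEGG database file
# Tab-separated file without a header row; quote = "" keeps apostrophes in pathway names from breaking the parse
KEGGdb <- read.table('KEGG.symbols.txt', header = FALSE, sep = '\t', quote = "", stringsAsFactors = FALSE)
# Name the first six columns; any extra column (such as ko) keeps its default name
colnames(KEGGdb)[1:6] <- c('Genesnames','TermNAME','TermID','dbType','TermName','curl')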
Then run the enrichment with the enricher function. Because the database file maps gene names directly to pathways, the gene names can be used as input without any ID conversion.
KEGGenrich <- enricher(gene = genes,
pAdjustMethod = 'BH',
qvalueCutoff = 0.05,
TERM2GENE = KEGGdb[,c('TermID','Genesnames')],
TERM2NAME = KEGGdb[,c('TermID','TermName')]
)
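To make clear what enricher computes from TERM2GENE, here is a base-R sketch of the over-representation (hypergeometric) test on toy data. It is not clusterProfiler's implementation: `ora_sketch` is a hypothetical helper, the gene sets are invented, and it assumes the background is every gene present in the table and that input genes outside that background are ignored.

```r
# Toy TERM2GENE table: first column term ID, second column gene name (invented data)
term2gene <- data.frame(
  TermID     = c(rep("path01", 5), rep("path02", 5)),
  Genesnames = c("HK1", "HK2", "HK3", "HKDC1", "GCK", "A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)
genes <- c("HK1", "HK2", "GCK", "A")

ora_sketch <- function(genes, term2gene) {
  universe <- unique(term2gene[[2]])      # background: all annotated genes
  genes <- intersect(genes, universe)     # ignore genes without annotation
  N <- length(universe)                   # background size
  n <- length(genes)                      # input size
  sets <- split(term2gene[[2]], term2gene[[1]])
  res <- do.call(rbind, lapply(names(sets), function(id) {
    M <- length(unique(sets[[id]]))            # genes annotated to this term
    k <- length(intersect(genes, sets[[id]]))  # input genes in this term
    data.frame(
      ID        = id,
      GeneRatio = paste0(k, "/", n),
      BgRatio   = paste0(M, "/", N),
      # P(X >= k) when drawing n genes from N, of which M belong to the term
      pvalue    = phyper(k - 1, M, N - M, n, lower.tail = FALSE),
      Count     = k,
      stringsAsFactors = FALSE
    )
  }))
  res$p.adjust <- p.adjust(res$pvalue, method = "BH")  # same correction as pAdjustMethod = 'BH'
  res[order(res$pvalue), ]
}

res <- ora_sketch(genes, term2gene)
print(res)
```

With real data the table is much larger, and enricher additionally filters terms by gene-set size and by the p-value and q-value cutoffs.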
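GO enrichment needs a database file in the same format as the KEGG one:
# Same layout as the KEGG file: tab-separated, no header row
GOdb <- read.table('GO.symbols.txt', header = FALSE, sep = '\t', quote = "", stringsAsFactors = FALSE)
colnames(GOdb)[1:6] <- c('Genesnames','TermNAME','TermID','dbType','TermName','curl')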
GOenrich <- enricher(gene = genes,
pAdjustMethod = 'BH',
qvalueCutoff = 0.05,
TERM2GENE = GOdb[,c('TermID','Genesnames')],
TERM2NAME = GOdb[,c('TermID','TermName')]
)
The result has the same structure as the output of enrichKEGG/enrichGO and can be used for the same downstream analysis and plotting.
5. How do you obtain the KEGG and GO databases for your species?
(1) Download them directly from the official KEGG/GO websites;
(2) Contact us to obtain the corresponding KEGG/GO database files.
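For option (1), one possible route is the KEGG REST service, whose `link` operation returns tab-separated gene-to-pathway pairs. The sketch below only parses lines in that format; `parse_kegg_link` is a hypothetical helper, the two example lines are hand-written, and the commented-out URL with the organism code `hsa` is an assumption to adapt to your species. Note that KEGG returns its own gene IDs rather than gene symbols, so a symbol-based file like the one above needs an extra mapping step.

```r
# Parse lines of the form "<org>:<gene id>\tpath:<pathway id>" into a TERM2GENE table
parse_kegg_link <- function(lines) {
  parts <- strsplit(lines, "\t", fixed = TRUE)
  gene_col <- vapply(parts, `[`, "", 1)   # e.g. "hsa:3098"
  path_col <- vapply(parts, `[`, "", 2)   # e.g. "path:hsa00010"
  data.frame(
    TermID = sub("^path:", "", path_col),   # strip the "path:" prefix
    GeneID = sub("^[^:]+:", "", gene_col),  # strip the organism prefix
    stringsAsFactors = FALSE
  )
}

# Hand-written example lines in the assumed format
example_lines <- c("hsa:3098\tpath:hsa00010",
                   "hsa:3099\tpath:hsa00010")
term2gene <- parse_kegg_link(example_lines)
print(term2gene)

# With network access, the real data could be fetched like this (assumed URL and organism code):
# lines <- readLines("https://rest.kegg.jp/link/pathway/hsa")
# term2gene <- parse_kegg_link(lines)
```

The resulting two-column data frame has the term ID first and the gene ID second, which is the column order enricher expects for TERM2GENE.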