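This part continues directly from the data produced in the previous part, RNA-seq (9): enrichment analysis (functional annotation). If you saved the data from that part, you can simply import and convert it here. We start with a different R package, gage (Generally Applicable Gene-set Enrichment for Pathway Analysis), to run the KEGG enrichment analysis, so the results can also be compared with those from the previous part.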
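A few notes up front
- KEGG species abbreviations are listed on the KEGG organisms page (mouse, used here, is mmu).
- We use the gage package (Generally Applicable Gene-set Enrichment for Pathway Analysis) for the pathway analysis. Download the "gage package workflow vignette for RNA-seq pathway analysis" to see the full gage workflow. Once we have a list of enriched pathways, the pathview package can visualize them. This step uses the up- and down-regulation information.
- Visualization is done with pathview.
First install the R packages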
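# biocLite() is defunct in current Bioconductor releases; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("gage", "pathview", "gageData", "clusterProfiler", "org.Mm.eg.db"))
install.packages("dplyr")
library("pathview")
library("gage")
library("gageData")
library("dplyr")
library("clusterProfiler")  # provides bitr() for the ID conversion below
library("org.Mm.eg.db")     # mouse annotation database used by bitr()
#library(DOSE)
#library(stringr)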
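Before loading any data, it can help to confirm that every package is actually installed. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper name introduced here for illustration, not a function from any of these packages.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("gage", "gageData", "pathview", "dplyr",
            "clusterProfiler", "org.Mm.eg.db")
not_found <- missing_packages(needed)
if (length(not_found) > 0) {
  message("Still missing: ", paste(not_found, collapse = ", "))
}
```

If nothing is printed, all of the listed packages were found.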
Load the data
data(kegg.sets.mm)
data(sigmet.idx.mm)
kegg.sets.mm = kegg.sets.mm[sigmet.idx.mm]
head(kegg.sets.mm,3)
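setwd("F:/rna_seq/data/matrix")
sig.gene <- read.csv(file = "DEG_treat_vs_control.csv")
# Column X holds the Ensembl gene IDs; bitr() needs clusterProfiler and org.Mm.eg.db loaded
gene <- as.character(sig.gene$X)
gene.df <- bitr(gene, fromType = "ENSEMBL",
                toType = c("SYMBOL", "ENTREZID"),
                OrgDb = org.Mm.eg.db)
head(sig.gene)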
> head(sig.gene)
X baseMean log2FoldChange lfcSE stat pvalue padj
1 ENSMUSG00000003309 548.1926 3.231611 0.2658125 12.157485 5.234568e-34 8.193146e-30
2 ENSMUSG00000046323 404.1894 3.067050 0.2628220 11.669687 1.820923e-31 1.425055e-27
3 ENSMUSG00000001123 341.8542 2.797485 0.2766499 10.112004 4.887441e-24 2.549941e-20
4 ENSMUSG00000023906 951.9460 2.382307 0.2510718 9.488551 2.342684e-21 9.116395e-18
5 ENSMUSG00000018569 485.4839 3.136031 0.3312999 9.465836 2.912214e-21 9.116395e-18
6 ENSMUSG00000000184 601.0842 -2.827750 0.3154171 -8.965112 3.099648e-19 8.085948e-16
Now run the enrichment analysis with the gage package. The gage() function needs fold changes named by Entrez gene IDs.
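# Align Entrez IDs to the rows of sig.gene; bitr() can drop unmapped IDs or return several rows per gene
foldchanges = sig.gene$log2FoldChange
names(foldchanges) = gene.df$ENTREZID[match(sig.gene$X, gene.df$ENSEMBL)]
foldchanges = foldchanges[!is.na(names(foldchanges))]
head(foldchanges)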
The output looks like this:
> head(foldchanges)
11768 73708 16859 54419 53624 12444
3.231611 3.067050 2.797485 2.382307 3.136031 -2.827750
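The ID conversion does not always return exactly one row per input gene, so the Entrez IDs should be aligned with the fold changes explicitly rather than by position. The sketch below illustrates this with `match()` on toy data; `toy_deg` and `toy_map` are made-up stand-ins for `sig.gene` and `gene.df`, and their values are illustrative only.

```r
# Toy stand-ins for sig.gene and gene.df (illustrative values, not real results).
toy_deg <- data.frame(
  X = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C"),
  log2FoldChange = c(3.2, -2.8, 1.5),
  stringsAsFactors = FALSE
)
# ENSMUSG_B has no Entrez ID; ENSMUSG_C maps to two of them.
toy_map <- data.frame(
  ENSEMBL  = c("ENSMUSG_A", "ENSMUSG_C", "ENSMUSG_C"),
  ENTREZID = c("101", "303", "304"),
  stringsAsFactors = FALSE
)

# match() gives the first mapping row for each gene, or NA when there is none.
idx <- match(toy_deg$X, toy_map$ENSEMBL)
toy_fc <- toy_deg$log2FoldChange
names(toy_fc) <- toy_map$ENTREZID[idx]

# Drop genes without an Entrez ID.
toy_fc <- toy_fc[!is.na(names(toy_fc))]
toy_fc
```

Taking the first match is only one possible policy for genes with several Entrez IDs; which one is appropriate depends on the data.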
Run the pathway analysis and collect the results
keggres = gage(foldchanges, gsets = kegg.sets.mm, same.dir = TRUE)
# Look at both up (greater), down (less), and statistics.
lapply(keggres, head)
The output is
> lapply(keggres, head)
$greater
p.geomean stat.mean p.val q.val set.size exp1
mmu04514 Cell adhesion molecules (CAMs) 0.2680462 0.6286461 0.2680462 0.5360924 12 0.2680462
mmu04510 Focal adhesion 0.6382502 -0.3594187 0.6382502 0.6382502 10 0.6382502
mmu04144 Endocytosis NA NaN NA NA 8 NA
mmu03008 Ribosome biogenesis in eukaryotes NA NaN NA NA 0 NA
mmu04141 Protein processing in endoplasmic reticulum NA NaN NA NA 0 NA
mmu04740 Olfactory transduction NA NaN NA NA 1 NA
$less
p.geomean stat.mean p.val q.val set.size exp1
mmu04510 Focal adhesion 0.3617498 -0.3594187 0.3617498 0.7234996 10 0.3617498
mmu04514 Cell adhesion molecules (CAMs) 0.7319538 0.6286461 0.7319538 0.7319538 12 0.7319538
mmu04144 Endocytosis NA NaN NA NA 8 NA
mmu03008 Ribosome biogenesis in eukaryotes NA NaN NA NA 0 NA
mmu04141 Protein processing in endoplasmic reticulum NA NaN NA NA 0 NA
mmu04740 Olfactory transduction NA NaN NA NA 1 NA
$stats
stat.mean exp1
mmu04514 Cell adhesion molecules (CAMs) 0.6286461 0.6286461
mmu04510 Focal adhesion -0.3594187 -0.3594187
mmu04144 Endocytosis NaN NA
mmu03008 Ribosome biogenesis in eukaryotes NaN NA
mmu04141 Protein processing in endoplasmic reticulum NaN NA
mmu04740 Olfactory transduction NaN NA
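Several rows above are NA. A likely reason is that gage() only tests gene sets whose overlap with the input falls inside its set.size range (default c(10, 500)), and the input here contains only the significant genes: the two tested pathways have set sizes of 12 and 10, while the NA rows have 8, 1, or 0. The sketch below counts that overlap on toy data; `toy_sets` and `toy_fc` are made up for illustration.

```r
# Toy gene sets and fold changes (illustrative values only).
toy_sets <- list(
  path_A = as.character(1:15),
  path_B = as.character(10:30),
  path_C = as.character(100:120)
)
toy_fc <- setNames(seq(-1, 1, length.out = 12), as.character(1:12))

# Number of measured genes that fall in each set.
overlap <- vapply(toy_sets, function(s) sum(names(toy_fc) %in% s), integer(1))
overlap

# Sets whose overlap falls inside a c(10, 500) size window.
testable <- names(overlap)[overlap >= 10 & overlap <= 500]
testable
```

Running the same count with `kegg.sets.mm` and `foldchanges` would show which pathways have enough genes to be tested.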
Get the pathways
keggrespathways = data.frame(id = rownames(keggres$greater), keggres$greater) %>%
  as_tibble() %>%   # tbl_df() is deprecated in current dplyr
  filter(row_number() <= 10) %>%
  .$id %>%
  as.character()
keggrespathways
The result is:
> keggrespathways
[1] "mmu04514 Cell adhesion molecules (CAMs)" "mmu04510 Focal adhesion"
[3] "mmu04144 Endocytosis" "mmu03008 Ribosome biogenesis in eukaryotes"
[5] "mmu04141 Protein processing in endoplasmic reticulum" "mmu04740 Olfactory transduction"
[7] "mmu03010 Ribosome" "mmu04622 RIG-I-like receptor signaling pathway"
[9] "mmu04744 Phototransduction" "mmu04062 Chemokine signaling pathway"
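Taking the first 10 rows also picks up pathways that were never tested (the NA rows). One possible variation, sketched here on a toy matrix shaped like `keggres$greater`, keeps only rows with a p-value before taking the top entries. The pathway names and numbers in `toy_greater` are made up.

```r
# Toy matrix shaped like keggres$greater (illustrative values only).
toy_greater <- matrix(
  c(0.27, 0.54,
    0.64, 0.64,
    NA,   NA),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("mmu00001 Pathway one", "mmu00002 Pathway two", "mmu00003 Pathway three"),
    c("p.val", "q.val")
  )
)

# Keep only pathways that were actually tested, then take up to the top 10.
tested <- toy_greater[!is.na(toy_greater[, "p.val"]), , drop = FALSE]
top_paths <- head(rownames(tested), 10)
top_paths
```

Whether to drop untested pathways or to plot them anyway is a judgment call; this sketch only shows how the filter could be written.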
# Get the IDs.
keggresids = substr(keggrespathways, start=1, stop=8)
keggresids
> keggresids
[1] "mmu04514" "mmu04510" "mmu04144" "mmu03008" "mmu04141" "mmu04740" "mmu03010" "mmu04622" "mmu04744" "mmu04062"
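The `substr()` call relies on every ID being exactly 8 characters long. As an alternative sketch, the ID can be taken as everything before the first space, followed by a check of the assumed shape (three or four lowercase letters plus five digits). `toy_paths` reuses two names from the output above.

```r
toy_paths <- c("mmu04514 Cell adhesion molecules (CAMs)",
               "mmu04510 Focal adhesion")

# Keep everything before the first space.
toy_ids <- sub(" .*$", "", toy_paths)
toy_ids

# Sanity check on the assumed KEGG-style shape of the IDs.
stopifnot(all(grepl("^[a-z]{3,4}[0-9]{5}$", toy_ids)))
```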
Finally, the pathview() function in the pathview package draws the pathway plots. Below we write a function so that we can loop over the top 10 pathways obtained above and plot each one.
# Define a plotting function first (a reusable wrapper; the loop below calls pathview() directly)
plot_pathway = function(pid) pathview(gene.data=foldchanges, pathway.id=pid, species="mmu", new.signature=FALSE)
# Plot several pathways at once; the plots are saved to the working directory automatically
tmp = sapply(keggresids, function(pid) pathview(gene.data=foldchanges, pathway.id=pid, species="mmu"))
The output is as follows
> tmp = sapply(keggresids, function(pid) pathview(gene.data=foldchanges, pathway.id=pid, species="mmu"))
Info: Downloading xml files for mmu04514, 1/1 pathways..
Info: Downloading png files for mmu04514, 1/1 pathways..
'select()' returned 1:1 mapping between keys and columns
Info: Working in directory F:/rna_seq/data/matrix
Info: Writing image file mmu04514.pathview.png
Info: Downloading xml files for mmu04510, 1/1 pathways..
Info: Downloading png files for mmu04510, 1/1 pathways..
'select()' returned 1:1 mapping between keys and columns
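Each pathway requires a download, so one failure could interrupt the whole loop. The sketch below wraps each call in `tryCatch()` and records which IDs succeeded. It assumes failures surface as R errors, which may not hold for every pathview failure mode. `run_safely` and `fake_plot` are hypothetical names used only for this illustration.

```r
# Hypothetical wrapper: run `plot_fun` on each ID and record success or failure.
run_safely <- function(ids, plot_fun) {
  vapply(ids, function(pid) {
    tryCatch({
      plot_fun(pid)
      TRUE
    }, error = function(e) {
      message("Skipping ", pid, ": ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
}

# Stand-in plotting function that fails for one made-up ID, to show the behaviour.
fake_plot <- function(pid) {
  if (pid == "mmu00000") stop("download failed")
  invisible(pid)
}

status <- run_safely(c("mmu04514", "mmu00000", "mmu04510"), fake_plot)
status
```

With the real data, the call would be `run_safely(keggresids, plot_pathway)`, and `names(status)[!status]` would list the pathways to retry.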
Then we go to the working directory and look at the KEGG pathway images. Here are three of them: