1. GO enrichment analysis with the clusterProfiler package
Use the enrichGO function from clusterProfiler to run the GO analysis.
gene_id<-read.csv("SFTSV_24vscontrol_DE_mRNA.csv",header=T,stringsAsFactors = F)[,2]
library(clusterProfiler)
library(org.Hs.eg.db)
GO<-enrichGO(gene=gene_id,OrgDb = "org.Hs.eg.db",keyType = "ENSEMBL",ont="ALL",qvalueCutoff = 0.05,readable = T)
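gene is the vector of differentially expressed gene IDs. keyType specifies the type of gene ID; it defaults to ENTREZID, and the valid values are listed by keytypes(org.Hs.eg.db). ENTREZID is recommended. OrgDb names the org package for the species. ont selects the GO category, one of BP, CC, or MF, or ALL for all three. pAdjustMethod specifies the multiple-testing correction method; the default is pAdjustMethod="BH", so it is not written out here. The cutoff arguments set the corresponding thresholds, and readable=TRUE converts the gene IDs to gene symbols.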
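As a small illustration of what the default pAdjustMethod="BH" does, the sketch below applies the Benjamini-Hochberg correction with base R's p.adjust. The p-values are invented for the example and are not taken from the analysis above.

```r
# Made-up raw p-values, for illustration only
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.2)

# Benjamini-Hochberg adjustment, the method named by pAdjustMethod = "BH"
p_bh <- p.adjust(p_raw, method = "BH")

# Terms whose adjusted p-value stays below a 0.05 threshold
significant <- p_bh < 0.05
print(data.frame(p_raw, p_bh, significant))
```

Adjusted values are never smaller than the raw ones, so a term can pass a raw 0.05 threshold and still be dropped after correction.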
go<-as.data.frame(GO)
View(go)
table(go[,1]) # count the terms in BP, CC, and MF
The counts show that the GO enrichment results contain 769 BP terms, 51 CC terms, and 32 MF terms.
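2. Plotting the GO terms with ggplot2
1. Sort by qvalue in ascending order and pick the top 10 terms each for BP, CC, and MF. The data frame produced by enrichGO is already sorted by qvalue in ascending order by default, so taking the first ten rows of each category is enough.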
go_MF<-go[go$ONTOLOGY=="MF",][1:10,]
go_CC<-go[go$ONTOLOGY=="CC",][1:10,]
go_BP<-go[go$ONTOLOGY=="BP",][1:10,]
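The [1:10,] indexing above assumes every category has at least ten terms; with fewer, the missing rows are filled with NA. A slightly more defensive variant is sketched below on a toy data frame. The helper top_terms and the toy values are hypothetical, introduced only for this illustration.

```r
# Toy stand-in for as.data.frame(GO); the values are made up for illustration.
go_toy <- data.frame(
  ONTOLOGY = c("BP", "MF", "BP", "CC", "BP", "MF"),
  ID = paste0("GO:000000", 1:6),
  qvalue = c(0.030, 0.010, 0.001, 0.020, 0.004, 0.040),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the top n terms of one ontology, ordered by qvalue.
top_terms <- function(df, ontology, n = 10) {
  sub <- df[df$ONTOLOGY == ontology, , drop = FALSE]
  sub <- sub[order(sub$qvalue), , drop = FALSE]  # explicit ascending sort
  head(sub, n)  # head() returns fewer rows instead of padding with NA
}

toy_BP <- top_terms(go_toy, "BP", n = 2)
toy_CC <- top_terms(go_toy, "CC", n = 2)  # only one CC term exists
```

If a category can return fewer than ten rows, the rep(..., 10) calls that build the type column would need to use the actual row counts instead.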
go_enrich_df<-data.frame(ID=c(go_BP$ID, go_CC$ID, go_MF$ID),
Description=c(go_BP$Description, go_CC$Description, go_MF$Description),
GeneNumber=c(go_BP$Count, go_CC$Count, go_MF$Count),
type=factor(c(rep("biological process", 10), rep("cellular component", 10),rep("molecular function",10)),levels=c("molecular function", "cellular component", "biological process")))
The resulting data frame go_enrich_df holds the ID, description, gene count, and category of the 30 selected terms.
2. Drawing the plot with ggplot2
## numbers as data on x axis
go_enrich_df$number <- factor(rev(1:nrow(go_enrich_df)))
## shorten the names of GO terms
shorten_names <- function(x, n_word=4, n_char=40){
  ## shorten only when the term has too many words or too many characters
  if (length(strsplit(x, " ")[[1]]) > n_word || (nchar(x) > n_char))
  {
    ## cut to n_char characters first, then keep at most n_word words
    if (nchar(x) > n_char) x <- substr(x, 1, n_char)
    words <- strsplit(x, " ")[[1]]
    x <- paste(paste(words[1:min(length(words), n_word)],
                     collapse=" "), "...", sep="")
    return(x)
  }
  else
  {
    return(x)
  }
}
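## as.character() works whether Description is a factor or a character column
## (data.frame() no longer creates factors by default since R 4.0)
labels <- sapply(as.character(go_enrich_df$Description), shorten_names)
names(labels) <- rev(1:nrow(go_enrich_df))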
## colors for bar // blue, orange, green
CPCOLS <- c("#8DA1CB", "#FD8D62", "#66C3A5")
library(ggplot2)
p <- ggplot(data=go_enrich_df, aes(x=number, y=GeneNumber, fill=type)) +
geom_bar(stat="identity", width=0.8) + coord_flip() +
scale_fill_manual(values = CPCOLS) + theme_test() +
scale_x_discrete(labels=labels) +
xlab("GO term") +
theme(axis.text=element_text(face = "bold", color="gray50")) +
labs(title = "The Most Enriched GO Terms")
# coord_flip() flips the coordinates: it swaps the x and y axes and needs no special arguments