GO富集出來(lái)的結(jié)果可以非常多,如果沒(méi)有一個(gè)層級(jí)顯示牧抽,將會(huì)非常眼花繚亂嘉熊。TopGO則通過(guò)DAG(Directed acyclic graph)解決了這個(gè)問(wèn)題。
TopGO的詳細(xì)信息扬舒,都可以在bioconductor中的TopGO頁(yè)面找到阐肤,網(wǎng)站中附有詳細(xì)說(shuō)明。下面代碼不涉及其他統(tǒng)計(jì)分析,只涉及出DAG圖所需要的的最簡(jiǎn)單的代碼孕惜。
安裝:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("topGO")
文件準(zhǔn)備:
-
所有基因列表愧薛。這個(gè)可以在excel中做成兩列數(shù)據(jù),這里是RNA-seq數(shù)據(jù)衫画,因此第一列為基因ID毫炉,第二列可以為0或者1,其中“0”表示非差異表達(dá)削罩,“1”表示差異表達(dá)瞄勾。如果是做芯片之類(lèi)的,也可以是基因分值或者p-value值弥激,具體視實(shí)驗(yàn)而定进陡。接著將內(nèi)容復(fù)制到txt文件中。將數(shù)據(jù)框先轉(zhuǎn)變成矩陣微服,再轉(zhuǎn)變成因子即可趾疚。
基因列表 - 差異基因列表。這個(gè)可以通過(guò)對(duì)“0”和“1”進(jìn)行篩選以蕴,或者直接對(duì)所有基因列表進(jìn)行排序糙麦,將差異基因,也就是“1”的基因排在前面舒裤,直接截取即可。
-
基因和GO對(duì)應(yīng)的關(guān)系列表觉吭。這個(gè)可以是Gene-GO,或者GO-Gene腾供。以GO-Gene為例,這個(gè)文件還是兩列鲜滩,第一列是GO terms伴鳖,第二列是用逗號(hào)分開(kāi)的Genes,如下圖:
GO-Gene
這個(gè)文件不能直接讀入徙硅,而是用readMappings
函數(shù)進(jìn)行解釋榜聂。TopGO里面有一個(gè)很有用的function,就是inverseList
嗓蘑,可以把Gene-GO和GO-Gene兩種文件相互轉(zhuǎn)換须肆。
代碼
setwd("D:/BioInfo/R/Package information/TopGO/Test")
library(topGO)
#01. read the GeneList file and convert it to 2 levels factor
GL = read.table("01.GL.txt")
GL1 = as.matrix(GL)
rownames(GL1) = GL1[,1]
GL2 = GL1[,2]
GL3 = as.factor(GL2)
#02. Directly generate DEGs,DEGs with value of "1" were at the top of the list.
DEG = GL3[1:200]
#03. parse the annotation file, "readMappings" function was used to parse both of the GO-Gene or Gene-GO files
GeneGO = readMappings("03.GeneGO.txt")
## inverse the direction
## Notably, use the readMappings function firstly, then inverseList function.
GOGene = inverseList(GeneGO)
#04. Generate GOdata, three parts should be run separately, including "BP","MF" and "CC"
GOdata = new("topGOdata",
ontology = "BP",
allGenes = GL3,
geneSelectionFun = DEG,
annot = annFUN.gene2GO,
nodeSize = 5,
gene2GO = GeneGO
)
#05. statistic
resultFis=runTest(GOdata,algorithm = "classic",statistic = "fisher")
#06. Visualization, firstSigNodes could be changed.
showSigOfNodes(GOdata,
score(resultFis),
firstSigNodes = 5,
useInfo = 'all')
#07. Print
printGraph(GOdata,
resultFis,
firstSigNodes = 5,
fn.prefix = "tGO",
useInfo = "all",
pdfSW = TRUE)
結(jié)果如下:
直接在Rstudio顯示不清晰桩皿,但輸出成EPS或者PDF就清晰了豌汇。
另外一個(gè)問(wèn)題就是默認(rèn)輸出全部信息的時(shí)候,GO terms太長(zhǎng)的會(huì)顯示不全泄隔,這個(gè)時(shí)候我們可以用AI打開(kāi)文件拒贱,并進(jìn)行相應(yīng)修改,再導(dǎo)出tiff文件即可。