Paper
GENESPACE: syntenic pan-genome annotations for eukaryotes
https://www.biorxiv.org/content/10.1101/2022.03.09.483468v1
Not yet formally published (bioRxiv preprint)
GitHub page
https://github.com/jtlovell/GENESPACE
Detailed walkthrough
GENESPACE does not run on Windows yet; it only works on macOS or Linux, so I am trying it on Linux.
First, install OrthoFinder
conda install -c bioconda orthofinder
Install MCScanX
https://github.com/wyp1125/MCScanX
git clone https://github.com/wyp1125/MCScanX.git
cd MCScanX
make
The build reported three errors here, but it also produced three executables. I tried them and they run; I do not know whether the errors will cause problems later on.
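One way to confirm that the build left usable binaries behind is to check that each expected program exists and is executable. The sketch below is written in R, since the rest of the workflow runs there. It assumes the three programs are named `MCScanX`, `MCScanX_h` and `duplicate_gene_classifier`; `check_mcscanx` is a hypothetical helper, not part of MCScanX or GENESPACE.

```r
# Hypothetical helper: report which expected MCScanX binaries are present
# and executable in `mcscanx_dir`. The binary names are an assumption.
check_mcscanx <- function(mcscanx_dir,
                          binaries = c("MCScanX", "MCScanX_h",
                                       "duplicate_gene_classifier")) {
  paths <- file.path(mcscanx_dir, binaries)
  data.frame(
    binary     = binaries,
    exists     = file.exists(paths),
    # file.access(mode = 1) returns 0 when the file can be executed
    executable = file.access(paths, mode = 1) == 0,
    stringsAsFactors = FALSE
  )
}

# Example: adjust the directory to wherever MCScanX was cloned and built
print(check_mcscanx("MCScanX"))
```

If any row shows `FALSE`, the `make` errors probably matter and are worth reading before continuing.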
Install the required R packages
conda install r-data.table r-dbscan r-R.utils r-devtools
conda install bioconductor-Biostrings bioconductor-rtracklayer
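Before installing GENESPACE itself, it can help to confirm from inside R that the conda-installed packages are visible to the R session being used. This is a minimal sketch; the package list simply mirrors the two `conda install` commands above.

```r
# Packages installed in the conda step above
required <- c("data.table", "dbscan", "R.utils", "devtools",
              "Biostrings", "rtracklayer")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package cannot be loaded
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

If packages are reported missing, the R session is probably running outside the conda environment where they were installed.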
Install GENESPACE
# Start R (I use radian)
devtools::install_github("jtlovell/GENESPACE", upgrade = FALSE)
Run the example data
library(GENESPACE)
runwd<-file.path("./testGenespace/")
make_exampleDataDir(writeDir = runwd) ## this step downloads the example data
gids<-c("human","chimp","rhesus")
gpar <- init_genespace(
  genomeIDs = gids,
  speciesIDs = gids,
  versionIDs = gids,
  ploidy = rep(1, 3),
  wd = runwd,
  gffString = "gff",
  pepString = "pep",
  path2orthofinder = "orthofinder",
  path2mcscanx = "/home/myan/scratch/apps/mingyan/Biotools/MCScanX",
  path2diamond = "diamond",
  diamondMode = "fast",
  orthofinderMethod = "fast",
  rawGenomeDir = file.path(runwd, "rawGenomes"))
parse_annotations(
  gsParam = gpar,
  gffEntryType = "gene",
  gffIdColumn = "locus",
  gffStripText = "locus=",
  headerEntryIndex = 1,
  headerSep = " ",
  headerStripText = "locus=")
# I did not understand what the parse_annotations call above is doing
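Judging from the argument names, `parse_annotations` seems to extract a gene ID from each GFF record and each peptide FASTA header so that the two files use matching identifiers. The base R sketch below illustrates that reading only. It is not GENESPACE's implementation, the header and attribute strings are made up, and `extract_header_id` and `extract_gff_id` are hypothetical helpers.

```r
# Hypothetical FASTA header and GFF attribute string, made up for illustration
header <- "locus=GENE1 description=example"
gff_attributes <- "ID=gene0;locus=GENE1;biotype=protein_coding"

# Peptide header: split on the separator (cf. headerSep), keep one entry
# (cf. headerEntryIndex), then remove a prefix (cf. headerStripText)
extract_header_id <- function(header, sep = " ", index = 1, strip = "locus=") {
  entry <- strsplit(header, sep, fixed = TRUE)[[1]][index]
  sub(strip, "", entry, fixed = TRUE)
}

# GFF: find the attribute named by id_column (cf. gffIdColumn),
# then remove a prefix (cf. gffStripText)
extract_gff_id <- function(attributes, id_column = "locus", strip = "locus=") {
  fields <- strsplit(attributes, ";", fixed = TRUE)[[1]]
  hit <- fields[startsWith(fields, paste0(id_column, "="))][1]
  sub(strip, "", hit, fixed = TRUE)
}

extract_header_id(header)       # "GENE1"
extract_gff_id(gff_attributes)  # "GENE1"
```

Under this reading, both calls yield the same ID, which is what lets gene positions from the GFF be joined to the peptide sequences.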
gpar<-run_orthofinder(gsParam = gpar)
## Running this line produced a warning
Warning message:
In system2(gsParam$paths$orthofinderCall, com, stdout = TRUE, stderr = TRUE) :
running command ''orthofinder' -b ./testGenespace//orthofinder -t 4 -a 1 -X -og 2>&1' had status 120 and error message 'Interrupted system call'
## I do not know whether this affects later steps. It may be because the line runwd<-file.path("./testGenespace/") ends with an extra slash. I ran the step again and it completed without problems.
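The doubled slash in the logged command (`./testGenespace//orthofinder`) comes from how `file.path()` joins its arguments. Whether it caused the status 120 warning is unconfirmed, since repeated slashes are normally harmless on Linux, but the path is easy to tidy. A minimal sketch, where `runwd_clean` is a name introduced here for illustration:

```r
with_slash <- "./testGenespace/"

# file.path() simply joins its arguments with "/",
# so a trailing slash ends up doubled
file.path(with_slash, "orthofinder")   # "./testGenespace//orthofinder"

# Remove any trailing slashes before building further paths
runwd_clean <- sub("/+$", "", with_slash)
file.path(runwd_clean, "orthofinder")  # "./testGenespace/orthofinder"
```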
gpar<-synteny(gsParam = gpar)
## Plot the results
pdf(file="abc.pdf",width = 10,height = 8)
plot_riparianHits(gpar)
dev.off()
More plotting parameters
pdf(file="abc.pdf",width = 9.6,height = 4)
plot_riparianHits(
  gpar,
  refGenome = "chimp",
  invertTheseChrs = data.frame(genome = "rhesus", chr = 2),
  genomeIDs = c("chimp", "human", "rhesus"),
  labelTheseGenomes = c("chimp", "rhesus"),
  gapProp = .001,
  refChrCols = c("#BC4F43", "#F67243"),
  blackBg = FALSE,
  returnSourceData = TRUE,
  verbose = FALSE)
dev.off()
You can also restrict the plot to custom regions of interest
regs <- data.frame(
  genome = c("human", "human", "chimp", "rhesus"),
  chr = c(3, 3, 4, 5),
  start = c(0, 50e6, 0, 60e6),
  end = c(10e6, 70e6, 50e6, 90e6),
  cols = c("pink", "gold", "cyan", "dodgerblue"))
pdf(file = "abc2.pdf",width = 9.6,height = 4)
plot_riparianHits(gpar, onlyTheseRegions = regs,blackBg = FALSE)
dev.off()
Build the pan-genome
pg <- pangenome(gpar)
This writes a file, results/human_pangenomeDB.txt.gz
I opened the file and looked at part of the output.
For now I do not understand how to interpret it.
The help documentation says
This is the source data that can be manipulated programatically to extract your regions of interest. Future GENESPACE releases will have auxilary functions that let the user access the pan-genome by rules (e.g. contains these genes, in these regions etc.). For now, we’ll leave this work to scripting by the user.
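Since the documentation leaves this to user scripting, a reasonable first step is to load the table into R and inspect its columns. The sketch below assumes the file is tab-delimited with a header row and sits under the working directory used above. It makes no assumption about column names, and `read_pangenome_db` is a hypothetical helper, not a GENESPACE function.

```r
# Hypothetical helper: load the gzipped pan-genome table.
# Assumes a tab-delimited file with a header row.
read_pangenome_db <- function(path) {
  stopifnot(file.exists(path))
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  read.delim(con, header = TRUE, stringsAsFactors = FALSE,
             check.names = FALSE)
}

# Assumed location: the results folder inside the example working directory
pg_path <- file.path("testGenespace", "results", "human_pangenomeDB.txt.gz")
if (file.exists(pg_path)) {
  pg_db <- read_pangenome_db(pg_path)
  str(pg_db)          # column names and types
  print(head(pg_db))  # first few rows
}
```

From there, ordinary data frame subsetting can pull out the rows of interest once the columns are understood.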
The next step is to work out how to prepare my own data.
Feel free to follow my official account (公眾號), 小明的數據分析筆記本 (Xiaoming's Data Analysis Notebook).
It mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics, and population genetics literature for horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.