最近發(fā)現(xiàn)了python版的MCScan,是個大寶藏割坠。由于走了不少彎路乾闰,終于畫出美圖,趕緊記錄下來
github地址 https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
1叠赐、軟件安裝
需要安裝LASTAL和jcvi python包
sudo apt install last-align
pip install jcvi
2、輸入數(shù)據(jù)
輸入數(shù)據(jù)只有兩類cds和bed文件
可以自動從phytozome屡江,這點十分方便
$ python -m jcvi.apps.fetch phytozome
...
Acoerulea Alyrata Athaliana
Bdistachyon Brapa Cclementina
Cpapaya Creinhardtii Crubella
Csativus Csinensis Csubellipsoidea_C-169
Egrandis Fvesca Gmax
Graimondii Lusitatissimum Mdomestica
Mesculenta Mguttatus Mpusilla_CCMP1545
Mpusilla_RCC299 Mtruncatula Olucimarinus
Osativa Ppatens Ppersica
Ptrichocarpa Pvirgatum Pvulgaris
Rcommunis Sbicolor Sitalica
Slycopersicum Smoellendorffii Stuberosum
Tcacao Thalophila Vcarteri
Vvinifera Zmays early_release
以水稻和擬南芥為例
$ python -m jcvi.apps.fetch phytozome Osativa,Athaliana
$ ls
Athaliana_167_cds.fa.gz Athaliana_167_gene.gff3.gz Osativa_204_cds.fa.gz Osativa_204_gene.gff3.gz
其中g(shù)ff3文件不需要解壓 一鍵轉(zhuǎn)換成bed格式
python -m jcvi.formats.gff bed --type=mRNA --key=Name Osativa_204_gene.gff3.gz -o osa.bed
cds解壓后需要去掉|分隔符 b并要修改id 以基因而不是轉(zhuǎn)錄本命名
$ gunzip Athaliana_167_cds.fa.gz
$ mv Athaliana_167_cds.fa ath.cds
$ sed 's/\.*$//g' -i ath.cds #也可以這么做 python -m jcvi.formats.fasta format --sep="|" Athaliana_167_cds.fa.gz ath.cds
$ sed 's/\.//g' -i ath.cds
如果是其他物種或者自己組裝的基因組數(shù)據(jù)芭概,記得基因id需要遵循在染色體上的位置從大到小排序的命名原則,否則軟件會在gff3轉(zhuǎn)bed的時候自動命名惩嘉,務(wù)必要和cds里的id對應(yīng)罢洲。
3、Pairwise synteny 分析
$ python -m jcvi.compara.catalog ortholog osa ath
分析過程很快,結(jié)果包括.anchors文件惹苗,點陣圖殿较,如果遇到報錯,多半是要安裝python包桩蓉,更新Latex淋纲。結(jié)果文件的含義“The .last file is raw LAST output, .last.filtered is filtered LAST output, .anchors is the seed synteny blocks (high quality), .lifted.anchors recruits additional anchors to form the final synteny blocks.”
$ ls osa.ath.*
osa.ath.lifted.anchors osa.ath.anchors osa.ath.last.filtered osa.ath.last
4、可視化
重頭戲來了
a 共線性圖
首先生成.simple文件
python -m jcvi.compara.synteny screen --minspan=30 --simple osa.ath.anchors osa.ath.anchors.new
再編輯兩個配置文件seqids和layout
$ vi seqids #設(shè)置需要展示等染色體號
Chr1,Chr2,Chr3,Chr4,Chr5,Chr6,Chr7,Chr8,Chr9,Chr10,Chr11,Chr12 #osa
Chr1,Chr2,Chr3,Chr4,Chr5,Chr6,Chr7,Chr8,Chr9,Chr10,Chr11,Chr12 #ath
$ vi layout #設(shè)置顏色院究、長寬等
# y, xstart, xend, rotation, color, label, va, bed
.6, .1, .8, 0, , Osa, top, osa.bed
.4, .1, .8, 0, , Ath, top, ath.bed
# edges
e, 0, 1, osa.ath.anchors.simple
接下來就是見證奇跡的時刻
突出顯示
$ vi XXX.XXXanchors.simple
g*GSVIVT01012028001 GSVIVT01000604001 ppa011886m ppa008534m 392 +
GSVIVT01010441001 GSVIVT01000970001 ppa022891m ppa001358m 115 -
GSVIVT01000555001 GSVIVT01003228001 ppa002809m ppa010569m 359 +
...
$ python -m jcvi.graphics.karyotype seqids layout
$ vi layout
# y, xstart, xend, rotation, color, label, va, bed
.7, .1, .8, 15, , Grape, top, grape.bed
.5, .1, .8, 0, , Peach, top, peach.bed
.3, .1, .8, -15, , Cacao, bottom, cacao.bed
# edges
e, 0, 1, grape.peach.anchors.simple
e, 1, 2, peach.cacao.anchors.simple
$ vi seqids
chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19
scaffold_1,scaffold_2,scaffold_3,scaffold_4,scaffold_5,scaffold_6,scaffold_7,scaffold_8
scaffold_1,scaffold_2,scaffold_3,scaffold_4,scaffold_5,scaffold_6,scaffold_7,scaffold_8,scaffold_9,scaffold_10r
$ python -m jcvi.graphics.karyotype seqids layout
b dotplot
親測點陣圖是自動出來的,當(dāng)然也可以用命令行
$ python -m jcvi.graphics.dotplot osa.ath.anchors
查看synteny depth分布
python -m jcvi.compara.synteny depth --histogram osa.ath.anchors
anyway,先介紹到這里啦
更多請參考
基因組共線性工具MCScanX使用說明
基因組間共線性分析想學(xué)嗎?
無限個蔬胯!物種共線性分析結(jié)果可視化