最近發(fā)現(xiàn)了python版的MCScan,是個大寶藏。由于走了不少彎路,終于畫出美圖拘领,趕緊記錄下來。github地址 https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
安裝
## 安裝lastal
網(wǎng)址:http://last.cbrc.jp
unzip last-1060.zip
cd last-1060
make
# 把scripts樱调, src 添加到環(huán)境變量
## jcvi
pip install jcvi
## 若出現(xiàn) from rillib.parse import urlparse 缺少parse模塊约素,則裝parse模塊,然后將urllib.parse 改為urlparse笆凌; 因為urlparse模塊在Python 3中重命名為urllib.parse圣猎,所以模塊在Python 2.7下應(yīng)該使用urlparse。
簡單使用
- 所需基因組的gff文件
- 所需基因組的pep或者cds序列
輸入數(shù)據(jù)處理
將gff轉(zhuǎn)變?yōu)閎ed格式
## 以spinach菩颖,和sugar為例子
python -m jcvi.formats.gff bed --type=mRNA --key=ID spinach_gene_v1.gff3 -o spinach.bed
python -m jcvi.formats.gff bed --type=mRNA --key=ID BeetSet-2.unfiltered_genes.1408.gff3.txt -o sugar.bed
##參數(shù)
--type:gff文件中第三列
--key:type對應(yīng)的第九列信息前綴
我們分析只需要用到每個基因最長的轉(zhuǎn)錄本就行样漆,在sugar中是以多個轉(zhuǎn)錄本進行存儲,因為需要獲取最長轉(zhuǎn)錄本
## 將sugar中bed進行去重復(fù)
python -m jcvi.formats.bed uniq sugar.bed
獲取對應(yīng)cds/pep序列
## cds和pep序列均可以進行共線性分析
## 根據(jù)上述得到的.bed文件調(diào)取對應(yīng)cds和蛋白序列
# spinach
seqkit grep -f <(cut -f4 spianch.bed) spinach.cds.fa | seqkit seq -I >spinach.cds
seqkit grep -f <(cut -f4 spianch.bed) spinach.pep.fa | seqkit seq -I >spinach.pep
# sugar
seqkit grep -f <(cut -f4 sugar.uniq.bed) BeetSet-2.genes.1408.cds.fa | seqkit seq -i >sugar.cds
seqkit grep -f <(cut -f4 sugar.uniq.bed) BeetSet-2.genes.1408.pep.fa | seqkit seq -i >sugar.pep
也可以根據(jù)gff文件晦闰,基因組ref.fa文件中直接調(diào)取cds放祟,和pep序列
## 需要安裝cufflinks
## 提取cds
gffread in.gff3 -g ref.fa -x cds.fa
## 提取pep
gffread in.gff3 -g ref.fa -y pep.fa
共線性分析
## 新建一個文件夾,方便在報錯的時候呻右,把全部都給刪了
mkdir cds && cd cds
ln -s ../sugar.cd ./
ln -s ../sugar.uniq.bed ./sugar.bed
ln -s ../spinach.cds ./
ln -s ../spinch.bed ./
## 運行代碼
python -m jcvi.compara.catalog ortholog (--dbtype prot[蛋白分析]) --no_strip_names spinach sugar
結(jié)果:
spinach.sugar.anchors:共線性block區(qū)塊(高質(zhì)量)
spinach.sugar.last:原始的last輸出
spinach.sugar.last.filtered:過濾后的last輸出
spinach.sugar.pdf:點陣圖
## 如果遇到報錯跪妥,多半是要安裝python包,更新Latex
可視化
## 首先生成.sinple文件
python -m jcvi.compara.synteny screen --minspan=30 --simple spinach.sugar.anchors spinach.sugar.anchors.new
## 編輯配置文件seqids 和layout
#設(shè)置需要展示染色體號
vi seqids
chr1,chr2,chr3,chr4,chr5,chr6 #spinach
Bvchr1.sca001,Bvchr2.sca001,Bvchr3.sca001 #sugar
# 設(shè)置顏色声滥,長寬等
vi layout
# y, xstart, xend, rotation, color, label, va, bed
.6, .1, .8, 0, red, spinach, top, spinach.bed
.4, .1, ,8, 0, blue, sugar, top, sugar.bed
# edges
e, 0, 1, spinach.sugar.anchors.simple
注意, #edges下的每一行開頭都不能有空格
## 運行代碼
python -m jcvi.graphics.karyotype seqids layout
得到以下圖片
若想突出顯示某一共線性則可以在對應(yīng)的位置添加g*即可
vi spinach.sugar.anchors.simple
g*Spo03717 Spo03716 Bv3_048620_odzi.t1 Bv3_049090_cxmm.t1 46 +
Spo17356 Spo17350 Bv1_001140_tedw.t1 Bv1_001540_xzdn.t1 41 -
Spo13685 Spo13730 Bv5_123480_yfcy.t1 Bv5_123900_rucq.t1 46 -
Spo26250 Spo26280 Bv5_117270_qhwj.t1 Bv5_117680_iykf.t1 36 +
Spo19005 Spo06982 Bv7_173830_frmo.t1 Bv7_174150_pckw.t1 37 +
Spo19374 Spo20559 Bv4_081440_riqq.t1 Bv4_081750_yeuy.t1 32 -
#運行
python -m jcvi.graphics.karyotype seqids layout
若想比較3個物種共線性關(guān)系眉撵,則應(yīng)兩兩比對,得到兩個.simple文件落塑,并對其進行配置
$ vi layout
# y, xstart, xend, rotation, color, label, va, bed
.7, .1, .8, 15, , Grape, top, grape.bed
.5, .1, .8, 0, , Peach, top, peach.bed
.3, .1, .8, -15, , Cacao, bottom, cacao.bed
# edges
e, 0, 1, grape.peach.anchors.simple
e, 1, 2, peach.cacao.anchors.simple
$ vi seqids
chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19
scaffold_1,scaffold_2,scaffold_3,scaffold_4,scaffold_5,scaffold_6,scaffold_7,scaffold_8
scaffold_1,scaffold_2,scaffold_3,scaffold_4,scaffold_5,scaffold_6,scaffold_7,scaffold_8,scaffold_9,scaffold_10r
$ python -m jcvi.graphics.karyotype seqids layout
可以得到以下結(jié)果
也可以調(diào)整配置文件纽疟,得到不同樣式的圖形
# y, xstart, xend, rotation, color, label, va, bed
.5, 0.025, 0.625, 60, , Grape, top, grape.bed
.2, 0.2, .8, 0, , Peach, top, peach.bed
.5, 0.375, 0.975, -60, , Cacao, top, cacao.bed
# edges
e, 0, 1, grape.peach.anchors.simple
e, 1, 2, peach.cacao.anchors.simple
#運行
python -m jcvi.graphics.karyotype seqids layout
得到如下結(jié)果
除此之外,可以用TBtools快速得到共線性圖片可以參考用TBtools憾赁,快速高效實現(xiàn)基因組共線性分析與可視化污朽, 贊!
微共線性分析
在基因水平上進行查看共線性結(jié)果
(1)獲取共線性區(qū)塊
$ python -m jcvi.compara.synteny mcscan grape.bed grape.peach.lifted.anchors --iter=1 -o grape.peach.i1.blocks
##
--iter=1 表示獲取最佳的區(qū)域龙考,若=2蟆肆,則獲取兩個區(qū)塊,可用于基因組復(fù)制分析
grape.peach.i1.blocks 包含兩列 (iter=1)晦款,第一列g(shù)rape基因名字炎功,第二列;peach基因名缓溅,么有對應(yīng)基因就是一個點
Ggene Pgene
Ggene Pgene
Ggene .
Ggene .
本次選擇部分進行畫圖
$ head -50 grape.peach.i1.blocks > blocks
(2)配置畫圖 blocks.layout
# x, y, rotation, ha, va, color, ratio, label
0.5, 0.6, 0, left, center, m, 1, grape Chr1
0.5, 0.4, 0, left, center, #fc8d62, 1, peach scaffold_1
# edges
e, 0, 1
(3) 進行畫圖
cat grape.bed peach.bed > grape_peach.bed
python -m jcvi.graphics.synteny blocks grape_peach.bed blocks.layout
也可以選擇直線
$ python -m jcvi.graphics.synteny blocks grape_peach.bed blocks.layout --shadestyle=line
基因也可以只用箭頭的方式
$ python -m jcvi.graphics.synteny blocks grape_peach.bed blocks.layout --glyphstyle=arrow
也可以添加顏色
$ python -m jcvi.graphics.synteny blocks grape_peach.bed blocks.layout --glyphcolor=orthogroup
添加基因名稱
$ python -m jcvi.graphics.synteny blocks grape_peach.bed blocks.layout --genelabelsize 6
也可以多個進行比對(如上述一樣)
首先倆倆進行比對蛇损,分別獲取block,然后將其合并即可
$ python -m jcvi.compara.catalog ortholog grape cacao --cscore=.99
$ python -m jcvi.compara.synteny mcscan grape.bed grape.cacao.lifted.anchors --iter=1 -o grape.cacao.i1.blocks
## 合并
$ python -m jcvi.formats.base join grape.peach.i1.blocks grape.cacao.i1.blocks --noheader | cut -f1,2,4,6 > grape.blocks
$ head -50 grape.blocks > blocks2
block 文件如下
GSVIVT01012261001 . .
GSVIVT01012259001 . .
GSVIVT01012258001 . .
GSVIVT01012257001 . .
GSVIVT01012255001 Prupe.1G290900.1 Thecc1EG011472t1
GSVIVT01012253001 Prupe.1G290800.2 Thecc1EG011473t1
GSVIVT01012252001 Prupe.1G290700.1 Thecc1EG011474t1
GSVIVT01012250001 Prupe.1G290600.1 Thecc1EG011475t1
GSVIVT01012249001 Prupe.1G290500.1 Thecc1EG011478t1
GSVIVT01012248001 Prupe.1G290400.1 Thecc1EG011482t1
blocks2.layout如下
# x, y, rotation, ha, va, color, ratio, label
0.5, 0.6, 0, center, top, , 1, grape Chr1
0.3, 0.4, 0, center, bottom, , .5, peach scaffold_1
0.7, 0.4, 0, center, bottom, , .5, cacao scaffold_2
# edges
e, 0, 1
e, 0, 2
開始畫
$ cat grape.bed peach.bed cacao.bed > grape_peach_cacao.bed
$ python -m jcvi.graphics.synteny blocks2 grape_peach_cacao.bed blocks2.layout
20240701
評估基因組大小
https://github.com/tanghaibao/jcvi/wiki/Genome-build
python -m jcvi.assembly.kmer histogram reads.histo "*S. species* ‘Variety 1’" 21
## 其中reads.histo 為kmer數(shù)量,可以用jellfish得到淤齐,或者其他工具即可
1 1281576854
2 89292133
3 21588481
4 9347716
5 5569400
6 4705214