參考文章:
https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
http://www.reibang.com/p/466274ad932b
網(wǎng)頁(yè)上有很多的相關(guān)文章怎憋,但是當(dāng)我自己在用網(wǎng)上的方法進(jìn)行分析的時(shí)候备籽,總是會(huì)報(bào)出各種各樣的錯(cuò)誤暑始,本著愚公精神菩掏,最后總與做出了想要的圖片郁稍,記錄一下。
一 分析所需的軟件和數(shù)據(jù)
1. 軟件
LASTAL http://last.cbrc.jp/
## 安裝LASTAL,可以直接下載到windows中再傳到服務(wù)器选泻。或者直接右鍵,復(fù)制下載鏈接直接在服務(wù)器中下載页眯。
wget http://last.cbrc.jp/last-1111.zip && unzip ./last-1111.zip && cd ./last-1111 && make
## move lastal and lastdb on your PATH
cd src && echo "export PATH=`pwd`:\$PATH" >> ~/.bashrc && source ~/.bashrc
## 安裝jcvi
pip install jcvi
2.數(shù)據(jù)
github網(wǎng)頁(yè)中說(shuō)的是可以直接通過(guò)登錄phytozome,但是我登陸的時(shí)候梯捕,總是會(huì)出現(xiàn)如下的提示,在嘗試了很多次無(wú)果窝撵,決定直接下載傀顾。從phytozome上下載野生大豆和栽培大豆的.cds.fa和.gff3文件發(fā)現(xiàn)給出的鏈接是NCBI。沒(méi)辦法了只能從NCBI和Soybase官網(wǎng)上下載相關(guān)的文件碌奉,其中野生大豆w05的文件從soybase中下載短曾。栽培大豆威廉82的相關(guān)文件是從NCBI中下載。
w05下載地址:https://soybase.org/data/public/Glycine_soja/W05.gnm1.ann1.T47J/
w82下載地址:
https://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/Glycine_max/latest_assembly_versions/GCA_000004515.4_Glycine_max_v2.1/
二 數(shù)據(jù)處理
1.GFF轉(zhuǎn)換成bed文件
## w05和w82
python -m jcvi.formats.gff bed --type=mRNA --key=ID glyso.W05.gnm1.ann1.T47J.gene_models_main.gff3 -o w05.bed
python -m jcvi.formats.gff bed --type=mRNA --key=ID GCA_000004515.4_Glycine_max_v2.1_genomic.gff -o w82.bed
2.根據(jù)bed文件獲取對(duì)應(yīng)的cds序列
## w05和w82
seqkit grep -f <(cut -f4 w05.bed) w05.cds.fa | seqkit seq -i >w05.cds
seqkit grep -f <(cut -f4 w82.bed) w82.cds.fa | seqkit seq -i >w82.cds
seqkit下載網(wǎng)址(linux 64-bit):https://github.com/shenwei356/seqkit/releases/download/v0.13.2/seqkit_linux_amd64.tar.gz
上面的代碼運(yùn)行后發(fā)現(xiàn)道批,w05沒(méi)有問(wèn)題错英,可以提取相應(yīng)的CDS序列。但是隆豹,w82的CDS 文件則為0椭岩,檢查后發(fā)現(xiàn)*.fa文件和bed文件中的第4列的命名完全不相同。
只能人為改變文件的格式璃赡,使得后面的程序可以正常運(yùn)行判哥。
#!/usr/bin/python
import sys
#def bed():
# for line in open(sys.argv[1],"r"):
# line = line.strip().split('\t')
# gene = line[3].split("|")[2]
# print("%s\t%s\t%s\t%s\t%s\t%s"%(line[0],line[1],line[2],gene,line[4],line[5]))
def fa():
for line in open(sys.argv[1],"r"):
line = line.strip()
if line.startswith(">"):
geneID = line.strip().split(" ")[2].split(":")[1].split(".")
new_genID = ">" + geneID[0] + '.' + geneID[1] + "." + geneID[2]
print(new_genID)
continue
else:
print(line)
if __name__ == "__main__":
# bed()
fa()
## 改bed文件的格式,使用bed函數(shù)碉考。改cds.fa文件塌计,使用fa函數(shù)。
## 用法:python xxx.py xxx.cds.fa > new_xxx.cds.fa
修改后的w82的bed文件和cds.fa文件侯谁,運(yùn)行前面的代碼便可以成功的pull出w82.cds
3.共線性分析
## 將生成的w82.bed;w82.cds和w05.bed;w05.cds放到一個(gè)目錄下
ls ./cds
w05.bed w05.cds w82.bed w82.cds
## Pairwise synteny search
python -m jcvi.compara.catalog ortholog w82 w05 --no_strip_names
查看目錄里生成的文件
5.可視化
繪制karyotype figure需要準(zhǔn)備兩個(gè)文件锌仅;
First is the seqids file, which tells the plotter which set of chromosomes to include. Here, we've removed unplaced and small scaffolds. The first line contains 19 grape chromosomes and second line contains 8 peach chromosomes.
CM000834.3,CM000835.3,CM000836.3,CM000837.4,CM000838.2,CM000839.3,CM000840.3,CM000841.3,CM000842.3,CM000843.3,CM000844.3,CM000845.2,CM000846.2,CM000847.2,CM000848.2,CM000849.2,CM000850.3,CM000851.3,CM000852.3,CM000853.3
glyso.W05.gnm1.Chr01,glyso.W05.gnm1.Chr02,glyso.W05.gnm1.Chr03,glyso.W05.gnm1.Chr04,glyso.W05.gnm1.Chr05,glyso.W05.gnm1.Chr06,W05.gnm1.Chr05,glyso.W05.gnm1.Chr07,W05.gnm1.Chr05,glyso.W05.gnm1.Chr08,W05.gnm1.Chr05,glyso.W05.gnm1.Chr09,W05.gnm1.Chr05,glyso.W05.gnm1.Chr10,W05.gnm1.Chr05,glyso.W05.gnm1.Chr11,W05.gnm1.Chr05,glyso.W05.gnm1.Chr12,W05.gnm1.Chr05,glyso.W05.gnm1.Chr13,W05.gnm1.Chr05,glyso.W05.gnm1.Chr14,W05.gnm1.Chr05,glyso.W05.gnm1.Chr15,W05.gnm1.Chr05,glyso.W05.gnm1.Chr16,glyso.W05.gnm1.Chr17,W05.gnm1.Chr05,glyso.W05.gnm1.Chr18,W05.gnm1.Chr05,glyso.W05.gnm1.Chr19,W05.gnm1.Chr05,glyso.W05.gnm1.Chr20
Second is the layout file, which tells the plotter where to draw what. The whole canvas is 0-1 on x-axis and 0-1 on y-axis. First, three columns specify the position of the track. Then rotation, color, label, vertical alignment (va), and then the genome BED file. Track 0 is now grape, track 1 is now peach. The next stanza specifies what edges to draw between the tracks. e, 0, 1 asks to draw edges between track 0 and 1, using information from the .simple file.
# 先生成.sample文件
python -m jcvi.compara.synteny screen --minspan=30 --simple w82.w05.anchors w82.w05.anchors.new
# y, xstart, xend, rotation, color, label, va, bed
.6, .1, .8, 0, , w82, top, w82.bed
.4, .1, .8, 0, , w05, top, w05.bed
# edges
e, 0, 1, w82.w05.anchors.simple
# 繪圖
python -m jcvi.graphics.karyotype seqids layout --shadestyle=line
圖中的染色體體序號(hào)還有問(wèn)題。還沒(méi)有找到好的解決方法墙贱。