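1. Introduction
(1) MCScanX-transposed is a software package that applies MCScanX within and between related genomes to detect transposed gene duplications that occurred within different epochs. It also supports integrative analysis of gene duplication modes and the annotation of gene families of interest with those modes.
MCScanX is a toolkit for the detection and evolutionary analysis of gene synteny and collinearity, whereas MCScanX-transposed detects transposed gene duplications that occurred within different epochs and supports integrative analysis of gene duplication modes; see the MCScanX-transposed manual.
(2) Publication: Wang Y, Li J, Paterson AH. (2013) MCScanX-transposed: detecting transposed gene duplications based on multiple colinearity scans. Bioinformatics, doi: 10.1093/bioinformatics/btt150.
2. Installation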
wget http://chibba.pgml.uga.edu/mcscan2/transposed/MCScanX-transposed.zip
unzip MCScanX-transposed.zip
cd MCScanX-transposed
make
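After unpacking, the package contains the following programs:
3. Learning the method with the test files
Notes:
- After unpacking and installation there is a data folder that contains the At (Arabidopsis thaliana) test data.
- The -i option must be followed by a folder name; a bare ./ does not work. Name the folder explicitly, e.g. ./data.
- The prepared data must be placed under the MCScanX-transposed folder, otherwise the program reports an error.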
perl ~/biosoft/MCScanX-transposed/MCScanX-transposed.pl -i ./data -t at -c al,br,cp,pt,vv -o result/at_result
結(jié)果如圖:
生成15個(gè)結(jié)果文件浴鸿,主要有8個(gè):
4.核心程序 MCScanX-transposed.pl
使用前需要準(zhǔn)備文件:
注意:
1.由于不方便演示自己的準(zhǔn)備的文件井氢,還是以官網(wǎng)測試數(shù)據(jù)為例,若自己要準(zhǔn)備文件岳链,即替換擬南芥為自己研究的物種花竞,其他的物種可以選擇自己關(guān)心的物種。
2.不用測試文件掸哑,用自己的文件容易被坑约急,因?yàn)椴恢澜Y(jié)果是什么(我就是被坑慘啦)零远。
(1)準(zhǔn)備文件:
重要:使用者必須通過仔細(xì)閱讀下列說明(1-4)準(zhǔn)備輸入文件。
- All input files should be stored under ONE folder (the "data_directory" parameter).
- For the target genome in which gene duplication modes will be classified, please prepare two input files:
a) "[target_species].gff", a gene position file for the target species, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position"
b) "[target_species].blast", a blastp output file (m8 format) for the target species (self-genome comparison).
- For each outgroup genome, please prepare two input files:
a) "[target_species]_[outgroup_species].gff", a gene position file for the target_species and outgroup_species, following a tab-delimited format: "sp&chr_NO gene starting_position ending_position"
b) "[target_species]_[outgroup_species].blast", a blastp output file (m8 format) between the target and outgroup species (cross-genome comparison).
- For example, assuming that you are going to classify gene duplication modes in Arabidopsis thaliana (ID: at), using Brassica rapa (ID: br) and Carica papaya (ID: cp) as outgroups, you need to prepare 6 input files: "at.gff", "at.blast", "at_br.gff", "at_br.blast", "at_cp.gff" and "at_cp.blast".
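The naming rules above are easy to get wrong, so it can help to check the data folder before a run. The following is a minimal sketch, not part of MCScanX-transposed: the function names required_inputs and check_inputs are hypothetical helpers, and the only thing they rely on is the file naming pattern described above.

```shell
# Hypothetical helpers (not shipped with MCScanX-transposed).

# Print the input file names implied by the naming rules above, given a
# target species ID and a comma-separated list of outgroup IDs.
required_inputs() (
    target=$1
    outgroups=$2
    printf '%s\n' "${target}.gff" "${target}.blast"
    IFS=','                       # split the outgroup list on commas
    for og in $outgroups; do
        printf '%s\n' "${target}_${og}.gff" "${target}_${og}.blast"
    done
)

# Report every required file that is missing or empty in the data folder.
# Returns non-zero if at least one file is missing.
check_inputs() (
    dir=$1
    status=0
    for f in $(required_inputs "$2" "$3"); do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}" >&2
            status=1
        fi
    done
    return "$status"
)
```

For the test data, calling check_inputs data at al,br,cp,pt,vv before MCScanX-transposed.pl would list any file that still needs to be prepared.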
(2) Building the BLAST database
Taking the at_vv.gff file as an example; the other files are prepared in the same way:
cat at.gff vv.gff >at_vv.gff
makeblastdb -in at_vv.pep -dbtype prot -parse_seqids -out at_vv.db
blastp -query at_vv.pep -db at_vv.db -out at_vv.blast -evalue 1e-10 -num_threads 20 -outfmt 6 -num_alignments 5
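With several outgroups, the same steps have to be repeated for every target/outgroup pair. The sketch below only prints the commands so they can be inspected first; blast_cmds is a hypothetical helper name, the flags are copied from the commands above, and the concatenation of the two .pep files is an assumed step that the text does not show. It also assumes that <id>.gff and <id>.pep exist for every species ID, and it does not cover the self-comparison of the target species (at.blast).

```shell
# Hypothetical helper: print, for each outgroup, the steps shown above.
# Nothing is executed here; pipe the output to sh to actually run it.
# Assumption: <id>.gff and <id>.pep exist for every species ID, and
# <target>_<outgroup>.pep is simply the two .pep files concatenated.
blast_cmds() (
    target=$1
    outgroups=$2
    threads=${3:-20}              # default matches the command above
    IFS=','                       # split the outgroup list on commas
    for og in $outgroups; do
        pair="${target}_${og}"
        echo "cat ${target}.gff ${og}.gff > ${pair}.gff"
        echo "cat ${target}.pep ${og}.pep > ${pair}.pep"
        echo "makeblastdb -in ${pair}.pep -dbtype prot -parse_seqids -out ${pair}.db"
        echo "blastp -query ${pair}.pep -db ${pair}.db -out ${pair}.blast -evalue 1e-10 -num_threads ${threads} -outfmt 6 -num_alignments 5"
    done
)
```

For example, blast_cmds at al,br,cp,pt,vv 20 prints four commands per outgroup; appending | sh runs them in order once the printed commands look right.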
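About the at_vv.blast file:
1. The official at_vv.blast contains only two kinds of hits: at-vv and vv-at. (In my own analysis I did not remove the at-at and vv-vv hits; it seems this only makes the program run more slowly, and they are dropped automatically when the file is read.)
2. When a gene has multiple transcripts, keep the longest one. Use the command line, a script, or the Fasta Longest Representive function of TBtools.
3. For multiple species, wrap the database-building and blastp commands in a script.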
(3) Classifying and extracting the results
Classify gene duplication modes in A. thaliana, using A. lyrata, Brassica rapa, Carica papaya, Populus trichocarpa and Vitis vinifera as outgroups and specifying three epochs to be identified, by the command:
1) The same command and results as in section 3 above:
perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/at_result
2) The results with -x 3 added; compare them with the ones above yourself:
perl MCScanX-transposed.pl -i data -t at -c al,br,cp,pt,vv -o result/at_result -x 3
5. Downstream analysis programs (only the first three are covered)
Tool 1. add_ka_ks.pl
Tool 2. detect_dup_modes_for_a_gene.pl
Tool 3. detect_dup_modes_for_a_family.pl
Tool 4. annotate_tree_with_dup_mode
Tool 5. annotate_tree_with_tra_dup
(1) add_ka_ks.pl (requires BioPerl)
perl add_ka_ks.pl -d data/at.cds -i result/at_result/at.transposed_after_al.pairs -o result/at.transposed_after_al.pairs.kaks
(2) detect_dup_modes_for_a_family.pl
The mads.genes file: gene IDs separated by tabs
perl detect_dup_modes_for_a_family.pl -i data/mads.genes -d result/at_result/at -o result/mads.duplication.modes
Note:
The results include transposed genes.
(3) detect_dup_modes_for_a_gene.pl
perl detect_dup_modes_for_a_family.pl -i data/mads.genes -d result/test1/at -o result/mads.dup