Step 1. Align the sequences with a multiple sequence alignment tool. I use MAFFT here (official documentation: https://mafft.cbrc.jp/alignment/software/).
Official usage:
mafft --auto input > output
My command:
mafft --auto Ath.direct+inverted_ID.cds.229.fasta > Ath.direct+inverted_ID.cds.229.aln.fasta
MAFFT prints the following to the terminal:
nthread = 0
nthreadpair = 0
nthreadtb = 0
ppenalty_ex = 0
stacksize: 8192 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00
Making a distance matrix ..
There are 1 ambiguous characters.
201 / 229
done.
Constructing a UPGMA tree (efffree=0) ...
220 / 229
done.
Progressive alignment 1/2...
STEP 129 / 228 f
Reallocating..done. *alloclen = 23649
STEP 176 / 228 f
Reallocating..done. *alloclen = 26535
STEP 226 / 228 f
Reallocating..done. *alloclen = 27829
STEP 228 / 228 f
done.
Making a distance matrix from msa..
200 / 229
done.
Constructing a UPGMA tree (efffree=1) ...
220 / 229
done.
Progressive alignment 2/2...
STEP 209 / 228 f
Reallocating..done. *alloclen = 27476
STEP 228 / 228 f
done.
disttbfast (nuc) Version 7.471
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.
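From this output, note that the MAFFT version is 7.471 and that the program chose the FFT-NS-2 alignment strategy. (You will need both when describing materials and methods in a paper.)
# With --auto, the program picks the alignment strategy automatically; the default output format is FASTA.
To write the alignment in Clustal format (.aln) instead, use the following command:
mafft --clustalout input.fasta > input.out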
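Recording the version and strategy by hand is easy to forget, so here is a minimal Python sketch that pulls both out of a saved MAFFT log. It assumes the terminal messages were captured to a file, for example by redirecting stderr with `2> mafft.log`. The file name and the function name `summarize_mafft_log` are hypothetical, and the regular expressions assume the log layout printed by version 7.471 as shown above.

```python
import re


def summarize_mafft_log(log_text):
    """Return (version, strategy) found in a MAFFT log; None for a missing field."""
    # The first capitalised "Version" belongs to the program banner line.
    version = re.search(r"Version\s+(\d+(?:\.\d+)*)", log_text)
    # The strategy name is the first token after "Strategy:".
    strategy = re.search(r"Strategy:\s*(\S+)", log_text)
    return (
        version.group(1) if version else None,
        strategy.group(1) if strategy else None,
    )


# Excerpt of the log from the run described in this post.
sample_log = """disttbfast (nuc) Version 7.471
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)

Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)
"""

print(summarize_mafft_log(sample_log))
```

For a real run, pass the contents of the saved log file instead of `sample_log`.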
Step 2. Next, use FastTree to build a maximum-likelihood (ML) phylogenetic tree from the alignment file. (FastTree website: http://www.microbesonline.org/fasttree/#Install)
Downloading the program is all the installation it needs.
Run it with:
FastTree -gtr -nt -gamma alignment_file > tree_file
My command:
FastTree -gtr -nt -gamma Ath.direct+inverted_ID.cds.229.aln.fasta > Ath.direct+inverted_ID.cds.229.aln.fasta.tree.nwk
FastTree prints the following to the terminal:
FastTree Version 2.1.11 SSE3 ###
Alignment: Ath.direct+inverted_ID.cds.229.aln.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories ###
Ignored unknown character n (seen 1 times)
Initial topology in 2.07 seconds
Refining topology: 31 rounds ME-NNIs, 2 rounds ME-SPRs, 16 rounds ML-NNIs
Total branch-length 98.807 after 18.26 sec
ML-NNI round 1: LogLk = -578498.850 NNIs 45 max delta 21.75 Time 32.02
GTR Frequencies: 0.3022 0.2199 0.2241 0.2538
GTR rates(ac ag at cg ct gt) 1.0483 2.5389 1.0248 0.9926 2.7404 1.0000
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.800 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -558919.789 NNIs 17 max delta 7.53 Time 58.68
ML-NNI round 3: LogLk = -558887.713 NNIs 8 max delta 0.77 Time 64.15
ML-NNI round 4: LogLk = -558870.798 NNIs 1 max delta 0.11 Time 67.04
ML-NNI round 5: LogLk = -558870.004 NNIs 1 max delta 0.51 Time 68.13
ML-NNI round 6: LogLk = -558869.763 NNIs 0 max delta 0.00 Time 68.71
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 7: LogLk = -558646.178 NNIs 0 max delta 0.00 Time 81.92 (final)
Optimize all lengths: LogLk = -558636.448 Time 85.18
Gamma(20) LogLk = -566145.706 alpha = 9.988 rescaling lengths by 1.223
Total time: 103.97 seconds Unique: 227/229 Bad splits: 0/224
As before, just note the software version and the model; see the lines in the output that end with ###.
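The same bookkeeping can be scripted for FastTree. The sketch below assumes the terminal messages were saved to a file (FastTree writes the tree to stdout and its messages to stderr, so `2> fasttree.log` would capture them). The file name and the function name `summarize_fasttree_log` are hypothetical, and the patterns assume the log layout of version 2.1.11 as shown above.

```python
import re


def summarize_fasttree_log(log_text):
    """Return a dict with the FastTree version and ML model line, or None if absent."""
    version = re.search(r"FastTree\s+Version\s+(\S+)", log_text)
    model = re.search(r"ML\s+Model:\s*(.+)", log_text)
    return {
        "version": version.group(1) if version else None,
        # rstrip(" #") drops hand-added "###" markers if the text was copied from this post.
        "model": model.group(1).rstrip(" #") if model else None,
    }


# Excerpt of the log from the run described in this post.
sample_log = """FastTree Version 2.1.11 SSE3
Alignment: Ath.direct+inverted_ID.cds.229.aln.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
"""

info = summarize_fasttree_log(sample_log)
print(info["version"])
print(info["model"])
```

Together with the "Support: SH-like 1000" line, these two fields cover most of what a methods section needs to say about the tree.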
Step 3. Finally, open the resulting tree file, Ath.direct+inverted_ID.cds.229.aln.fasta.tree.nwk, and style it as needed.
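Before styling the tree, it can help to check that the Newick file holds the sequences you expect. The following is a rough sketch, not a full Newick parser: it assumes unquoted leaf names with no spaces, parentheses, colons, commas, or semicolons. The function name `newick_leaf_names` and the three-leaf example tree are made up for illustration.

```python
import re


def newick_leaf_names(newick):
    """List leaf labels in an unquoted Newick string.

    A leaf label is the token that directly follows "(" or ",".
    Internal-node labels (such as support values) follow ")" and are skipped.
    """
    return re.findall(r"[(,]\s*([^\s():,;]+)", newick)


# Hypothetical three-leaf tree with one support value (0.950) on an internal node.
example = "((seqA:0.1,seqB:0.2)0.950:0.3,seqC:0.4);"
leaves = newick_leaf_names(example)
print(len(leaves), leaves)
```

For the real file, read its contents into a string and compare the number of leaves with the 229 sequences in the alignment.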