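The previous post introduced one method for removing redundancy from a genome assembly: http://www.reibang.com/p/f638ce6b7c8f
Other de-redundancy methods exist as well, and several tools can be combined to get the best result.
This post covers another one: purge_dups, which the canu website recommends for removing redundancy.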
Software overview
purge haplotigs and overlaps in an assembly based on read depth
Dependencies
1.zlib
2.minimap2
3.runner (optional)
4.python3 (optional)
Installation
1. Install purge_dups
git clone https://github.com/dfguan/purge_dups.git
cd purge_dups/src && make
2. Install runner
git clone https://github.com/dfguan/runner.git
cd runner && python3 setup.py install --user
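Before running the pipeline, it can help to confirm that the tools it calls are reachable. The following is a minimal sketch, not part of purge_dups: `check_tools` is a hypothetical helper name, and it only tests whether each command is on the PATH.

```shell
# check_tools (hypothetical helper): print found/missing for each command
# name given as an argument; return 1 if any of them is not on the PATH.
check_tools() {
    _missing=0
    for _tool in "$@"; do
        if command -v "$_tool" >/dev/null 2>&1; then
            echo "found: $_tool"
        else
            echo "missing: $_tool"
            _missing=1
        fi
    done
    return "$_missing"
}

# Example call (commented out so the block has no side effects):
# check_tools minimap2 python3 gzip
```

The purge_dups binaries built by `make` are called by full path in the script below, so they do not need to be on the PATH for this check.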
Usage
#!/bin/bash
mkdir -p Purge_Dups
cd Purge_Dups
## 1. Data preparation
contig=sc.asm.hic.p_ctg.fa
hifi=/path/to/hifi_reads.fq.gz   # placeholder: the original script never sets $hifi; point it at your HiFi reads
ln -s ../$contig pri_asm.fa
pri_asm=pri_asm.fa
bin=~/biosoft/purge_dups-1.2.5/bin   # directory holding the purge_dups binaries
## 2. Map the reads to the assembly and compute the read-depth cutoffs
minimap2 -x asm20 -t 10 $pri_asm $hifi | gzip -c - > hifi.paf.gz
$bin/pbcstat hifi.paf.gz   # produces PB.base.cov and PB.stat
$bin/calcuts PB.stat > cutoffs 2> calcuts.log
## 3. Split the assembly and align it against itself
$bin/split_fa $pri_asm > $pri_asm.split
minimap2 -x asm5 -DP $pri_asm.split $pri_asm.split | gzip -c - > $pri_asm.split.self.paf.gz
## 4. Identify the duplications and remove them
$bin/purge_dups -2 -T cutoffs -c PB.base.cov $pri_asm.split.self.paf.gz > dups.bed 2> purge_dups.log
$bin/get_seqs -e dups.bed $pri_asm   # writes purged.fa and hap.fa
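Each step above reads a file written by the previous step, and because the output is captured with `>`, a step that fails can still leave an empty file behind. A minimal sketch of a guard that could be called between steps follows; `require_file` is a hypothetical helper, not something purge_dups provides.

```shell
# require_file (hypothetical helper): succeed only if the given path exists
# and is non-empty; otherwise print a message to stderr and return 1.
require_file() {
    if [ ! -s "$1" ]; then
        echo "error: '$1' is missing or empty" >&2
        return 1
    fi
}

# Example use between pipeline steps (commented out; file names as in the script):
# require_file PB.stat  || exit 1
# require_file cutoffs  || exit 1
# require_file dups.bed || exit 1
```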
Output files
Purged assembly: Purge_Dups/purged.fa (the removed sequences are written to Purge_Dups/hap.fa)
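To see how much sequence was removed, one option is to compare the contig count and total base count of the input assembly and of purged.fa. The sketch below assumes plain (uncompressed) FASTA input; `fasta_stats` is a hypothetical helper written for illustration.

```shell
# fasta_stats (hypothetical helper): print "<number of sequences><TAB><total bases>"
# for an uncompressed FASTA file.
fasta_stats() {
    awk '
        /^>/ { n++; next }                 # header line: count one sequence
        { gsub(/[ \t\r]/, "")              # drop stray whitespace and CRs
          bases += length($0) }            # sequence line: add its length
        END { printf "%d\t%d\n", n, bases }
    ' "$1"
}

# Example comparison (commented out; file names as in the script above):
# fasta_stats pri_asm.fa
# fasta_stats purged.fa
```

A drop in total bases between the two files corresponds to the sequence that get_seqs moved out of the primary assembly.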