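This tool comes from Heng Li and was published in Bioinformatics as 'compleasm: a faster and more accurate reimplementation of BUSCO'. According to the paper, its completeness assessments are more accurate than those of BUSCO.
For details, see: https://github.com/huangnengCSU/compleasm
1. Installation
There are several ways to install it; the quickest is with conda: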
conda create -n <your_env_name> -c conda-forge -c bioconda compleasm
2. Basic parameters
compleasm
usage: compleasm [-h] [-v] {download,list,protein,miniprot,analyze,run} ...

Compleasm

positional arguments:
  {download,list,protein,miniprot,analyze,run}
                        Compleasm modules help
    download            Download specified BUSCO lineages
    list                List local or remote BUSCO lineages
    protein             Evaluate the completeness of provided protein sequences
    miniprot            Run miniprot alignment
    analyze             Evaluate genome completeness from provided miniprot alignment
    run                 Run compleasm including miniprot alignment and completeness evaluation

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
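These are the main modules:
- run: takes a genome file and assesses its completeness (miniprot alignment plus evaluation);
- download: downloads the specified lineage databases;
- list: lists the local or remote lineage databases;
- protein: assesses the completeness of a provided protein (pep) set;
- miniprot: runs the miniprot alignment only;
- analyze: assesses genome completeness from an existing miniprot alignment.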
3. Basic usage
You can download the lineage database yourself from https://busco-data.ezlab.org/v5/data/lineages/. For this example I downloaded embryophyta:
mkdir database && cd database
wget https://busco-data.ezlab.org/v5/data/lineages/embryophyta_odb10.2024-01-08.tar.gz
tar -zxf *.gz
Assess the genome:
ref=test.fa
compleasm run -t 10 -l embryophyta \
-L */database -a $ref -o ${ref}.out
# Basic options:
-t, number of threads;
-a, assembly (reference) file;
-o, output directory;
-l, lineage name;
-L, path to the local lineage library;
--specified_contigs, restrict the assessment to the specified contigs
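When several assemblies need to be assessed with the same settings, the options above can be combined in a small loop. The following is only a sketch: the file names asm1.fa and asm2.fa, the library path ./database, and the helper build_cmd are hypothetical placeholders, and only the flags listed above are used. It prints the commands instead of running them, so they can be checked first.

```shell
# Hypothetical inputs: replace with your own assembly FASTA files
assemblies="asm1.fa asm2.fa"
lineage=embryophyta      # lineage name, passed to -l
library=./database       # assumed location of the local lineage library (-L)
threads=10               # number of threads (-t)

# Hypothetical helper: print the compleasm command line for one assembly
build_cmd() {
    printf 'compleasm run -t %s -l %s -L %s -a %s -o %s.out\n' \
        "$threads" "$lineage" "$library" "$1" "$1"
}

# Dry run: print one command per assembly
for ref in $assemblies; do
    build_cmd "$ref"
done
```

Once the printed commands look right, piping the loop's output to sh would execute them one after another.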
A protein set can be assessed in the same way:
compleasm protein -p pep.fa -o pep.fa.out -l embryophyta -L */database -t 10
All results are written to summary.txt in the output directory:
## lineage: embryophyta_odb10
S:97.40%, 1572
D:1.80%, 29
F:0.12%, 2
I:0.00%, 0
M:0.68%, 11
N:1614
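To compare runs it can be handy to pull a single completeness figure out of summary.txt. The sketch below assumes the file has exactly the layout shown above, and that S and D are the single-copy and duplicated complete genes while N is the total number of genes in the lineage. It recreates the summary in a temporary file purely for illustration; in practice, point awk at the real summary.txt.

```shell
# Recreate the summary shown above in a scratch file (illustration only)
summary=$(mktemp)
cat > "$summary" <<'EOF'
## lineage: embryophyta_odb10
S:97.40%, 1572
D:1.80%, 29
F:0.12%, 2
I:0.00%, 0
M:0.68%, 11
N:1614
EOF

# Split fields on ':', '%', ',' and spaces, so that "S:97.40%, 1572"
# becomes $1="S", $2="97.40", $3="1572".
# Add up the S and D gene counts and divide by the total N.
complete=$(awk -F'[:%, ]+' '
    $1 == "S" || $1 == "D" { found += $3 }
    $1 == "N"              { total = $2 }
    END { if (total > 0) printf "%.2f", 100 * found / total }
' "$summary")

echo "Complete (S+D): ${complete}%"
rm -f "$summary"
```

The percentage is recomputed from the raw counts rather than by adding the rounded S and D percentages, so it can differ from their sum in the last decimal place.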
These values are indeed somewhat higher than the ones obtained with BUSCO; see the paper for details.