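I. NCBI BLAST+
1. Installing and configuring BLAST+
Download the latest BLAST+ executables from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/. Do not download the source code, because compiling from source is very slow; choose a precompiled build such as ncbi-blast-2.2.30+-x64-linux.tar.gz. If the server has internet access, download the archive directly with wget. Otherwise, download it locally and transfer it to the server with an SFTP client.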
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.30+-x64-linux.tar.gz
Extract the archive:
tar -zxvf ncbi-blast-2.2.30+-x64-linux.tar.gz
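2. Basic usage
**Tip: BLAST supports several output formats. Format 11 (the BLAST archive, in ASN.1) carries the most complete information, and the blast_formatter program can convert a format 11 result into any of the other formats. For that reason, save alignment results in format 11.
1) Build a database from the reference sequences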
makeblastdb -in db.fasta -dbtype nucl -parse_seqids -out dbname
**Setting -dbtype to nucl builds a database from nucleotide sequences; setting it to prot builds a database from amino acid sequences.
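When it is unclear which -dbtype value a FASTA file needs, a quick check of its alphabet can help. The sketch below is only a heuristic and not part of BLAST+: guess_dbtype is a hypothetical helper that samples the first 1000 lines and assumes a file whose sequence lines contain nothing but A, C, G, T, U and N is nucleotide.

```shell
# Hypothetical helper (not part of BLAST+): guess the -dbtype value for a FASTA file.
# Assumption: if the sampled sequence lines contain only A, C, G, T, U, N
# (upper or lower case), the file is nucleotide; otherwise it is protein.
guess_dbtype() (
    # Take the first 1000 lines, drop FASTA headers, remove whitespace,
    # then delete every nucleotide letter. Anything left is a non-nucleotide symbol.
    leftover=$(head -n 1000 "$1" | grep -v '^>' | tr -d '[:space:]' | tr -d 'ACGTUNacgtun')
    if [ -n "$leftover" ]; then
        echo prot
    else
        echo nucl
    fi
)

# Possible use with the makeblastdb call from the text:
# makeblastdb -in db.fasta -dbtype "$(guess_dbtype db.fasta)" -parse_seqids -out dbname
```

Nucleotide files that use other IUPAC ambiguity codes (R, Y, and so on) would be reported as prot by this sketch, so treat the result as a hint and check the file when in doubt.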
2) Once the database is built, align the query sequences against it
blastn -query test.fa -db dbname -outfmt 11 -out "test.blastn@nr.asn" -num_threads 8
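**The output file name test.blastn@nr.asn follows a personal convention, "query file name.BLAST subprogram@database name.result format", which makes it clear at a glance what each result file contains.
**If the query sequences are proteins and are searched against nr, another protein database, or a protein database you built yourself, use blastp; the other parameters are similar.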
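The naming convention and the format 11 tip above can be combined in a small script. This is a minimal sketch: blast_out_name is a hypothetical helper, the .tsv extension for the converted file is my own choice, and the conversion step runs only when blast_formatter is installed and the archive exists.

```shell
# Hypothetical helper: build "<query stem>.<program>@<database>.<extension>".
blast_out_name() (
    stem=$(basename "$1")   # strip any directory part of the query path
    stem=${stem%.*}         # strip the final extension, e.g. ".fa"
    printf '%s.%s@%s.%s\n' "$stem" "$2" "$3" "$4"
)

asn=$(blast_out_name test.fa blastn nr asn)   # archive written with -outfmt 11
tsv=$(blast_out_name test.fa blastn nr tsv)   # assumed name for the tabular copy

# Convert the archive to tabular output (format 6), but only if the
# tool is on PATH and the archive file is actually present.
if command -v blast_formatter >/dev/null 2>&1 && [ -f "$asn" ]; then
    blast_formatter -archive "$asn" -outfmt 6 -out "$tsv"
fi
```

Keeping the archive and deriving other formats from it means the search itself never has to be repeated just to get a different report layout.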
II. DIAMOND
1. Installing DIAMOND
Get the download link from the DIAMOND download page:
wget http://github.com/bbuchfink/diamond/releases/download/v0.9.17/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
**Extraction yields a single binary executable named diamond; simply add its directory to the PATH environment variable.
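One way to add the directory to PATH is sketched below. path_prepend is a hypothetical helper, and $HOME/tools/diamond is an assumed location; replace it with the directory where the diamond binary was actually extracted.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;      # otherwise put the directory first
    esac
    export PATH
}

# Assumption: the diamond binary lives in $HOME/tools/diamond
path_prepend "$HOME/tools/diamond"
```

To make the change permanent, the same lines can go into a shell startup file such as ~/.bashrc.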
2. Basic usage
To run an alignment task, we assume a protein database file in FASTA format named nr.faa and a file of DNA reads to align named reads.fna.
1) Building the database. To set up a reference database for DIAMOND, run the makedb command as follows:
$ diamond makedb --in nr.faa -d nr ## build the database
$ diamond help
diamond v0.8.8.70 | by Benjamin Buchfink
Check http://github.com/bbuchfink/diamond for updates.
Syntax: diamond COMMAND [OPTIONS]
Commands:
makedb Build DIAMOND database from a FASTA file
blastp Align amino acid query sequences against a protein reference database
blastx Align DNA query sequences against a protein reference database
view View DIAMOND alignment archive (DAA) formatted file
help Produce help message
version Display version information
General options:
--threads (-p)         number of CPU threads
--db (-d)              database file
--daa (-a)             DIAMOND alignment archive (DAA) file
--verbose (-v)         verbose console output
--log                  enable debug log
--quiet                disable console output
Makedb options:
--in                   input reference file in FASTA format
--block-size (-b)      sequence block size in billions of letters (default=2)
Aligner options:
--query (-q)           input query file
--max-target-seqs (-k) maximum number of target sequences to report alignments for
--top                  report alignments within this percentage range of top alignment score (overrides --max-target-seqs)
--compress             compression for output files (0=none, 1=gzip)
--evalue (-e)          maximum e-value to report alignments
--min-score            minimum bit score to report alignments (overrides e-value setting)
--id                   minimum identity% to report an alignment
--query-cover          minimum query cover% to report an alignment
--sensitive            enable sensitive mode (default: fast)
--index-chunks (-c)    number of chunks for index processing
--tmpdir (-t)          directory for temporary files
--gapopen              gap open penalty (default=11 for protein)
--gapextend            gap extension penalty (default=1 for protein)
--matrix               score matrix for protein alignment
--seg                  enable SEG masking of queries (yes/no)
--salltitles           print full subject titles in output files
Advanced options:
--seed-freq            maximum seed frequency
--run-len (-l)         mask runs between stop codons shorter than this length
--max-hits (-C)        maximum number of hits to consider for one seed
--id2                  minimum number of identities for stage 1 hit
--window (-w)          window size for local hit search
--xdrop (-x)           xdrop for ungapped alignment
--gapped-xdrop (-X)    xdrop for gapped alignment in bits
--ungapped-score       minimum raw alignment score to continue local extension
--hit-band             band for hit verification
--hit-score            minimum score to keep a tentative alignment
--band                 band for dynamic programming computation
--shapes (-s)          number of seed shapes (0 = all available)
--index-mode           index mode (0=4x12, 1=16x9)
--fetch-size           trace point fetch size
--single-domain        Discard secondary domains within one target sequence
--dbsize               effective database size (in letters)
--no-auto-append       disable auto appending of DAA and DMND file extensions
View options:
--out (-o)             output file
--outfmt (-f)          output format (tab/sam/xml)
--forwardonly          only show alignments of forward strand
2) Sequence alignment
** Building the database above produces a file named nr.dmnd. The alignment task may then be initiated using the blastx command like this:
$ diamond blastx -d nr -q reads.fna -o matches.m8
The output file here is specified with the -o option and named matches.m8. By default, it is generated in BLAST tabular format.
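BLAST tabular output has 12 tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. As a rough illustration of post-processing such a file, the sketch below keeps the highest-scoring hit for each query. best_hits is a hypothetical helper, and the three sample rows are invented purely so the example can run without a real matches.m8.

```shell
# Invented sample rows in the 12-column BLAST tabular layout
# (written to a temporary file so a real matches.m8 is never overwritten).
sample=$(mktemp)
printf 'read1\tprotA\t95.0\t50\t2\t0\t1\t150\t10\t59\t1e-20\t98.5\n'  > "$sample"
printf 'read1\tprotB\t80.0\t50\t10\t0\t1\t150\t5\t54\t1e-10\t60.2\n' >> "$sample"
printf 'read2\tprotC\t70.0\t40\t12\t0\t3\t122\t7\t46\t1e-05\t45.1\n' >> "$sample"

# Hypothetical helper: print the row with the highest bit score (column 12)
# for each query (column 1), sorted by query name.
best_hits() {
    awk -F'\t' '
        !($1 in best) || $12 + 0 > best[$1] + 0 { best[$1] = $12; line[$1] = $0 }
        END { for (q in line) print line[q] }
    ' "$1" | sort
}

best_hits "$sample"
```

On a real run, the same function could be called as best_hits matches.m8; when two hits tie on bit score, this sketch keeps whichever appears first in the file.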