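Preface
Genomic structural variation is an important cause of many cancers, genetic diseases and other disorders. Detecting structural variants with second-generation sequencing has major limitations, while third-generation sequencing has problems of its own, such as a relatively high error rate, and most software performs poorly on complex structural variants in particular. To address this, researchers developed the genome aligner NGMLR and the structural variant caller Sniffles, which their authors report improve both the sensitivity and the precision of variant detection. NGMLR and Sniffles can also filter out spurious events automatically and work on low-coverage data, which lowers cost.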
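Introduction
NGMLR and Sniffles are structural variant detection tools designed for long-read sequencing. The genome aligner NGMLR builds on short-read alignment methods and takes into account the characteristics of data produced by the PacBio and Oxford Nanopore platforms. Sniffles is a structural variant caller that scans the resulting alignments and detects structural variants precisely.
Figure: main steps of NGMLR (left) and Sniffles (right)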
NGMLR
Installation
Installing with conda is recommended:
conda install -c conda-forge -c bioconda ngmlr
Usage
For PacBio data:
ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam
For Oxford Nanopore data:
ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam -x ont
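The two invocations above differ only in the -x preset, so a small wrapper can choose it from a platform name. The following is a minimal sketch, not part of NGMLR: the function name build_ngmlr_cmd and its argument order are hypothetical, and it only prints the command so that it can be inspected before being run.

```shell
# Hypothetical helper (not part of NGMLR): print the ngmlr command line
# for a given platform preset.
# Arguments: PLATFORM REFERENCE READS OUTPUT [THREADS]
build_ngmlr_cmd() {
    local platform="$1"
    local ref="$2"
    local reads="$3"
    local out="$4"
    local threads="${5:-4}"

    case "$platform" in
        pacbio|ont) ;;   # the two presets listed for -x
        *)
            echo "unknown platform: $platform (expected pacbio or ont)" >&2
            return 1
            ;;
    esac

    # Paths are printed unquoted, so this sketch assumes they contain no spaces.
    printf 'ngmlr -t %s -x %s -r %s -q %s -o %s\n' \
        "$threads" "$platform" "$ref" "$reads" "$out"
}

build_ngmlr_cmd ont reference.fasta reads.fastq test.sam
```

Passing the preset explicitly for PacBio data as well is redundant, since pacbio is the default, but it keeps the generated command self-describing.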
Parameter reference
Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]
Input/output parameters:
-r <file>, --reference <file>
(required) Path to the reference genome (FASTA/Q, can be gzipped)
-q <file>, --query <file>
Path to the read file (FASTA/Q) [/dev/stdin]
-o <string>, --output <string>
Path to output file [stdout]
--skip-write
Don't write reference index to disk [false]
--bam-fix
Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
--rg-id <string>
Adds RG:Z:<string> to all alignments in SAM/BAM [none]
--rg-sm <string>
RG header: Sample [none]
--rg-lb <string>
RG header: Library [none]
--rg-pl <string>
RG header: Platform [none]
--rg-ds <string>
RG header: Description [none]
--rg-dt <string>
RG header: Date (format: YYYY-MM-DD) [none]
--rg-pu <string>
RG header: Platform unit [none]
--rg-pi <string>
RG header: Median insert size [none]
--rg-pg <string>
RG header: Programs [none]
--rg-cn <string>
RG header: sequencing center [none]
--rg-fo <string>
RG header: Flow order [none]
--rg-ks <string>
RG header: Key sequence [none]
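The --rg-* options listed above can be combined on one command line to tag the alignments with a read group. The sketch below assembles a few of them; the helper name rg_args and the values sample1 and lib1 are placeholders that do not come from the original, and the final command is printed rather than executed.

```shell
# Hypothetical helper: build the read-group part of an ngmlr command line.
# $1 = read-group ID, $2 = sample, $3 = library, $4 = platform
rg_args() {
    printf '%s %s %s %s %s %s %s %s\n' \
        --rg-id "$1" --rg-sm "$2" --rg-lb "$3" --rg-pl "$4"
}

# Print the full command for inspection; drop 'echo' to actually run it.
# The unquoted substitution is intentional so the flags split into words.
echo ngmlr -t 4 -x ont -r reference.fasta -q reads.fastq -o test.sam \
    $(rg_args sample1 sample1 lib1 ONT)
```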
General parameters:
-t <int>, --threads <int>
Number of threads [1]
-x <pacbio, ont>, --presets <pacbio, ont>
Parameter presets for different sequencing technologies [pacbio]
-i <0-1>, --min-identity <0-1>
Alignments with an identity lower than this threshold will be discarded [0.65]
-R <int/float>, --min-residues <int/float>
Alignments containing less than <int> or (<float> * read length) residues will be discarded [0.25]
--no-smallinv
Don't detect small inversions [false]
--no-lowqualitysplit
Split alignments with poor quality [false]
--verbose
Debug output [false]
--no-progress
Don't print progress info while mapping [false]
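The -R option accepts either an absolute count or a fraction of the read length. The sketch below works out the effective cutoff for a single read. It assumes that values below 1 are treated as fractions and anything else as an absolute residue count; that is one reading of the help text, not a confirmed detail of NGMLR's implementation, and the helper name min_residues is hypothetical.

```shell
# Hypothetical helper: effective minimum residue count for one read.
# $1 = value passed to -R, $2 = read length in bases
min_residues() {
    awk -v r="$1" -v len="$2" 'BEGIN {
        if (r < 1) printf "%d\n", r * len   # fraction of read length (assumed)
        else       printf "%d\n", r         # absolute count
    }'
}

min_residues 0.25 10000   # default -R on a 10 kb read: prints 2500
min_residues 500 10000    # absolute threshold: prints 500
```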
Advanced parameters:
--match <float>
Match score [2]
--mismatch <float>
Mismatch score [-5]
--gap-open <float>
Gap open score [-5]
--gap-extend-max <float>
Gap open extend max [-5]
--gap-extend-min <float>
Gap open extend min [-1]
--gap-decay <float>
Gap extend decay [0.15]
-k <10-15>, --kmer-length <10-15>
K-mer length in bases [13]
--kmer-skip <int>
Number of k-mers to skip when building the lookup table from the reference [2]
--bin-size <int>
Sets the size of the grid used during candidate search [4]
--max-segments <int>
Max number of segments allowed for a read per kb [1]
--subread-length <int>
Length of fragments reads are split into [256]
--subread-corridor <int>
Length of corridor sub-reads are aligned with [40]
Sniffles
Installation
Installing with conda is recommended:
conda install -c conda-forge -c bioconda sniffles
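After installation it can help to confirm that the tools are actually on PATH before starting a long run. A minimal sketch using the POSIX command -v builtin; the function name require_tools is hypothetical, and samtools appears in the check only on the assumption that it will be used to sort the alignments (the original does not mention it).

```shell
# Hypothetical helper: report every named tool that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_tools ngmlr sniffles samtools || echo "install the missing tools first" >&2
```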
Usage
sniffles -m mapped.sort.bam -v output.vcf
mapped.sort.bam can come from ngmlr or from bwa. If it comes from bwa, run bwa with the -M option so that primary and secondary alignments are marked.
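Putting the pieces together, the steps from raw reads to a VCF can be sketched as one small script. This is an illustration under stated assumptions, not the authors' pipeline: the function names run and call_svs and the file names are hypothetical, samtools is assumed for sorting and indexing because the original does not say how mapped.sort.bam was produced, and the sniffles flags are the ones used in this article. Newer Sniffles releases may use different option names, so check sniffles --help for the installed version.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: align, sort, index, then call structural variants.
# $1 = reference FASTA, $2 = reads FASTQ, $3 = output prefix
call_svs() {
    run ngmlr -t 4 -r "$1" -q "$2" -o "$3.sam" &&
    run samtools sort -o "$3.sort.bam" "$3.sam" &&
    run samtools index "$3.sort.bam" &&
    run sniffles -m "$3.sort.bam" -v "$3.vcf"
}

# Dry run: shows the four commands without needing the tools installed.
DRY_RUN=1
call_svs reference.fasta reads.fastq sample
```

Chaining the steps with && stops the run at the first failing command, so Sniffles is never started on a missing or truncated BAM file.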