Li, Heng. "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM." arXiv preprint arXiv:1303.3997 (2013).
ABSTRACT
Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or assembly contigs against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths, from 70bp to a few megabases. For mapping 100bp sequences, BWA-MEM performs better than several state-of-the-art read aligners to date.
Availability and implementation: BWA-MEM is implemented as a component of BWA, which is available at http://github.com/lh3/bwa.
INTRODUCTION
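① Most short-read mappers for next-generation sequencing (NGS) data were developed when sequence reads were about 36bp long.
② For 36bp reads, it is reasonable to require end-to-end alignment (i.e. every read base is aligned to the reference) and to report only hits within a certain Hamming or edit distance.
③ For reads of 100bp or longer, it becomes more important to allow long gaps under the affine-gap penalty and to report multiple non-overlapping local hits that may be caused by structural variations or misassemblies in the reference genome. Many short-read alignment algorithms are not applicable or not well suited to mapping longer reads. Meanwhile, although several mature algorithms exist for aligning capillary sequence reads, they are slow and lack features needed for analyzing large-scale NGS data. Rapidly evolving NGS technologies have therefore been urgently calling for new alignment algorithms.
④ Examples of alignment algorithms:
BWA-SW (2010); Bowtie2 (2012); Cushaw2 (2012); GEM (2012)
The paper lists the shortcomings of these algorithms and uses them to motivate the new one ("All these concerns motivated us to explore a new alignment algorithm.").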
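To make the contrast between points ② and ③ concrete, here is a minimal Python sketch of the three quantities they mention: Hamming distance (substitutions only), edit distance (substitutions plus single-base insertions and deletions), and an affine-gap penalty. The function names and the `gap_open` / `gap_extend` values are illustrative assumptions, not taken from these notes.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of mismatching positions; only defined for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion from a
                           cur[j - 1] + 1,           # insertion into a
                           prev[j - 1] + (x != y)))  # match / mismatch
        prev = cur
    return prev[-1]


def affine_gap_penalty(length: int, gap_open: int = 6, gap_extend: int = 1) -> int:
    """Cost of one gap of `length` bases: a one-off opening cost plus a
    per-base extension cost. The default values are assumptions."""
    return 0 if length == 0 else gap_open + length * gap_extend
```

Under these assumed parameters, one 10-base gap costs 16, while ten separate 1-base gaps cost 70. This is why an affine model tolerates the long gaps that matter for reads of 100bp or more.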
METHODS
I. Aligning a single query sequence
① Seeding and re-seeding
② Chaining and chain filtering
③ Seed extension
II. Paired-end mapping
① Rescuing missing hits
② Pairing
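The outline only names the single-query steps (seeding, chaining, seed extension), so the following toy Python sketch shows what the first two mean in general. It is not BWA-MEM's implementation: fixed-length k-mer matches from a hash table stand in for the seeds, and chaining is reduced to grouping seeds that lie on the same diagonal. All function names and the value of `k` are hypothetical.

```python
from collections import defaultdict


def find_seeds(ref: str, query: str, k: int = 4):
    """Return (query_pos, ref_pos) pairs where a k-mer matches exactly."""
    index = defaultdict(list)
    for r in range(len(ref) - k + 1):
        index[ref[r:r + k]].append(r)
    seeds = []
    for q in range(len(query) - k + 1):
        for r in index.get(query[q:q + k], []):
            seeds.append((q, r))
    return seeds


def chain_by_diagonal(seeds):
    """Group seeds by diagonal (ref_pos - query_pos); seeds on one
    diagonal are collinear and gap-free relative to each other."""
    chains = defaultdict(list)
    for q, r in seeds:
        chains[r - q].append((q, r))
    return dict(chains)


def best_chain(chains):
    """Pick the diagonal supported by the most seeds."""
    diag = max(chains, key=lambda d: len(chains[d]))
    return diag, chains[diag]


ref = "GGGG" + "ACGTTCAGGCAT" + "GGGG"
query = "ACGTTCAGGCAT"
diag, chain = best_chain(chain_by_diagonal(find_seeds(ref, query)))
```

In a real aligner the chosen chain would then go to the extension step, where dynamic programming fills in the bases between and beyond the seeds.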
RESULTS AND DISCUSSIONS
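BWA-MEM is a fast and accurate sequence read aligner, and one of the few that work well both for 70bp reads and for long sequences up to a few megabases. Technically, BWA-MEM could be made faster on long sequences by using SSE2-based banded DP and by restricting DP to regions not covered by long exact matches. Seeding is the bottleneck for short sequences, while banded DP is the bottleneck for long sequences.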
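For readers unfamiliar with banded DP, here is a minimal sketch of the idea in plain Python: the dynamic-programming matrix is filled only within a fixed distance (`band`) of the main diagonal, so the work grows with sequence length times band width rather than with the full matrix. As a simplifying assumption it uses unit-cost edit distance instead of affine-gap scoring, and it has no SSE2 vectorization; the function name is hypothetical.

```python
def banded_edit_distance(a: str, b: str, band: int) -> float:
    """Edit distance restricted to cells with |i - j| <= band.
    Returns inf if the end cell lies outside the band."""
    INF = float("inf")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return INF
    # Row 0: only columns inside the band are stored.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i
                continue
            best = prev.get(j - 1, INF) + (a[i - 1] != b[j - 1])  # diagonal
            best = min(best,
                       prev.get(j, INF) + 1,      # gap in b
                       cur.get(j - 1, INF) + 1)   # gap in a
            cur[j] = best
        prev = cur
    return prev[m]
```

The result equals the true edit distance whenever an optimal alignment stays inside the band; otherwise it is an upper bound. A narrow band is therefore a trade of sensitivity for speed.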