Genome Alignment with Bowtie 2

Notes on the tools used in ChIP-seq / CUT&Tag analysis. This post gives only a brief introduction to how each tool is used.

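Bowtie 2 is a widely used genome alignment program. Its underlying principles are not covered here; interested readers can consult the official documentation and the original publication (https://doi.org/10.1038/nmeth.1923). Below is a brief introduction to the Bowtie 2 indexing and alignment commands, along with the parameters I commonly use.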

Usage

Index

bowtie2-build [options]* <reference_in> <bt2_base>

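<reference_in>: with -f (the default), a comma-separated list of FASTA files containing the reference sequences to index; with -c, the reference sequences themselves, given directly on the command line as a comma-separated list, e.g. GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA.
<bt2_base>: the prefix of the index files to be written. By default, bowtie2-build produces NAME.1.bt2, NAME.2.bt2, NAME.3.bt2, NAME.4.bt2, NAME.rev.1.bt2, and NAME.rev.2.bt2, where NAME is <bt2_base>.
--threads: number of threads to use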

Example

bowtie2-build -f /public/Reference/GRCh38.primary_assembly.genome.fa --threads 24 GRCh38

The command above builds an index from the FASTA file /public/Reference/GRCh38.primary_assembly.genome.fa and writes index files with the prefix GRCh38 to the current directory.
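As a quick sanity check after indexing, the file names listed above can be generated and checked from Python. This is only a minimal sketch and not part of Bowtie 2: the function names `expected_index_files` and `missing_index_files` are made up for illustration, and the sketch assumes the default `.bt2` naming described above (very large references may use a different extension, which is not handled here).

```python
from pathlib import Path

# Suffixes that bowtie2-build appends to <bt2_base> by default (see the list above).
INDEX_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def expected_index_files(bt2_base):
    """Return the six index file names expected for a given <bt2_base> prefix."""
    return [f"{bt2_base}{suffix}" for suffix in INDEX_SUFFIXES]


def missing_index_files(bt2_base):
    """Return the expected index files that are not present on disk."""
    return [name for name in expected_index_files(bt2_base) if not Path(name).exists()]


if __name__ == "__main__":
    # Prefix taken from the example command above.
    for name in expected_index_files("GRCh38"):
        print(name)
```

An empty list from `missing_index_files("GRCh38")` would suggest that all six files were written to the current directory.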

Alignment

Single-end alignment

bowtie2 [options]* -x <bt2-idx> -U <fq> -S <sam_output> -p <threads> 2>Align.summary

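-x: prefix of the reference genome index files (including the path)
-U: FASTQ file of single-end (unpaired) reads
-S: output SAM file containing the alignments
-p: number of threads to use
"2>Align.summary": redirects standard error, which would otherwise be printed to the screen, to the file "Align.summary". Bowtie 2 writes its alignment summary there, typically in the following format: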

## Single-end
20000 reads; of these:
  20000 (100.00%) were unpaired; of these:
    1247 (6.24%) aligned 0 times
    18739 (93.69%) aligned exactly 1 time
    14 (0.07%) aligned >1 times
93.77% overall alignment rate

## Paired-end
10000 reads; of these:
  10000 (100.00%) were paired; of these:
    650 (6.50%) aligned concordantly 0 times
    8823 (88.23%) aligned concordantly exactly 1 time
    527 (5.27%) aligned concordantly >1 times
    ----
    650 pairs aligned concordantly 0 times; of these:
      34 (5.23%) aligned discordantly 1 time
    ----
    616 pairs aligned 0 times concordantly or discordantly; of these:
      1232 mates make up the pairs; of these:
        660 (53.57%) aligned 0 times
        571 (46.35%) aligned exactly 1 time
        1 (0.08%) aligned >1 times
96.70% overall alignment rate
The indentation indicates how subtotals relate to totals.
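To see how the subtotals add up, the paired-end summary above can be parsed and the overall rate recomputed. This is a sketch under assumptions: the helper names are hypothetical, and the formula (every aligned mate counted against twice the number of pairs) is my reading of the example numbers rather than something quoted from the Bowtie 2 documentation. It does reproduce the reported 96.70% for this example.

```python
import re

# The paired-end summary shown above, copied verbatim.
SUMMARY = """\
10000 reads; of these:
  10000 (100.00%) were paired; of these:
    650 (6.50%) aligned concordantly 0 times
    8823 (88.23%) aligned concordantly exactly 1 time
    527 (5.27%) aligned concordantly >1 times
    ----
    650 pairs aligned concordantly 0 times; of these:
      34 (5.23%) aligned discordantly 1 time
    ----
    616 pairs aligned 0 times concordantly or discordantly; of these:
      1232 mates make up the pairs; of these:
        660 (53.57%) aligned 0 times
        571 (46.35%) aligned exactly 1 time
        1 (0.08%) aligned >1 times
96.70% overall alignment rate
"""


def overall_alignment_rate(summary_text):
    """Extract the reported overall alignment rate (in percent)."""
    match = re.search(r"([\d.]+)% overall alignment rate", summary_text)
    if match is None:
        raise ValueError("no overall alignment rate line found")
    return float(match.group(1))


def leading_count(summary_text, phrase):
    """Return the integer at the start of the first line that ends with `phrase`."""
    for line in summary_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(phrase):
            return int(stripped.split()[0])
    raise ValueError(f"no line ending with {phrase!r}")


def recomputed_paired_rate(summary_text):
    """Recompute the overall rate for a paired-end summary (assumed formula)."""
    pairs = leading_count(summary_text, "reads; of these:")
    concordant = (leading_count(summary_text, "aligned concordantly exactly 1 time")
                  + leading_count(summary_text, "aligned concordantly >1 times"))
    discordant = leading_count(summary_text, "aligned discordantly 1 time")
    lone_mates = (leading_count(summary_text, "aligned exactly 1 time")
                  + leading_count(summary_text, "aligned >1 times"))
    # Each aligned pair contributes two mates; lone mates contribute one each.
    aligned_mates = 2 * (concordant + discordant) + lone_mates
    return 100 * aligned_mates / (2 * pairs)


if __name__ == "__main__":
    print(overall_alignment_rate(SUMMARY))
    print(round(recomputed_paired_rate(SUMMARY), 2))
```

For the example, 2 × (8823 + 527 + 34) + 571 + 1 = 19340 aligned mates out of 20000, which matches the reported rate.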

Paired-end alignment

bowtie2 [options]* -x <bt2-idx> -1 <fq1> -2 <fq2> -S <sam_output> -p <threads> 2>Align.summary

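Paired-end mode is essentially the same as single-end mode; only the options that pass in the FASTQ files change.
-1: FASTQ file of mate 1 reads
-2: FASTQ file of mate 2 reads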
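Since the single-end and paired-end commands differ only in how the FASTQ files are passed, the argument list can be assembled by one small helper when calling Bowtie 2 from a script. The following is a rough sketch: `build_bowtie2_command` and `run_bowtie2` are hypothetical names, the file names in the usage lines are placeholders, and only the options described above are used.

```python
import subprocess


def build_bowtie2_command(index_prefix, sam_output, threads, reads_1, reads_2=None):
    """Assemble the bowtie2 argument list for single-end or paired-end input."""
    cmd = ["bowtie2", "-x", index_prefix]
    if reads_2 is None:
        # Single-end: one FASTQ file passed with -U.
        cmd += ["-U", reads_1]
    else:
        # Paired-end: mate 1 and mate 2 files passed with -1 and -2.
        cmd += ["-1", reads_1, "-2", reads_2]
    cmd += ["-S", sam_output, "-p", str(threads)]
    return cmd


def run_bowtie2(cmd, summary_path="Align.summary"):
    """Run the command and send standard error to a file, like `2>Align.summary`."""
    with open(summary_path, "w") as summary:
        subprocess.run(cmd, stderr=summary, check=True)


if __name__ == "__main__":
    # Placeholder file names; nothing is executed here, the command is only printed.
    print(" ".join(build_bowtie2_command("GRCh38", "sample.sam", 8,
                                         "sample_R1.fastq", "sample_R2.fastq")))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names; `run_bowtie2` requires bowtie2 to be installed and on the PATH.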

Bowtie 2 has many more alignment parameters that can be tuned; they are not covered one by one here. The next section describes the columns of the SAM file it outputs.

SAM OUTPUT

Each line of the SAM file describes the alignment of one read and contains at least 12 tab-separated columns. From left to right, the columns are:

  1. Read name
  2. Sum of flags

In Bowtie 2, the individual flags mean:
1: The read is one of a pair
2: The alignment is one end of a proper paired-end alignment
4: The read has no reported alignments
8: The read is one of a pair and has no reported alignments
16: The alignment is to the reverse reference strand
32: The other mate in the paired-end alignment is aligned to the reverse reference strand
64: The read is mate 1 in a pair
128: The read is mate 2 in a pair
Note that the flag bits themselves are defined by the SAM specification, but aligners differ in which flags they set and under what conditions; the list above covers the flags relevant to Bowtie 2.
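Because column 2 stores the sum of the flags, each flag can be recovered with a bitwise AND. Below is a minimal sketch; `decode_flags` is a hypothetical helper that only knows the eight flags listed above and silently ignores any other bits.

```python
# Flag values and meanings as listed above.
FLAG_MEANINGS = {
    1: "read is one of a pair",
    2: "alignment is one end of a proper paired-end alignment",
    4: "read has no reported alignments",
    8: "read is one of a pair and has no reported alignments",
    16: "alignment is to the reverse reference strand",
    32: "other mate is aligned to the reverse reference strand",
    64: "read is mate 1 in a pair",
    128: "read is mate 2 in a pair",
}


def decode_flags(flag_sum):
    """Return the individual flag values (from the list above) set in `flag_sum`."""
    return [bit for bit in FLAG_MEANINGS if flag_sum & bit]


if __name__ == "__main__":
    for bit in decode_flags(99):
        print(bit, FLAG_MEANINGS[bit])
```

For example, a value of 99 decomposes into 1 + 2 + 32 + 64: a properly paired mate 1 whose mate aligned to the reverse strand.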

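  3. Name of the reference sequence (chromosome) the read aligned to
  4. 1-based position on the forward reference strand of the leftmost base of the alignment
  5. Mapping quality
  6. CIGAR string describing the alignment
  7. For paired-end reads, the name of the reference sequence the mate aligned to; = if it is the same as this read's, * if there is no mate
  8. For paired-end reads, the 1-based position on the forward reference strand of the leftmost base of the mate's alignment; 0 if there is no mate
  9. Inferred fragment length. A negative value means the mate aligned upstream of this read; 0 means the mates did not align concordantly, except that the value is nonzero when the mates aligned discordantly to the same chromosome
  10. Read sequence
  11. ASCII-encoded base qualities of the read
  12. Optional fields, including the following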
AS:i:<N> Alignment score. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if SAM record is for an aligned read. 
XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i. 
YS:i:<N> Alignment score for opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment. 
XN:i:<N> The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read. 
XM:i:<N> The number of mismatches in the alignment. Only present if SAM record is for an aligned read. 
XO:i:<N> The number of gap opens, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read. 
XG:i:<N> The number of gap extensions, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read. 
NM:i:<N> The edit distance; that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if SAM record is for an aligned read. 
YF:Z:<S> String indicating reason why the read was filtered out. See also: Filtering. Only appears for reads that were filtered out. 
YT:Z:<S> Value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
MD:Z:<S> A string representation of the mismatched reference bases in the alignment.
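The column layout above maps naturally onto a small parser. This is a sketch only: `parse_sam_line` is a hypothetical helper, the dictionary keys follow the field names used in the SAM specification, and the example record is invented for illustration rather than taken from a real alignment.

```python
# Names of the 11 fixed columns, in order (names follow the SAM specification).
FIXED_COLUMNS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
                 "rnext", "pnext", "tlen", "seq", "qual"]
INTEGER_COLUMNS = ("flag", "pos", "mapq", "pnext", "tlen")


def parse_sam_line(line):
    """Split one SAM alignment line into its fixed columns and optional fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:11]))
    for key in INTEGER_COLUMNS:
        record[key] = int(record[key])
    tags = {}
    for item in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE, e.g. AS:i:0 or YT:Z:CP.
        tag, tag_type, value = item.split(":", 2)
        tags[tag] = int(value) if tag_type == "i" else value
    record["tags"] = tags
    return record


# Invented example record: mate 1 of a concordant pair, 50 bp perfect match.
EXAMPLE = "\t".join([
    "read1", "99", "chr1", "10468", "42", "50M", "=", "10600", "182",
    "A" * 50, "I" * 50,
    "AS:i:0", "XN:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "NM:i:0",
    "MD:Z:50", "YS:i:0", "YT:Z:CP",
])

if __name__ == "__main__":
    rec = parse_sam_line(EXAMPLE)
    print(rec["rname"], rec["pos"], rec["tlen"], rec["tags"]["YT"])
```

Header lines starting with @ are not handled here and would need to be skipped before calling the parser. For real work, an established SAM/BAM library is a safer choice than hand-rolled parsing.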

That concludes these notes on genome alignment with Bowtie 2; I will add to them as I learn more.

Reference:
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#how-is-bowtie-2-different-from-bowtie-1

End.
