參考文章:
1.如何統(tǒng)計(jì)BAM文件中的reads數(shù)
2.Samtools常用命令的總結(jié)
當(dāng)你有很多個(gè)bam文件時(shí),想知道這些bam文件里有多少個(gè)比對上的reads主儡,并且把它們輸出的時(shí)候,應(yīng)該怎么做患朱?當(dāng)然你可以選擇讀取bowtie2的日志文件丢郊,像這樣的:
31991083 reads; of these:
31991083 (100.00%) were unpaired; of these:
6844445 (21.39%) aligned 0 times
18391269 (57.49%) aligned exactly 1 time
6755369 (21.12%) aligned >1 times
78.61% overall alignment rate
但是有時(shí)候我們從別人那里拿到的只是個(gè)bam文件怎么辦被饿?
samtools工具里有一個(gè)功能幫你實(shí)現(xiàn)這個(gè)要求秸弛。
(一)計(jì)算alignments數(shù)
alignment數(shù)并不是mapped read數(shù)铭若,因?yàn)橐粭lread有可能比對到基因組多個(gè)位置。所以這種方法要比實(shí)際的reads數(shù)要多递览。首先如果你有很多個(gè)樣品叼屠,建議你先弄一個(gè)txt,里面是你的樣品名绞铃,像這樣镜雨,比如我有8個(gè)bam文件:
$ cat file_names.txt
A_1
A_2
A_3
A_4
A_5
A_6
A_7
A_8
上面是我的樣品名前綴。
#寫個(gè)腳本憎兽,批量統(tǒng)計(jì)
#!/bin/bash
cat file_names.txt | while read line
do
export alignment_number=$(samtools view -c ${line}_q30_rmdup_sorted.bam)
echo ${line} alignment_number ${alignment_number}
done
輸出結(jié)果:
A_1 alignment_number 23150364
A_2 alignment_number 12724502
A_3 alignment_number 17724364
A_4 alignment_number 14102860
A_5 alignment_number 18809748
A_6 alignment_number 12566000
A_7 alignment_number 19047440
A_8 alignment_number 11808528
(二)統(tǒng)計(jì)雙端測序比對上的reads數(shù)
統(tǒng)計(jì)雙端測序bam文件里一對read都比對上的數(shù)量:
#!/bin/bash
cat file_names.txt | while read line
do
export mapped_reads=$(samtools view -c -f 1 -F 12 ${line}.bam)
echo ${line} mapped_reads_number ${mapped_reads}
done
輸出的內(nèi)容:
A_1 mapped_reads_number 23150364
A_2 mapped_reads_number 12724502
A_3 mapped_reads_number 17724364
A_4 mapped_reads_number 14102860
A_5 mapped_reads_number 18809748
A_6 mapped_reads_number 12566000
A_7 mapped_reads_number 19047440
A_8 mapped_reads_number 11808528
這里你會(huì)發(fā)現(xiàn)我兩種比對的結(jié)果是一樣的冷离,是因?yàn)槲覐睦习迥抢锬玫降腷am文件是他用picard去重過濾之后的bam文件,所以兩種結(jié)果是一樣的纯命,如果你用沒有去重過濾的bam文件進(jìn)行計(jì)算,這兩個(gè)結(jié)果是不一樣的痹栖!
上面兩種都是比較簡單的統(tǒng)計(jì)數(shù)量亿汞,如果你想要具體的信息,比如比對率之類的揪阿,可以用這個(gè)代碼:
$ samtools flagstat file.bam
23150364 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
23150364 + 0 mapped (100.00% : N/A)
23150364 + 0 paired in sequencing
11575182 + 0 read1
11575182 + 0 read2
22447746 + 0 properly paired (96.96% : N/A)
23150364 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
(三)合并兩個(gè)及以上的bam文件
如果你想合并sorted的bam文件疗我,可以這樣:
$ samtools merge finalBamFile.bam *.bam
finalBamFile指的是合并完的bam文件名;后面跟的是你想合并的bam文件南捂,如果只有兩個(gè)吴裤,你可以依次列出;如果有多個(gè)溺健,可以像上面一樣麦牺,用*來表示。
samtools的merge功能在合并之后,輸出的文件也是保持著原來的順序剖膳,即sort的順序魏颓,所以你不用再次sort。
在merge后吱晒,再次檢查mapped reads數(shù)(我是把8個(gè)文件兩兩合并):
merge_1.bam mapped_reads_number 41960112
merge_2.bam mapped_reads_number 25290502
merge_3.bam mapped_reads_number 36771804
merge_4.bam mapped_reads_number 25911388