可以使用samtools fastq命令從bam文件中提取特定的fastq序列葵礼。
使用幫助
samtools fastq
Usage: samtools fastq [options...] <in.bam>
Options:
-0 FILE write reads designated READ_OTHER to FILE
-1 FILE write reads designated READ1 to FILE
-2 FILE write reads designated READ2 to FILE
note: if a singleton file is specified with -s, only
paired reads will be written to the -1 and -2 files.
-f INT only include reads with all of the FLAGs in INT present [0]
-F INT only include reads with none of the FLAGS in INT present [0]
-G INT only EXCLUDE reads with all of the FLAGs in INT present [0]
-n don't append /1 and /2 to the read name
-N always append /1 and /2 to the read name
-O output quality in the OQ tag if present
-s FILE write singleton reads designated READ1 or READ2 to FILE
-t copy RG, BC and QT tags to the FASTQ header line
-T TAGLIST copy arbitrary tags to the FASTQ header line
-v INT default quality score if not given in file [1]
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
-c compression level [0..9] to use when creating gz or bgzf fastq files
--i1 FILE write first index reads to FILE
--i2 FILE write second index reads to FILE
--barcode-tag TAG Barcode tag [default: BC]
--quality-tag TAG Quality tag [default: QT]
--index-format STR How to parse barcode and quality tags
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
Reads are designated READ1 if FLAG READ1 is set and READ2 is not set.
Reads are designated READ2 if FLAG READ1 is not set and READ2 is set.
Reads are designated READ_OTHER if FLAGs READ1 and READ2 are either both set
or both unset.
Run 'samtools flags' for more information on flag codes and meanings.
The index-format string describes how to parse the barcode and quality tags, for example:
i14i8 the first 14 characters are index 1, the next 8 characters are index 2
n8i14 ignore the first 8 characters, and use the next 14 characters for index 1
If the tag contains a separator, then the numeric part can be replaced with '*' to mean
'read until the separator or end of tag', for example:
n*i* ignore the left part of the tag until the separator, then use the second part
of the tag as index 1
Examples:
To get just the paired reads in separate files, use:
samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n -F 0x900 in.bam
To get all non-supplementary/secondary reads in a single file, redirect the output:
samtools fastq -F 0x900 in.bam > all_reads.fq
Flags對應(yīng)的意義如下:
BAM比對文件中flags對應(yīng)的意義 - 簡書 (jianshu.com)
以上可以知道數(shù)字4可以將未比對上序列的和比對上序列分開。
數(shù)字64或128可以將read1和read2區(qū)分開的烁。
這是示例序列的比對結(jié)果:
36599658 reads; of these:
36599658 (100.00%) were paired; of these:
36567223 (99.91%) aligned concordantly 0 times
32435 (0.09%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
36567223 pairs aligned concordantly 0 times; of these:
182 (0.00%) aligned discordantly 1 time
----
36567041 pairs aligned 0 times concordantly or discordantly; of these:
73134082 mates make up the pairs; of these:
73133169 (100.00%) aligned 0 times
913 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.09% overall alignment rate
提取所有reads
雙末端測序
ls *.bam |while read id
do
samtools fastq -@ 4 -N $id -1 ${id%%_*}_1.fq -2 ${id%%_*}_2.fq
gzip ${id%%_*}_1.fq ${id%%_*}_2.fq
done
單末端測序
ls *.bam |while read id
do
samtools fastq -@ 2 -N $id > ${id%%_*}.fq
gzip ${id%%_*}.fq
done
如果要提取特定狀態(tài)的reads贴届,需要使用f(提取)或F(過濾)參數(shù)+flags(BAM比對文件中flags對應(yīng)的意義 - 簡書 (jianshu.com)
),例如:-f 2為提取兩條reads都比對上的reads;-F 4:提取所有比對上的reads(包含雙末端中只有單條比對上的reads)盯捌;-F 8過濾兩條reads都沒比對上的reads。
有些狀態(tài)的reads不能通過一個(gè)步驟就將其提饶⒒唷饺著;例如配對的兩條reads只有其中一條比對上滤祖,我想提取另一條∑孔眩可以分為兩步提取:
samtools view -f 4 -b in.bam | samtools fastq -F 8 -1 read1-1.fq -2 read2-2.fq
或者兩條reads只有其中一條比對上埂材,我想提取比對上的這一條
samtools view -F 4 -b in.bam | samtools fastq -F 2 -1 read1-1.fq -2 read2-2.fq