Coverage Depth
覆蓋深度 mapping depth
基因組被測(cè)序片段(短讀 short reads)“覆蓋”的強(qiáng)度有多大局冰?
每一堿基的覆蓋率是基因組堿基被測(cè)序的平均次數(shù)宫纬。 基因組的覆蓋深度是通過與基因組匹配的所有短讀的堿基數(shù)目除以該基因組的長度來計(jì)算的。 它通常表示為1X雇庙、2X、3X灶伊、...(1、2或3倍覆蓋)寒跳。
此處通常被稱為測(cè)序深度(sequencing depth)或者覆蓋深度(depth of coverage)聘萨。
Depth 計(jì)算公式
原始測(cè)序數(shù)據(jù)的深度,Depth = (# sequenced bases) / (# bases of reference) 或者
比對(duì)上的數(shù)據(jù)的深度童太,Depth= (# bases of all mapped reads) / (# bases of reference).
覆蓋范圍 covered length
短讀“覆蓋”了基因組的多少區(qū)域米辐?是否有些區(qū)域沒有任何的短讀覆蓋?
覆蓋范圍是參考基因組被一定深度覆蓋的堿基的百分比书释。例如翘贮,一個(gè)基因組的90%區(qū)域在1X深度覆蓋,另外仍然有70%的區(qū)域被覆蓋了5X深度爆惧。
此處通常稱為覆蓋度(genome covarage)或者覆蓋范圍(breadth of coverage)狸页。
Coverage 計(jì)算公式
Coverage = (# area covered by mapped reads) / (# area of reference).
例子
對(duì)于全基因組
Depth = (6 * 28nt) / 112nt = 1.2 fold
Coverage = (46nt - 5nt) / 112nt = 36.6%
對(duì)于target區(qū)域
Depth = (6 * 28nt) / 46nt = 3.7 fold
Coverage = (46nt - 5nt) / 46nt = 89.1%
對(duì)于position
Depth = 6 fold
reads數(shù)
下機(jī)的fq文件中到底有多少條reads?可以使用軟件readfq來統(tǒng)計(jì):
git clone https://github.com/billzt/readfq
gcc -o kseq_fastq_base kseq_fastq_base.c -lz
kseq_fastq_base samp.fq
#FASTQ files could be gzipped.
#The statistic results would be printed on STDOUT.
功能單一扯再,優(yōu)化了速度:
It could deal with ~4M reads (1G base) in less than 40 seconds, ~50M reads (14G base) in less than 5 minutes!!!
現(xiàn)在已有很多工具都會(huì)提供這個(gè)功能芍耘,間接或者直接計(jì)算測(cè)序深度和覆蓋度,比如:samtools熄阻、sambamba斋竞、bedtools、qualimap秃殉、gatk等等坝初,下面對(duì)各個(gè)工具的使用和結(jié)果做一點(diǎn)點(diǎn)匯整浸剩。
計(jì)算測(cè)序深度和覆蓋度
samtools or sambamba
- depth可以得到每一個(gè)堿基位點(diǎn)上的測(cè)序深度:
samtools depth samp.bam > base_depth.txt
sambamba depth -t 10 -w 500 samp.bam > 500base_depth.txt
# -w breadth of the window, in bp, 計(jì)算窗口的平均深度
- stats對(duì)比對(duì)后的bam文件進(jìn)行全面的統(tǒng)計(jì)
samtools stats --threads 10 samp.bam > samp_bamstat.txt
grep "^SN" samp_bamstat.txt
SN raw total sequences: 17870982
SN filtered sequences: 0
SN sequences: 17870982
SN is sorted: 0
SN 1st fragments: 8935491
SN last fragments: 8935491
SN reads mapped: 17870982
SN reads mapped and paired: 17870982 # paired-end technology bit set + both mates mapped
SN reads unmapped: 0
SN reads properly paired: 5208824 # proper-pair bit set
SN reads paired: 17870982 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 556465 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN total length: 1804969182 # ignores clipping
SN bases mapped: 1804969182 # ignores clipping
SN bases mapped (cigar): 1804969182 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 9941932 # from NM fields
SN error rate: 5.508089e-03 # mismatches / bases mapped (cigar)
SN average length: 101
SN maximum length: 101
SN average quality: 35.3
SN insert size average: 1330.1
SN insert size standard deviation: 1341.6
SN inward oriented pairs: 1418095
SN outward oriented pairs: 564408
SN pairs with other orientation: 6745980
SN pairs on different chromosomes: 207008
- 接下來可以使用
plot-bamstats
進(jìn)行結(jié)果的可視化,plot-bamstats
是samtools中包含的一個(gè)軟件鳄袍,可以在samtools的根目錄找到乒省,查看使用:
plot-bamstats --help
About: Parses output of samtools stats (former bamcheck) and calls gnuplot to create graphs.
Usage: plot-bamstats [OPTIONS] file.bam.bc
plot-bamstats -p outdir/ file.bam.bc
Options:
-k, --keep-files Do not remove temporary files.
-l, --log-y Set the Y axis scale of the Insert Size plot to log 10.
-m, --merge Merge multiple bamstats files and output to STDOUT.
-p, --prefix <path> The output files prefix, add a slash to create new directory.
-r, --ref-stats <file.fa.gc> Optional reference stats file with expected GC content (created with -s).
-s, --do-ref-stats <file.fa> Calculate reference sequence GC for later use with -r
-t, --targets <file.tab> Restrict -s to the listed regions (tab-delimited chr,from,to. 1-based, inclusive)
-h, -?, --help This help message.
例子如下:
plot-bamstats -s /home/luna/Desktop/database/homo_bwa/hsa.fa > gc
plot-bamstats -r gc -p ./test samp_bamstat.txt
先計(jì)算基因組的gc含量分布,再給出一系列文件和圖片畦木,還有一個(gè)html文件:
ll test* gc
-rw-rw-r-- 1 luna luna 1910 Oct 14 16:04 gc
-rw-rw-r-- 1 luna luna 4266 Oct 14 16:04 test-acgt-cycles.gp
-rw-rw-r-- 1 luna luna 16707 Oct 14 16:04 test-acgt-cycles.png
-rw-rw-r-- 1 luna luna 429 Oct 14 16:04 test-coverage.gp
-rw-rw-r-- 1 luna luna 10497 Oct 14 16:04 test-coverage.png
-rw-rw-r-- 1 luna luna 4736 Oct 14 16:04 test-gc-content.gp
-rw-rw-r-- 1 luna luna 21895 Oct 14 16:04 test-gc-content.png
-rw-rw-r-- 1 luna luna 1026 Oct 14 16:04 test-gc-depth.gp
-rw-rw-r-- 1 luna luna 23312 Oct 14 16:04 test-gc-depth.png
-rw-rw-r-- 1 luna luna 4606 Oct 14 16:04 test.html
-rw-rw-r-- 1 luna luna 3528 Oct 14 16:04 test-indel-cycles.gp
-rw-rw-r-- 1 luna luna 28786 Oct 14 16:04 test-indel-cycles.png
-rw-rw-r-- 1 luna luna 1739 Oct 14 16:04 test-indel-dist.gp
-rw-rw-r-- 1 luna luna 29595 Oct 14 16:04 test-indel-dist.png
-rw-rw-r-- 1 luna luna 261710 Oct 14 16:04 test-insert-size.gp
-rw-rw-r-- 1 luna luna 16580 Oct 14 16:04 test-insert-size.png
-rw-rw-r-- 1 luna luna 5939 Oct 14 16:04 test-quals2.gp
-rw-rw-r-- 1 luna luna 21727 Oct 14 16:04 test-quals2.png
-rw-rw-r-- 1 luna luna 76931 Oct 14 16:04 test-quals3.gp
-rw-rw-r-- 1 luna luna 84648 Oct 14 16:04 test-quals3.png
-rw-rw-r-- 1 luna luna 2228 Oct 14 16:04 test-quals.gp
-rw-rw-r-- 1 luna luna 47463 Oct 14 16:04 test-quals-hm.gp
-rw-rw-r-- 1 luna luna 34730 Oct 14 16:04 test-quals-hm.png
-rw-rw-r-- 1 luna luna 15932 Oct 14 16:04 test-quals.png
可以打開html文件展示如下:
點(diǎn)擊小圖袖扛,也會(huì)在旁邊展示大圖,可以更清晰查看十籍。
BAMStats
BAMStats是建立在Picard Java API上的簡(jiǎn)單軟件工具蛆封,可以計(jì)算并以圖形方式顯示從SAM / BAM文件進(jìn)行質(zhì)量控制評(píng)估時(shí)得到的各種指標(biāo)。
BAMStats為覆蓋范圍勾栗,起始位置惨篱,MAPQ值,映射的讀取長度和編輯距離提供描述性統(tǒng)計(jì)信息围俘。 如果提供了合適的bed或gtf文件砸讳,還可以為單個(gè)特征(例如外顯子或誘餌區(qū)域)生成統(tǒng)計(jì)指標(biāo)。
BAMStats需要分配2至6GB的內(nèi)存(Java程序可以使用-Xmx來設(shè)置)界牡;另外SAM / BAM文件都可以作為輸入簿寂,但是必須是排序后的文件。
- 下載
wget https://jaist.dl.sourceforge.net/project/bamstats/BAMStats-1.25.zip
unzip BAMStats-1.25.zip
rm BAMStats-1.25.zip
cd BAMStats-1.25 && ls
## 兩個(gè)jar分別是命令行(CLI)和圖形用戶(GUI)
java -jar BAMStats-1.25/BAMStats-1.25.jar --help
USAGE:
======
Option Description
------ -----------
-d, --distances If selected, edit distance statistics
will also be displayed as a separate
table (optional).
-f, --features Feature file specifying specific
locations to calculate coverage
metrics for (optional)
-h, --help Generate help.
-i, --infile SAM or BAM infile (must be sorted).
-l, --lengths If selected, mapped read length
statistics will also be displayed as
a separate table (optional).
-m, --mapped If selected, coverage statistics, for
mapped regions only, will also be
displayed as a separate table
(optional).
-o, --outfile Outfile (optional).
-q, --qualities If selected, mapping quality (MAPQ)
statistics will also be displayed as
a separate table (optional).
-s, --starts If selected, start position statistics
will also be displayed as a separate
table (optional).
-v, --view View option for output format
(currently accepts 'simple' or
'html'; default, simple).
- 使用
sambamba sort -o smap.sort.bam -t 20 -p samp.bam
java -jar -Xmx30g BAMStats-1.25.jar -i samp.sort.bam -o BAMStats --view simple
cat BAMStats
ref | N | mean | median | sd | q1 | q3 | 2.50% | 97.50% | min | max |
---|---|---|---|---|---|---|---|---|---|---|
chr1 | 249,250,621 | 0.63 | 0 | 11.29 | 0 | 1 | 0 | 3 | 0 | 15051 |
chr2 | 243,199,373 | 0.63 | 0 | 1.43 | 0 | 1 | 0 | 3 | 0 | 706 |
chr3 | 198,022,430 | 0.64 | 0 | 1 | 0 | 1 | 0 | 3 | 0 | 102 |
chr4 | 191,154,276 | 0.58 | 0 | 1.13 | 0 | 1 | 0 | 3 | 0 | 527 |
... | ||||||||||
chr22 | 51,304,566 | 0.36 | 0 | 0.91 | 0 | 0 | 0 | 3 | 0 | 207 |
chrX | 155,270,560 | 0.59 | 0 | 1.29 | 0 | 1 | 0 | 3 | 0 | 832 |
chrY | 59,373,566 | 0.07 | 0 | 4.6 | 0 | 0 | 0 | 0 | 0 | 1859 |
chrMT | 16,571 | 26.14 | 22 | 15.47 | 16 | 32 | 7 | 69.7 | 0 | 98 |
改為html輸出宿亡,可以看看:
java -jar -Xmx30g BAMStats-1.25.jar -d -l -m -q -s -i samp.sort.bam -o BAMStats2 --view html
會(huì)產(chǎn)生一個(gè)文件夾BAMStats2.data
和一個(gè)html文件BAMStats2
常遂,進(jìn)入查看:
cd BAMStats2.data && ls | wc -l
# 531
ll chr1_*
-rw-rw-r-- 1 luna luna 5056 Oct 14 16:43 chr1_Coverage_boxAndWhisker.png
-rw-rw-r-- 1 luna luna 11480 Oct 14 16:43 chr1_Coverage_cumulativeHistogram.png
-rw-rw-r-- 1 luna luna 12513 Oct 14 16:43 chr1_Coverage_histogram.png
-rw-rw-r-- 1 luna luna 753 Oct 14 16:43 chr1_Coverage.html
-rw-rw-r-- 1 luna luna 4528 Oct 14 16:44 chr1_EditDistances_boxAndWhisker.png
-rw-rw-r-- 1 luna luna 10601 Oct 14 16:44 chr1_EditDistances_cumulativeHistogram.png
-rw-rw-r-- 1 luna luna 11156 Oct 14 16:44 chr1_EditDistances_histogram.png
-rw-rw-r-- 1 luna luna 770 Oct 14 16:44 chr1_EditDistances.html
-rw-rw-r-- 1 luna luna 5852 Oct 14 16:43 chr1_MappedCoverage_boxAndWhisker.png
-rw-rw-r-- 1 luna luna 12311 Oct 14 16:43 chr1_MappedCoverage_cumulativeHistogram.png
-rw-rw-r-- 1 luna luna 12688 Oct 14 16:43 chr1_MappedCoverage_histogram.png
-rw-rw-r-- 1 luna luna 784 Oct 14 16:43 chr1_MappedCoverage.html
-rw-rw-r-- 1 luna luna 5107 Oct 14 16:44 chr1_MappingQuality_boxAndWhisker.png
-rw-rw-r-- 1 luna luna 11297 Oct 14 16:44 chr1_MappingQuality_cumulativeHistogram.png
-rw-rw-r-- 1 luna luna 11698 Oct 14 16:44 chr1_MappingQuality_histogram.png
-rw-rw-r-- 1 luna luna 771 Oct 14 16:44 chr1_MappingQuality.html
-rw-rw-r-- 1 luna luna 4979 Oct 14 16:44 chr1_ReadLengths_boxAndWhisker.png
-rw-rw-r-- 1 luna luna 10979 Oct 14 16:44 chr1_ReadLengths_cumulativeHistogram.png
-rw-rw-r-- 1 luna luna 10979 Oct 14 16:44 chr1_ReadLengths_histogram.png
-rw-rw-r-- 1 luna luna 773 Oct 14 16:44 chr1_ReadLengths.html
-rw-rw-r-- 1 luna luna 463 Oct 14 16:43 chr1_StartPositions.html
查看html文件BAMStats2
,重命名挽荠,同時(shí)下載html和文件夾BAMStats2.data
到本地克胳,放在同一個(gè)目錄下尘盼,以后使用瀏覽器打開:
mv BAMStats2 BAMStats2.html
tar -cf Stats2.tar BAMStats2*
sz Stats2.tar
# 本地解壓
打開BAMStats2.html
随静,部分結(jié)果展示:
也可以到文件夾BAMStats2.data
中查看每一條染色體:
GUI界面也可以看下圖:
GATK DepthOfCoverage
gatk3和gatk4中都有這個(gè)工具莺褒,DepthOfCoverage阻荒,本處使用gatk 4.1.8.1:
which gatk
/home/luna/Desktop/Software/gatk/gatk-4.1.8.1/gatk
# gatk4
gatk DepthOfCoverage -R /home/luna/Desktop/database/homo_bwa/hg19.fa -I samp.sort.bam -O gatk -L chr1 --summary-coverage-threshold 1 --summary-coverage-threshold 4
# gatk3
java -jar /home/luna/Desktop/Software/gatk/GenomeAnalysisTK.jar -T DepthOfCoverage -I samp.sort.bam -R /home/luna/Desktop/database/homo_bwa/hg19.fa --out gatk3 --summaryCoverageThreshold 1 --summaryCoverageThreshold 4 -L chr1
# -L 后面可以接單個(gè)染色體的名稱植袍,也可以多次使用瞄摊,或者bed文件包含基因組的多個(gè)區(qū)域作為限定
# summary Coverage Threshold膏蚓,可以隨意設(shè)置數(shù)值(N)建芙,表示%_above_N来累,即深度超過N的coverage of breadth.
運(yùn)行結(jié)束會(huì)生成多個(gè)文件砚作,主要是查看一下summary文件:
no suffix: per locus coverage
_summary: total, mean, median, quartiles, and threshold proportions, aggregated over all bases
_statistics: coverage histograms (# locus with X coverage), aggregated over all bases
_interval_summary: total, mean, median, quartiles, and threshold proportions, aggregated per interval
_interval_statistics: 2x2 table of # of intervals covered to >= X depth in >=Y samples
_gene_summary: total, mean, median, quartiles, and threshold proportions, aggregated per gene
_gene_statistics: 2x2 table of # of genes covered to >= X depth in >= Y samples
_cumulative_coverage_counts: coverage histograms (# locus with >= X coverage), aggregated over all bases
_cumulative_coverage_proportions: proprotions of loci with >= X coverage, aggregated over all bases
## 查看
cat gatk.sample_summary
sample_id,total,mean,granular_third_quartile,granular_median,granular_first_quartile,%_bases_above_1,%_bases_above_4
SRX247249,155122543,0.69,2,1,1,41.8,2.2
Total,155122543,0.69,N/A,N/A,N/A,N/A,N/A
BamStats
此處有多個(gè)bamstats的程序,所以作為區(qū)分嘹锁,列出來葫录。
Biopet BamStats
Biopet BamStats is a package that contains tools to generate stats from a BAM file, merge those stats for multiple samples, and validate the generated stats files.
Mode - Generate
Generate reports clipping stats, flag stats, insert size and mapping quality on a BAM file.
The output of the JSON file is organized in a sample - library - readgroup tree structure. If readgroups in the BAM file are not annotated with sample (SM) and library (LB) tags an error will be thrown. This can be fixed by using samtools addreplacerg or picard AddOrReplaceReadGroups.
Mode - Merge
This module will merge bamstats files together and keep the sample/library/readgroup structure. Values for the same readgroups will be added. It will also validate the resulting file.
Mode - Validate
Validates a BamStats file. If aggregation values can not be regenerated the file is considered corrupt. This should only happen when the file has been manually edited.
website: https://biopet.github.io/bamstats/1.0.1/index.html
wget https://github.com/biopet/bamstats/releases/download/v1.0.1/BamStats-assembly-1.0.1.jar
mv BamStats-assembly-1.0.1.jar BamStats-biopet.jar
mkdir biopet
java -jar /home/luna/Desktop/Software/StatsBAM/BamStats-biopet.jar generate -R /home/luna/Desktop/database/homo_bwa/hg19.fa -o biopet -b samp.sort.bam --tsvOutputs
# 會(huì)在biopet生成文件
3prime_clipping.histogram.tsv
5prime_clipping.stats.tsv
clipping.histogram.tsv
insertsize.histogram.tsv
left_clipping.stats.tsv
right_clipping.histogram.tsv
3prime_clipping.stats.tsv
bamstats.json
clipping.stats.tsv
insertsize.stats.tsv
mappingQuality.histogram.tsv
right_clipping.stats.tsv
5prime_clipping.histogram.tsv
bamstats.summary.json
flagstats.tsv
left_clipping.histogram.tsv
mappingQuality.stats.tsv
結(jié)果很詳細(xì),但是對(duì)于json文件我沒有找到合適的后續(xù)處理领猾,這里僅記錄米同。
guigolab bamstats
wget -O - https://github.com/guigolab/bamstats/releases/download/v0.3.5/bamstats-v0.3.5-linux-x86_64.tar.gz | tar xz -C ./ bamstats
/home/luna/Desktop/Software/StatsBAM/bamstats/bamstats -h
bamstats - compute mapping statistics
Usage:
bamstats [flags]
Flags:
-a, --annotaion string element annotation file
-c, --cpu int number of cpus to be used (default 80)
-h, --help help for bamstats
-i, --input string input file (required)
--loglevel string logging level (default "warn")
--max-buf int maximum number of buffered records (default 1000000)
-o, --output string output file (default "-")
-n, --reads int number of records to process (default -1)
-u, --uniq output genomic coverage statistics for uniqely mapped reads too
--version version for bamstats
Bamstats can currently compute the following mapping statistics:
general
genome coverage
RNA-seq
使用骇扇,具體上github查看,https://github.com/guigolab/bamstats/blob/master/scripts/
/home/luna/Desktop/Software/StatsBAM/bamstats/bamstats -i samp.sort.bam --cpu 20 --uniq --output guigolab.bamstats
bamdst
Bamdst is a lightweight tool to stat the depth coverage of target regions of bam file(s).
Bam file(s) should be properly sorted, and the probe file (bed file) and the output dir must be specified in the first place.
下載安裝
git clone https://github.com/shiquan/bamdst.git
cd bamdst/
make
使用
在使用該軟件之前面粮,我們首先制作一個(gè)bed文件:
cat chr1.region.bed
chr1 64909787 160398769
## PS中間空白是TAB少孝,不是空格
查看使用說明
/home/luna/Desktop/Software/StatsBAM/bamdst/bamdst -h
bamdst version: 1.0.9
USAGE : bamdst [OPTION] -p <probe.bed> -o <output_dir> [in1.bam [in2.bam ... ]]
or : bamdst [OPTION] -p <probe.bed> -o <output_dir> -
Option -o and -p are mandatory:
-o, --outdir output dir
-p, --bed probe or target regions file, the region file will
be merged before calculate depths
測(cè)試
mkdir ./bamdst
/home/luna/Desktop/Software/StatsBAM/bamdst/bamdst -p chr1.region.bed -o ./bamdst samp.sort.bam
cd ./bamdst
ls
# 列出生成的文件
-coverage.report This file contains all the coverage information of target and flank region, and reads stat information of the input file(s).
-cumu.plot Depth distrbution for plot.
-insert.plot Inferred insert size distribution for plot.
-chromosome.report Depth and coverage information of each chromosome.
-region.tsv.gz For each region in probe file (in.bed), average depth, median depth and coverage of these regions will be listed in the file.
-depth.tsv.gz For each position in the probe file(in.bed), three kinds depth value will be calculated, including raw depth, rmdup depth and the coverage depth.
-uncover.bed This bed file contains the bad covered or uncovered region in the input bam file(s) against the probe file.
查看coverage.report 和 chromosome.report
cat coverage.report
## The file was created by bamdst
## Version : 1.0.9
## Files : samp.sort.bam
[Total] Raw Reads (All reads) 17870982
[Total] QC Fail reads 0
[Total] Raw Data(Mb) 1804.97
[Total] Paired Reads 17870982
[Total] Mapped Reads 17870982
[Total] Fraction of Mapped Reads 100.00%
[Total] Mapped Data(Mb) 1804.97
[Total] Fraction of Mapped Data(Mb) 100.00%
[Total] Properly paired 5208824
[Total] Fraction of Properly paired 29.15%
[Total] Read and mate paired 17870982
[Total] Fraction of Read and mate paired 100.00%
[Total] Singletons 0
[Total] Read and mate map to diff chr 414016
[Total] Read1 8935491
[Total] Read2 8935491
[Total] Read1(rmdup) 8935491
[Total] Read2(rmdup) 8935491
[Total] forward strand reads 8940264
[Total] backward strand reads 8930718
[Total] PCR duplicate reads 0
[Total] Fraction of PCR duplicate reads 0.00%
[Total] Map quality cutoff value 20
[Total] MapQuality above cutoff reads 17045397
[Total] Fraction of MapQ reads in all reads 95.38%
[Total] Fraction of MapQ reads in mapped reads 95.38%
[Target] Target Reads 548096
[Target] Fraction of Target Reads in all reads 3.07%
[Target] Fraction of Target Reads in mapped reads 3.07%
[Target] Target Data(Mb) 55.36
[Target] Target Data Rmdup(Mb) 48.10
[Target] Fraction of Target Data in all data 3.07%
[Target] Fraction of Target Data in mapped data 3.07%
[Target] Len of region 95488982
[Target] Average depth 0.58
[Target] Average depth(rmdup) 0.50
[Target] Coverage (>0x) 32.72%
[Target] Coverage (>=4x) 1.87%
[Target] Coverage (>=10x) 0.09%
[Target] Coverage (>=30x) 0.01%
[Target] Coverage (>=100x) 0.00%
[Target] Target Region Count 1
[Target] Region covered > 0x 0
[Target] Fraction Region covered > 0x 0.00%
[Target] Fraction Region covered >= 4x 0.00%
[Target] Fraction Region covered >= 10x 0.00%
[Target] Fraction Region covered >= 30x 0.00%
[Target] Fraction Region covered >= 100x 0.00%
[flank] flank size 200
[flank] Len of region (not include target region) 200
[flank] Average depth 1.09
[flank] flank Reads 4
[flank] Fraction of flank Reads in all reads 0.00%
[flank] Fraction of flank Reads in mapped reads 0.00%
[flank] flank Data(Mb) 0.00
[flank] Fraction of flank Data in all data 0.00%
[flank] Fraction of flank Data in mapped data 0.00%
[flank] Coverage (>0x) 55.50%
[flank] Coverage (>=4x) 2.00%
[flank] Coverage (>=10x) 0.00%
[flank] Coverage (>=30x) 0.00%
[flank] Coverage (>=100x) 0.00%
cat chromosomes.report
#Chromosome | DATA(%) | Avg depth | Median | Coverage% | Cov 4x % | Cov 10x % | Cov 30x % | Cov 100x % |
---|---|---|---|---|---|---|---|---|
chr1 | 100 | 0.58 | 0 | 32.72 | 1.87 | 0.09 | 0.01 | 0 |
Qualimap
Qualimap是功能比較全的一款質(zhì)控軟件,提供GUI界面和命令行界面熬苍,可以對(duì)bam文件稍走,RNA-seq,Counts數(shù)據(jù)質(zhì)控柴底,也支持比對(duì)數(shù)據(jù)婿脸,counts數(shù)據(jù)和表觀數(shù)據(jù)的比較和聚類。
Qualimap:http://qualimap.bioinfo.cipf.es/doc_html/index.html
命令行文檔:http://qualimap.bioinfo.cipf.es/doc_html/command_line.html#command-line
下載安裝
wget https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v2.2.1.zip
unzip qualimap_v2.2.1.zip
正常使用qualimap柄驻,需要安裝部分R包:
# 安裝相關(guān)的R包
# NOISeq狐树,Repitools, Rsamtools, GenomicFeatures, rtracklayer
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("NOISeq", version = "3.8")
BiocManager::install("Repitools", version = "3.8")
BiocManager::install("Rsamtools", version = "3.8")
BiocManager::install("GenomicFeatures", version = "3.8")
BiocManager::install("rtracklayer", version = "3.8")
使用
/home/luna/Desktop/Software/qualimap/qualimap bamqc --java-mem-size=40G -c -nt 30 -outdir ./ -outformat PDF -os -outfile samp.bamqc.pdf -bam samp.sort.bam -gd hg19 -nr 2000
mosdepth
Mosdepth是一種新的命令行工具,可以快速計(jì)算全基因組測(cè)序覆蓋率鸿脓。輸入BAM或CRAM文件抑钟,計(jì)算在基因組中每個(gè)核苷酸位置或基因組區(qū)域的深度;也可將基因組區(qū)域指定為被BED文件覆蓋的指定區(qū)域野哭,或者指定為CNV calling所需的固定大小窗口在塔。測(cè)試來看,Mosdepth比大多數(shù)現(xiàn)有工具更快虐拓。
下載安裝
## github上有編譯好的最新版的二進(jìn)制文件心俗,可以下載使用
wget https://github.com/brentp/mosdepth/releases/download/v0.3.1/mosdepth
chmod +x mosdepth
./mosdepth --help
mosdepth 0.3.1
Usage: mosdepth [options] <prefix> <BAM-or-CRAM>
Arguments:
<prefix> outputs: `{prefix}.mosdepth.dist.txt`
`{prefix}.mosdepth.summary.txt`
`{prefix}.per-base.bed.gz` (unless -n/--no-per-base is specified)
`{prefix}.regions.bed.gz` (if --by is specified)
`{prefix}.quantized.bed.gz` (if --quantize is specified)
`{prefix}.thresholds.bed.gz` (if --thresholds is specified)
<BAM-or-CRAM> the alignment file for which to calculate depth.
Common Options:
-t --threads <threads> number of BAM decompression threads [default: 0]
-c --chrom <chrom> chromosome to restrict depth calculation.
-b --by <bed|window> optional BED file or (integer) window-sizes.
-n --no-per-base dont output per-base depth. skipping this output will speed execution
substantially. prefer quantized or thresholded values if possible.
-f --fasta <fasta> fasta file for use with CRAM files [default: ].
--d4 output per-base depth in d4 format.
Other options:
-F --flag <FLAG> exclude reads with any of the bits in FLAG set [default: 1796]
-i --include-flag <FLAG> only include reads with any of the bits in FLAG set. default is unset. [default: 0]
-x --fast-mode dont look at internal cigar operations or correct mate overlaps (recommended for most use-cases).
-q --quantize <segments> write quantized output see docs for description.
-Q --mapq <mapq> mapping quality threshold. reads with a quality less than this value are ignored [default: 0]
-T --thresholds <thresholds> for each interval in --by, write number of bases covered by at
least threshold bases. Specify multiple integer values separated
by ','.
-m --use-median output median of each region (in --by) instead of mean.
-R --read-groups <string> only calculate depth for these comma-separated read groups IDs.
-h --help show help
參數(shù):
-t 線程數(shù),推薦4個(gè)蓉驹,After about 4 threads, there is no benefit for additional threads
-c 指定染色體去計(jì)算深度
-b 指定基因組區(qū)域,比如基因的位置揪利,就能計(jì)算基因的覆蓋度
-F 排除含有該FLAG的reads态兴。默認(rèn)是1796,代表著:
read unmapped (0x4)
not primary alignment (0x100)
read fails platform/vendor quality checks (0x200)
read is PCR or optical duplicate (0x400)
-Q mapping quality疟位,默認(rèn)是0瞻润,一般可以按照自己的需要進(jìn)行調(diào)整為5或者10
-x 快速模式
-T 按照閥值(比如1X,2X,10X)輸出達(dá)到指定閥值的核苷酸數(shù)目
使用
# To calculate the coverage in each exome capture region
mosdepth --by capture.bed sample-output sample.exome.bam
# coverage thresholds
mosdepth --by capture.bed --thresholds 1,3,6,10 sample-output sample.exome.bam
# WGS example For 500-base windows
mosdepth -n --fast-mode --by 500 sample.wgs $sample.wgs.cram
indexcov
indexcov是goleft內(nèi)的一個(gè)小工具,goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary甜刻。
注意到這個(gè)軟件绍撞,是因?yàn)閙osdepth和indexcov的作圖,可以上github上查看得院,https://github.com/brentp/goleft/tree/master/indexcov
Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.
The output is scaled to around 1. So a long stretch with values of 1.5 would be a heterozygous duplication. This is useful as a quick QC to get coverage values across the genome.
In our tests, we can estimate depth across 60X genomes for 30 samples in 30 seconds.
測(cè)序深度和測(cè)序覆蓋度傻铣,很多時(shí)候?qū)τ诓煌难芯坑兄煌囊螅旅娼o出網(wǎng)上的一個(gè)表格:
Coverage depth recommendations, for human ~3Gb
Category | Detection or Application | Recommended Depth (x) or Reads (millions) | References |
---|---|---|---|
Whole genome sequencing | Homozygous SNVs | 15x | Bentley et al., 2008 |
Heterozygous SNVs | 33x | Bentley et al., 2008 | |
INDELs | 60x | Feng et al., 2014 | |
Genotype calls | 35x | Ajay et al., 2011 | |
CNV | 1-8x | Xie et al., 2009; Medvedev at al., 2010 | |
Whole exome sequencing | Homozygous SNVs | 100x (3x local depth) | Clark et al., 2011; Meynert et al., 2013 |
Heterozygous SNVs | 100x (13x local depth) | Clark et al., 2011; Meynert et al., 2013 | |
INDELs | not recommended | Feng et al., 2014 | |
Transcriptome Sequencing | Differential expression profiling | 10-25M | Liu Y. et al., 2014; ENCODE 2011 RNA-Seq |
Alternative splicing | 50-100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq | |
Allele specific expression | 50-100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq | |
De novo assembly | >100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq | |
DNA Target-Based Sequencing | ChIP-Seq | 10-14M (sharp peaks); 20-40M (broad marks) | Rozowsky et al., 2009; ENCODE 2011 Genome; Landt et al., 2012 |
Hi-C | 100M | Belton, J.M et al., 2012 | |
4C (Circularized Chromosome Confirmation Capture) | 1-5M | van de Weken, H.J.G. et al., 2012 | |
5C (Chromosome Carbon Capture Carbon Copy) | 15-25M | Sanyal A. et al., 2012 | |
ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing) | 15-20M | Zhang, J. et al., 2012 | |
FAIRE-Seq | 25-55M | ENCODE 2011 Genome; Landt et al., 2012 | |
DNAse 1-Seq | 25-55M | Landt et al., 2012 | |
DNA Methylation Sequencing | CAP-Seq | >20M | Long, H.K. et al., 2013 |
MeDIP-Seq | 60M | Taiwo, O. et al., 2012 | |
RRBS (Reduced Representation Bisulfite Sequencing) | 10X | ENCODE 2011 Genome | |
Bisulfite-Seq | 5-15X; 30X | Ziller, M.J et al., 2015; Epigenomics Road Map | |
RNA-Target-Based Sequencing | CLIP-Seq | 10-40M | Cho J. et al., 2012; Eom T. et al., 2013; Sugimoto Y. et al., 2012 |
iCLIP | 5-15M | Sugimoto Y. et al., 2012; Rogelj B. et al., 2012 | |
PAR-CLIP | 5-15M | Rogelj B. et al., 2012 | |
RIP-Seq | 5-20M | Lu Z. et al., 2014 | |
Small RNA (microRNA) Sequencing | Differential Expression | ~1-2M | Metpally RPR et al., 2013; Campbell et al., 2015 |
Discovery | ~5-8M | Metpally RPR et al., 2013; Campbell et al., 2015 |
References
Recommended Coverage and Read Depth
—— dulunar 后記于 2020.10