deepTools is a suite of Python-based tools for efficiently processing and analyzing high-throughput sequencing data, such as ChIP-seq, RNA-seq, or MNase-seq.
#1. The deepTools suite
##1.1 Overview of the deepTools tools
tool | type | input files | main output file(s) | application |
---|---|---|---|---|
multiBamSummary | data integration | 2 or more BAM | interval-based table of values | perform cross-sample analyses of read counts -> plotCorrelation, plotPCA |
multiBigwigSummary | data integration | 2 or more bigWig | interval-based table of values | perform cross-sample analyses of genome-wide scores -> plotCorrelation, plotPCA |
plotCorrelation | visualization | multiBamSummary/multiBigwigSummary output | clustered heatmap | visualize the Pearson/Spearman correlation |
plotPCA | visualization | multiBamSummary/multiBigwigSummary output | 2 PCA plots | visualize the principal component analysis |
plotFingerprint | QC | 2 BAM | 1 diagnostic plot | assess enrichment strength of a ChIP sample |
computeGCBias | QC | 1 BAM | 2 diagnostic plots | calculate the exp. and obs. GC distribution of reads |
correctGCBias | QC | 1 BAM, output from computeGCbias | 1 GC-corrected BAM | obtain a BAM file with reads distributed according to the genome’s GC content |
bamCoverage | normalization | BAM | bedGraph or bigWig | obtain the normalized read coverage of a single BAM file |
bamCompare | normalization | 2 BAM | bedGraph or bigWig | normalize 2 files to each other (e.g. log2ratio, difference) |
computeMatrix | data integration | 1 or more bigWig, 1 or more BED | zipped file for plotHeatmap or plotProfile | compute the values needed for heatmaps and summary plots |
estimateReadFiltering | information | 1 or more BAM files | table of values | estimate the number of reads filtered from a BAM file or files |
alignmentSieve | QC | 1 BAM file | 1 filtered BAM or BEDPE file | filters a BAM file based on one or more criteria |
plotHeatmap | visualization | computeMatrix output | heatmap of read coverages | visualize the read coverages for genomic regions |
plotProfile | visualization | computeMatrix output | summary plot (“meta-profile”) | visualize the average read coverages over a group of genomic regions |
plotCoverage | visualization | 1 or more BAM | 2 diagnostic plots | visualize the average read coverages over sampled genomic positions |
bamPEFragmentSize | information | 1 BAM | text with paired-end fragment length | obtain the average fragment length from paired ends |
plotEnrichment | visualization | 1 or more BAM and 1 or more BED/GTF | A diagnostic plot | plots the fraction of alignments overlapping the given features |
computeMatrixOperations | miscellaneous | 1 or more computeMatrix output files | modified computeMatrix output file | modify or combine the output of computeMatrix (e.g. subset, filter, sort, rbind, cbind) |
##1.2 Tools for processing BAM and bigWig files
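- multiBamSummary
Computes read coverage over genomic regions from two or more BAM files. In BED-file mode the regions are given in a BED file, while bins mode is suited to genome-wide analysis. The resulting file (.npz) can be passed to plotCorrelation for correlation analysis and to plotPCA for principal component analysis.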
# use all BAM files in the folder; --region 19 limits the binning of the genome to chromosome 19
$ deepTools2.0/bin/multiBamSummary bins \
--bamfiles testFiles/*bam \
--minMappingQuality 30 \
--region 19 \
--labels H3K27me3 H3K4me1 H3K4me3 H3K9me3 input \
-out readCounts.npz --outRawCounts readCounts.tab
$ head readCounts.tab
#'chr' 'start' 'end' 'H3K27me3' 'H3K4me1' 'H3K4me3' 'H3K9me3' 'input'
19 10000 20000 0.0 0.0 0.0 0.0 0.0
19 20000 30000 0.0 0.0 0.0 0.0 0.0
19 30000 40000 0.0 0.0 0.0 0.0 0.0
19 40000 50000 0.0 0.0 0.0 0.0 0.0
19 50000 60000 0.0 0.0 0.0 0.0 0.0
19 60000 70000 1.0 1.0 0.0 0.0 1.0
19 70000 80000 0.0 1.0 7.0 0.0 1.0
19 80000 90000 15.0 0.0 0.0 6.0 4.0
19 90000 100000 73.0 7.0 4.0 16.0 5.0
- multiBigwigSummary
Same as multiBamSummary, except that the input files are in bigWig format.
- correctGCBias
Corrects GC bias.
- bamCoverage
Converts read alignments into read coverage over genomic regions. The window (bin) size used for the coverage calculation is configurable. bamCoverage provides several normalization methods: scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), counts per million (CPM), bins per million mapped reads (BPM) and 1x depth (reads per genome coverage, RPGC).
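Example: bamCoverage for ChIP-seq analysis
# --outFileFormat bedgraph writes a bedGraph file, regardless of the .bw suffix given to -o
bamCoverage --bam a.bam -o a.SeqDepthNorm.bw \
--binSize 10 \
--normalizeUsing RPGC \
--effectiveGenomeSize 2150570000 \
--ignoreForNormalization chrX \
--extendReads \
--outFileFormat bedgraph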
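To make the normalization methods listed for bamCoverage more concrete, here is a minimal Python sketch of the per-bin formulas as I understand them from the deepTools documentation. The function names and the numbers in the demo are hypothetical, chosen only for illustration; this is not the code bamCoverage runs internally.

```python
def cpm(bin_reads, total_mapped_reads):
    """Counts per million: reads in the bin per million mapped reads."""
    return bin_reads / (total_mapped_reads / 1e6)


def rpkm(bin_reads, total_mapped_reads, bin_size_bp):
    """Reads per kilobase of bin length per million mapped reads."""
    return bin_reads / ((total_mapped_reads / 1e6) * (bin_size_bp / 1e3))


def bpm(bin_reads, all_bin_reads):
    """Bins per million: reads in the bin per million reads summed over all bins."""
    return bin_reads / (sum(all_bin_reads) / 1e6)


def rpgc(bin_reads, total_mapped_reads, fragment_length_bp, effective_genome_size):
    """1x depth: divide by the average genome coverage implied by the library."""
    current_coverage = total_mapped_reads * fragment_length_bp / effective_genome_size
    return bin_reads / current_coverage


if __name__ == "__main__":
    # Hypothetical library: 20 million mapped reads, 200 bp fragments, 10 bp bins.
    print("CPM :", cpm(50, 20_000_000))
    print("RPKM:", rpkm(50, 20_000_000, 10))
    print("BPM :", bpm(50, [50, 150, 800]))
    print("RPGC:", rpgc(50, 20_000_000, 200, 2_150_570_000))
```

The effective genome size in the demo is the value used in the command above; everything else is made up.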
- bamCompare
Compares two BAM files by computing the ratio of read abundance between them in each window.
usage: bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw
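The per-window comparison can be sketched in Python as follows. This is a simplified illustration, not bamCompare's implementation: the pseudocount of 1 and the idea of scaling the larger library down to the smaller one are assumptions on my part about typical defaults, and the function names are hypothetical.

```python
import math


def depth_scale_factors(treatment_total, control_total):
    """Scale factors that bring the larger library down to the smaller one."""
    smallest = min(treatment_total, control_total)
    return smallest / treatment_total, smallest / control_total


def log2_ratio(treatment_count, control_count,
               scale_treatment=1.0, scale_control=1.0, pseudocount=1.0):
    """log2 of the scaled treatment/control ratio for one window.

    The pseudocount avoids division by zero and log(0) in empty windows.
    """
    treatment = treatment_count * scale_treatment + pseudocount
    control = control_count * scale_control + pseudocount
    return math.log2(treatment / control)


if __name__ == "__main__":
    # Hypothetical totals: the control library is twice as deep as the treatment.
    s_t, s_c = depth_scale_factors(10_000_000, 20_000_000)
    for t, c in [(0, 0), (15, 30), (60, 10)]:
        print(t, c, log2_ratio(t, c, s_t, s_c))
```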
- bigwigCompare
- computeMatrix
Computes scores over genomic regions; the resulting file can be used by plotHeatmap and plotProfile for plotting. The regions can be genes or any other regions, defined in a BED file.
computeMatrix has two modes:
- reference-point (relative to a point): computes the signal around a reference point
- scale-regions (over a set of regions): scales all regions to the same length, then computes the signal over them
Use the following commands to view the help:
$ computeMatrix scale-regions --help
$ computeMatrix scale-regions -S <bigwig file(s)> -R <bed file> -b 1000
$ computeMatrix reference-point --help
$ computeMatrix reference-point -S <bigwig file(s)> -R <bed file> -a 3000 -b 3000
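The difference between the two modes can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the signal is a plain list with one value per base, windows are assumed to lie fully inside it, and the function names are hypothetical rather than part of deepTools.

```python
def bin_means(signal, n_bins):
    """Average a per-base signal into n_bins bins of roughly equal width."""
    length = len(signal)
    means = []
    for i in range(n_bins):
        start = i * length // n_bins
        end = max((i + 1) * length // n_bins, start + 1)  # never an empty bin
        chunk = signal[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


def scale_regions(signal, start, end, n_bins):
    """scale-regions idea: every region, whatever its length, becomes n_bins values."""
    return bin_means(signal[start:end], n_bins)


def reference_point(signal, point, before, after, bin_size):
    """reference-point idea: a fixed window around one position, in fixed-size bins."""
    window = signal[point - before:point + after]
    return bin_means(window, (before + after) // bin_size)


if __name__ == "__main__":
    chrom = list(range(20))  # hypothetical per-base signal
    # Two regions of different length end up with the same number of columns.
    print(scale_regions(chrom, 0, 4, 2))
    print(scale_regions(chrom, 10, 20, 2))
    # A window of 4 bases on each side of position 10, in 2 bp bins.
    print(reference_point(chrom, 10, 4, 4, 2))
```

Because every row ends up with the same number of columns in either mode, the rows can be stacked into one matrix, which is what makes heatmaps and average profiles possible.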
Example 1: a single input file (reference-point mode)
# reference-point mode; --referencePoint accepts TSS, TES or center
# -b and -a define the region of interest before and after the reference point
# the matrix written by -o is the input for plotHeatmap and plotProfile
$ computeMatrix reference-point \
--referencePoint TSS \
-b 3000 -a 10000 \
-R testFiles/genes.bed \
-S testFiles/log2ratio_H3K4Me3_chr19.bw \
--skipZeros \
-o matrix1_H3K4me3_l2r_TSS.gz \
--outFileSortedRegions regions1_H3K4me3_l2r_genes.bed
Note: the reference point is a position taken from each region in the BED file, for example the start of the region.
Example 2: multiple input files (scale-regions mode)
# separate multiple -R files with spaces; for -S, list the files or use a wildcard
$ deepTools2.0/bin/computeMatrix scale-regions \
-R genes_chr19_firstHalf.bed genes_chr19_secondHalf.bed \
-S testFiles/log2ratio_*.bw \
-b 3000 -a 3000 \
--regionBodyLength 5000 \
--skipZeros -o matrix2_multipleBW_l2r_twoGroups_scaled.gz \
--outFileNameMatrix matrix2_multipleBW_l2r_twoGroups_scaled.tab \
--outFileSortedRegions regions2_multipleBW_l2r_twoGroups_genes.bed
Note that the reported regions will have the same coordinates as the ones in
##1.3 Quality-control tools
- plotCorrelation
Computes the correlation between samples from the output of multiBamSummary or multiBigwigSummary, and displays it as a scatterplot or a heatmap.
Example 1: Scatterplot
$ deepTools2.0/bin/plotCorrelation \
-in scores_per_transcript.npz \
--corMethod pearson --skipZeros \
--plotTitle "Pearson Correlation of Average Scores Per Transcript" \
--whatToPlot scatterplot \
-o scatterplot_PearsonCorr_bigwigScores.png \
--outFileCorMatrix PearsonCorr_bigwigScores.tab
Example 2: Heatmap
$ deepTools2.0/bin/plotCorrelation \
-in readCounts.npz \
--corMethod spearman --skipZeros \
--plotTitle "Spearman Correlation of Read Counts" \
--whatToPlot heatmap --colorMap RdYlBu --plotNumbers \
-o heatmap_SpearmanCorr_readCounts.png \
--outFileCorMatrix SpearmanCorr_readCounts.tab
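As a rough illustration of what the two --corMethod choices measure, the following Python sketch computes Pearson and Spearman correlations between samples. The counts are the non-zero rows of the readCounts.tab excerpt shown earlier; the helper functions are my own minimal versions, not the routines plotCorrelation uses.

```python
def pearson(x, y):
    """Pearson correlation: linear association between the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Bins 60000-100000 of chromosome 19 from the readCounts.tab excerpt.
COUNTS = {
    "H3K27me3": [1.0, 0.0, 15.0, 73.0],
    "H3K4me1": [1.0, 1.0, 0.0, 7.0],
    "H3K4me3": [0.0, 7.0, 0.0, 4.0],
    "H3K9me3": [0.0, 0.0, 6.0, 16.0],
    "input": [1.0, 1.0, 4.0, 5.0],
}

if __name__ == "__main__":
    names = list(COUNTS)
    for a in names:
        row = [f"{spearman(COUNTS[a], COUNTS[b]):6.2f}" for b in names]
        print(f"{a:>9}", " ".join(row))
```

Four bins are far too few for a meaningful result; the point is only to show how the correlation matrix written by --outFileCorMatrix is structured.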
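- plotPCA
Performs principal component analysis on the output of multiBamSummary or multiBigwigSummary, and draws a plot of the first two principal components together with a plot showing how representative the top five components are.
Example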
$ deepTools2.0/bin/plotPCA -in readCounts.npz \
-o PCA_readCounts.png \
-T "PCA of read counts"
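- plotFingerprint
Shows how reads accumulate across the genome for each sample. Reads are counted in windows (bins) of a fixed length, the bins are sorted by count, and the counts are then added up in that order and plotted. An input sample (which captures about 90% of DNA fragments) is in theory uniformly distributed over the genome, so its curve approaches a straight line as sequencing depth increases. In a ChIP sample, reads accumulate faster in the highest-ranked windows, which indicates more specific enrichment in those regions.
Example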
$ deepTools2.0/bin/plotFingerprint \
-b testFiles/*bam \
--labels H3K27me3 H3K4me1 H3K4me3 H3K9me3 input \
--minMappingQuality 30 --skipZeros \
--region 19 --numberOfSamples 50000 \
-T "Fingerprints of different samples" \
--plotFile fingerprints.png \
--outRawCounts fingerprints.tab
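The count, sort and accumulate procedure described above can be sketched in a few lines of Python. The bin counts in the demo are hypothetical and the function is a simplified stand-in, not plotFingerprint's own code.

```python
def fingerprint(bin_counts):
    """Return (fraction of bins, cumulative fraction of reads) points.

    Bins are sorted from lowest to highest count before accumulating.
    """
    ordered = sorted(bin_counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no reads in any bin")
    points = []
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        points.append((rank / len(ordered), running / total))
    return points


if __name__ == "__main__":
    # Hypothetical samples with the same number of reads in four bins.
    uniform_input = [5, 5, 5, 5]   # evenly spread, follows the diagonal
    enriched_chip = [0, 1, 1, 18]  # most reads sit in one bin
    print(fingerprint(uniform_input))
    print(fingerprint(enriched_chip))
```

The evenly covered sample follows the diagonal, while the enriched sample stays low and rises steeply only at the highest-ranked bin.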
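- bamPEFragmentSize
Computes the fragment size of paired-end reads in a BAM file.
- computeGCBias
Computes the GC bias.
- plotCoverage
Assesses the sequencing depth of samples. It samples 1 million bp at random, counts the overlapping reads, and reports the fraction of bases covered and how many times they are covered.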
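The sampling idea behind plotCoverage can be illustrated with this Python sketch. Reads are modelled as half-open (start, end) intervals on a single hypothetical chromosome; the names, the fixed seed and the toy data are assumptions made for the example.

```python
import random


def depth_at(position, reads):
    """Number of reads, given as half-open (start, end) intervals, covering a position."""
    return sum(1 for start, end in reads if start <= position < end)


def sample_coverage(reads, genome_length, n_samples, seed=0):
    """Coverage depth at n_samples randomly chosen positions."""
    rng = random.Random(seed)
    positions = [rng.randrange(genome_length) for _ in range(n_samples)]
    return [depth_at(p, reads) for p in positions]


def fraction_covered_at_least(depths, min_depth):
    """Fraction of sampled positions covered by at least min_depth reads."""
    return sum(1 for d in depths if d >= min_depth) / len(depths)


if __name__ == "__main__":
    reads = [(0, 10), (5, 15), (12, 22)]  # hypothetical alignments
    depths = sample_coverage(reads, genome_length=30, n_samples=1000)
    for k in (1, 2):
        print(f">= {k}x:", fraction_covered_at_least(depths, k))
```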
##1.4 Heatmaps and summary plots
- plotHeatmap
Example 1: plot a heatmap from the output of computeMatrix
# run compute matrix to collect the data needed for plotting
$ computeMatrix scale-regions -S H3K27Me3-input.bigWig \
H3K4Me1-Input.bigWig \
H3K4Me3-Input.bigWig \
-R genes19.bed genesX.bed \
--beforeRegionStartLength 3000 \
--regionBodyLength 5000 \
--afterRegionStartLength 3000 \
--skipZeros -o matrix.mat.gz
$ plotHeatmap -m matrix.mat.gz \
-out ExampleHeatmap1.png
Example 2: plotHeatmap can also cluster the regions
$ plotHeatmap -m matrix_two_groups.gz \
-out ExampleHeatmap2.png \
--colorMap RdBu \
--whatToShow 'heatmap and colorbar' \
--zMin -3 --zMax 3 \
--kmeans 4 # number of k-means clusters
Other options
Custom colors: --colorList 'black, yellow' 'white,blue' '#ffffff,orange,#000000'
Remove the box around the heatmaps: --boxAroundHeatmaps no
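For readers unfamiliar with what the --kmeans option does conceptually, here is a minimal k-means sketch in Python that groups matrix rows (regions) by the similarity of their signal profiles. It illustrates the general algorithm only: the initialization from the first k rows is a simplification I chose for determinism, and nothing here is taken from plotHeatmap's source.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=20):
    """Plain Lloyd's k-means; returns (labels, centers).

    Simplification: the first k rows are used as the initial centers.
    """
    centers = [list(rows[i]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical signal profiles: two flat regions and two enriched ones.
    matrix = [
        [0.0, 0.0, 0.0],
        [5.0, 5.0, 5.0],
        [0.1, 0.0, 0.2],
        [5.1, 4.9, 5.0],
    ]
    labels, centers = kmeans(matrix, k=2)
    print(labels)
    print(centers)
```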
- plotProfile
Draws a profile plot from the output of computeMatrix.
Example 1: one plot per sample
# run compute matrix to collect the data needed for plotting
$ computeMatrix scale-regions -S H3K27Me3-input.bigWig \
H3K4Me1-Input.bigWig \
H3K4Me3-Input.bigWig \
-R genes19.bed genesX.bed \
--beforeRegionStartLength 3000 \
--regionBodyLength 5000 \
--afterRegionStartLength 3000 \
--skipZeros -o matrix.mat.gz
$ plotProfile -m matrix.mat.gz \
-out ExampleProfile1.png \
--numPlotsPerRow 2 \
--plotTitle "Test data profile"
Example 2: one plot per group of genes
# --plotType=fill adds color between the x axis and the lines
# --perGroup makes one image per BED file instead of per bigWig file
$ plotProfile -m matrix.mat.gz \
-out ExampleProfile2.png \
--plotType=fill \
--perGroup \
--colors red yellow blue \
--plotTitle "Test data profile"
Example 3: clustered profiles
$ plotProfile -m matrix.mat.gz \
--perGroup \
--kmeans 2 \
-out ExampleProfile3.png
Example 4: plot as a heatmap
$ plotProfile -m matrix.mat.gz \
--perGroup \
--kmeans 2 \
--plotType heatmap \
-out ExampleProfile3.png
- plotEnrichment
Reports how strongly the peaks in BED files or the features in a GTF file are enriched in the ChIP-seq alignments.
Example
$ plotEnrichment -b Input.bam H3K4Me1.bam H3K4Me3.bam \
--BED up.bed down.bed \
--regionLabels "up regulated" "down regulated" \
-o enrichment.png
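The quantity plotted here is, per sample and per feature set, the fraction of alignments that overlap the given features. A minimal Python sketch of that calculation follows, assuming all intervals are half-open (start, end) pairs on the same chromosome; the data and function names are hypothetical.

```python
def overlaps(read, feature):
    """True if two half-open (start, end) intervals share at least one base."""
    return read[0] < feature[1] and feature[0] < read[1]


def percent_in_features(reads, features):
    """Percentage of reads that overlap at least one feature."""
    hits = sum(1 for read in reads if any(overlaps(read, f) for f in features))
    return 100.0 * hits / len(reads)


if __name__ == "__main__":
    # Hypothetical alignments and feature sets.
    reads = [(0, 10), (20, 30), (40, 50), (100, 110)]
    feature_sets = {
        "up regulated": [(5, 25)],
        "down regulated": [(105, 120)],
    }
    for label, features in feature_sets.items():
        print(label, percent_in_features(reads, features))
```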