這一章主要講的是Chip-seq的基本分析方法
Chip-seq
(引用原文)
ChIP-seq是使用染色質(zhì)免疫共沉淀技術(shù)將蛋白質(zhì)結(jié)合的DNA樣本測(cè)序后分析全基因組蛋白質(zhì)結(jié)合位點(diǎn)的分析方法叔收。研究對(duì)象包括轉(zhuǎn)錄因子剑鞍,組蛋白修飾分析
ChIP-seq實(shí)驗(yàn)主要分為如下幾步:
1.Sample fragmentation, 將基因組打斷
2.Immunoprecipitation, 免疫共沉淀
3.DNA purification, DNA提純
4.sequence, 測(cè)序
通常打斷的基因組有150-300bp長(zhǎng)统捶,那么我們測(cè)的是蛋白或者TF結(jié)合的那個(gè)片段修然,當(dāng)這些reads被mapping回到基因組上時(shí),就會(huì)形成一個(gè)peak萍桌,那么peak的位置即為結(jié)合的部分
常用的軟件有:
1.mapping: bowtie1/2, Rsubread
2.peak calling:CCAT, SICER, MACS, ZINBA, BayesPeak, chipseq, ChIPseqR, CSAR, csaw, GenoGAM, iSeq, PICS
3.Peak annotation and analysis: ChIPpeakAnno
4.Gene network building: GeneNetworkBuilder, ChIPXpress
5.Motif enrichment analysis: The MEME Suite, Homer
質(zhì)控
當(dāng)我們完成上游分析柠掂,我們可以在R里面進(jìn)行QC
library(ChIPQC)
experiment <- ChIPQC(samples)
ChIPQCreport(experiment)
峰注釋
我們call完peak以后熊杨,我們也許知道peak的位置沐兵,但是卻不知道是哪些位置的(TSS)或者功能等别垮,不過我們要下載好注釋好的包
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
annoData <- toGRanges(TxDb.Hsapiens.UCSC.hg19.knownGene, feature="gene")
#導(dǎo)入peak文件
library(ChIPpeakAnno)
packagePath <- system.file("extdata", package = "ChIPpeakAnno")
dir(packagePath, "gff|bed|narrowPeak|broadPeak|gz")
#導(dǎo)入峰的GFF文件
toGRanges(file.path(packagePath, "GFF_peaks.gff"), format = "GFF")
#導(dǎo)入bed文件
toGRanges(file.path(packagePath, "WS220.bed"), format = "BED")
#導(dǎo)入narrowpeak
toGRanges(file.path(packagePath, "peaks.narrowPeak"), format = "narrowPeak")
#導(dǎo)入broadpeak
toGRanges(file.path(packagePath, "TAF.broadPeak"), format = "broadPeak")
上述介紹了如何導(dǎo)入我們需要的文件
接下來就可以進(jìn)行注釋了
#以narrowpeak為例
packagePath <- system.file("extdata", "macs", package = "ChIPseqStepByStep")
macs2.files <- dir(packagePath, pattern="*.q1_peaks.narrowPeak$")
peaks <- sapply(macs2.files, function(.ele)
toGRanges(file.path(packagePath, .ele), format="narrowPeak"))
# 尋找最近的TSS
anno <- annotatePeakInBatch(gr1,
AnnotationData=annoData)
head(anno, n=2)
差異peak分析
類似于RNA-seq的差異分析一樣,我們可以尋找差異binding site
DiffBind
文件是根據(jù)實(shí)驗(yàn)設(shè)計(jì)和數(shù)據(jù)存儲(chǔ)地址等信息創(chuàng)建的一個(gè)csv格式文件扎谎,包含的表頭信息有"SampleID"碳想、 "Tissue"、 "Factor"胧奔、 "Condition" 、"Treatment"老充、"Replicate" 、"bamReads" 啡浊、"ControlID"、 "bamControl" "Peaks"巷嚣、 "PeakCaller"(bam,peak文件分別在比對(duì)和call peak的步驟產(chǎn)生)
http://www.reibang.com/p/f849bd55ac27
library(DiffBind)
## 準(zhǔn)備好樣品文件
samples <- read.csv(file.path(system.file("extra", package="DiffBind"),
"tamoxifen.csv"))
samples[1:2, ]
pf <- system.file("extra", package="DiffBind")
samples$bamReads <- file.path(pf, samples$bamReads)
samples$bamControl <- file.path(pf, samples$bamControl)
samples$Peaks <- file.path(pf, samples$Peaks)
tmpfile <- tempfile()
write.csv(samples, tmpfile)
##讀取文件
tamoxifen <- dba(sampleSheet=tmpfile)
##查看樣品間的關(guān)系
plot(tamoxifen)
##計(jì)數(shù)
tamoxifen <- dba.count(tamoxifen, summits=250) ##只對(duì)峰上下游250bp(總計(jì)500bp)進(jìn)行計(jì)數(shù)
##差異分析
tamoxifen <- dba.contrast(tamoxifen, categories=DBA_CONDITION) ##比對(duì)的條件為samples中的Condition列
##因?yàn)闆]有bam文件,所以這個(gè)例子其實(shí)并不能跑起來
tamoxifen <- dba.analyze(tamoxifen)
##提取結(jié)果
tamoxifen.DB <- dba.report(tamoxifen)
這個(gè)包的詳細(xì)教程:http://www.bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf
其中輸入文件為:
csaw
library(csaw)
## 1. Loading in data from BAM files.
library(csaw)
param <- readParam(minq=50)
data <- windowCounts(bam.files, ext=110, width=10, param=param)
## 2. Filtering out uninteresting regions.
library(edgeR)
keep <- aveLogCPM(asDGEList(data)) >= -1
data <- data[keep,]
## 3. Calculating normalization factors.
binned <- windowCounts(bam.files, bin=TRUE, width=10000, param=param)
data <- normOffsets(binned, se.out=data)
## 4. Identifying DB windows.
y <- asDGEList(data)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design, robust=TRUE)
results <- glmQLFTest(fit)
## 5. Correcting for multiple testing.
merged <- mergeWindows(rowRanges(data), tol=1000L)
tabcom <- combineTests(merged$id, results$table)