背景
經(jīng)常在HiC文章中看到TAD邊界強(qiáng)度變化圖酷愧,大體上可以反映TAD之間隔絕情況,絕緣系數(shù)越低艰管,TAD之間交互越弱。
如下圖:
image.png
畫(huà)圖的方法:
-
有點(diǎn)類(lèi)似ChIP-seq peak profiler圖,可以借助
deeptools
來(lái)繪制。
image.png
image.png 同時(shí)我們也可以用
bedtools
進(jìn)行統(tǒng)計(jì)说莫,用R
畫(huà)圖术羔。
實(shí)踐:
TAD邊界位點(diǎn)
image.png
1.deeptools 繪圖
注
:deeptools
默認(rèn)binsize為10
bp,得到的矩陣會(huì)非常很大(數(shù)據(jù)為G
級(jí)別)赢赊,binsize需要根據(jù)自己需要進(jìn)行修改,一定不要默認(rèn)參數(shù)跑哦级历。
## Step1: boundary 邊界位點(diǎn)
$ cat *boundaries.bed | cut -f 4,5 | grep -v "track" | cut -d "|" -f 3 | tr ":-" "\t" | sort -Vk 1 -k2,2n > sample_boundary.bed
## Step2: insulation score 轉(zhuǎn)換成bigwig格式
$ cat *bedGraph| grep -v "track" | sort -Vk 1 -k2,2n | grep -v "chrY" | grep -v "NA" > sample_insulationscore.bedgraph
$ bedGraphToBigWig sample_insulationscore.bedgraph hg19.ChrSize.bed sample_insulationscore.bw
## Step3:deeptools畫(huà)圖
computeMatrix reference-point --referencePoint center --binSize 10000 \
-b 500000 -a 500000 \
-R sample_boundary.bed -S sample_insulationscore.bw \
--skipZeros -o sample_matrix_boundary_500Kb_bin10Kb.gz
plotProfile -m sample_matrix_boundary_500Kb_bin10Kb.gz \
-out sample_matrix_boundary_500Kb_bin10Kb.pdf \
--plotTitle "boundary strength"
畫(huà)圖效果如圖:
image.png
2.bedtools+R 繪圖
注
:用bedtools overlap 函數(shù)释移,取bed與bedgraph交集,最終輸出結(jié)果不是按順序輸出寥殖。如下圖所示玩讳。
image.png
- bedtools
## bedgraph區(qū)域保留NA位置
$ cat *bedGraph| grep -v "track" | sort -Vk 1 -k2,2n | grep -v "chrY" > bedtools_sample_insulationscore.bedgraph
## bed文件拓展
$
$ cat sample_boundary.bed | awk 'BEGIN{FS=OFS="\t"}{$2=$2-480000;$3=$3+480000}{print $0}' > bedtools_sample_boundary.bed
## 統(tǒng)計(jì)matrix
$ cat bedtools_sample_boundary.bed | while read id; do
echo $id| tr " " "\t" > bedtools_tmp;
intersectBed -a bedtools_tmp -b bedtools_sample_insulationscore.bedgraph -wa -wb|sort -k 6,6n | cut -f 8 | xargs| tr " " "\t" >> bedtools_matrix
done
- R 畫(huà)圖
## R 畫(huà)圖
x <- readline()
setwd(x)
library(ggplot2)
library(patchwork)
boundary_matrix <- read.table("bedtools_matrix2")
dim(boundary_matrix)
vector_test <- apply(boundary_matrix,2 ,function(x){median(x,na.rm = T)})
# plot(1:25,vector_test,type = "l")
dat=as.data.frame(vector_test)
dat$num=c(1:25)
p1<-ggplot(data = dat,aes(x=num,y=vector_test))+
#geom_line()+
geom_smooth(se = F)+
scale_x_continuous(breaks =c(1,12.5,25),
labels = c("-500Kb","0","500Kb"))+
theme_classic()+
xlab("TAD boundary")+ylab("Mean insulation score")
p1
p2 <- ggplot(data = dat,aes(x=num,y=vector_test))+geom_line()+
scale_x_continuous(breaks =c(1,12.5,25),
labels = c("-500Kb","0","500Kb"))
p1+p2
Left Fig: 顯示折線圖擬合之后的線圖;Right Fig:顯示原始折線圖
思考:
- 畫(huà)此圖需要搞清楚嚼贡,TAD boundary邊界如何取點(diǎn)
- 選取合適的工具及其參數(shù)熏纯,統(tǒng)計(jì)boundary附近的InsulationScore分布
- 可以進(jìn)一步提高計(jì)算InsulationScore的精確度,采用10Kb為binsize進(jìn)行統(tǒng)計(jì)粤策,得到的結(jié)果不需要進(jìn)行擬合直接畫(huà)圖
image.png
歡迎評(píng)論留言豆巨,交流學(xué)習(xí)??~