前言
銜接上一篇數(shù)據(jù)比對(duì)后的結(jié)果扛拨,使用R包DSS進(jìn)行處理。
Input文件準(zhǔn)備
我們先來(lái)復(fù)習(xí)一下上一節(jié)課得到的數(shù)據(jù)結(jié)果:
*.bismark.cov.gz
文件
$head test_data_bismark_bt2.deduplicated.bismark.cov
1 975476 975476 100 1 0
1 975488 975488 100 1 0
1 975490 975490 100 1 0
1 2224487 2224487 100 1 0
1 2224489 2224489 100 1 0
1 2224514 2224514 100 1 0
1 2224520 2224520 100 1 0
1 2313220 2313220 0 0 1
1 9313902 9313902 100 1 0
1 9313914 9313914 100 1 0
# 第一列代表chromosome
# 第二枣宫,三列代表location
# 第四列代表甲基化百分比
# 第五列代表甲基化數(shù)目
# 第六列代表未甲基化數(shù)目
這里我們使用的R包為DSS,使用Bioconductor進(jìn)行安裝。
這個(gè)包可以對(duì)甲基化數(shù)據(jù)做兩件事:
- 找出差異甲基化位點(diǎn)
- 根據(jù)甲基化位點(diǎn)找出差異化甲基化區(qū)域
DSS包對(duì)差異甲基化的檢測(cè)基于β負(fù)二項(xiàng)分布的嚴(yán)格沃爾德檢驗(yàn)此洲。
DSS包可以對(duì)無(wú)重復(fù)數(shù)據(jù)進(jìn)行處理
輸入數(shù)據(jù)的格式如下:
- 第一列為染色體
- 第二列為位置
- 第三列為total reads
- 第四列為甲基化的reads
chr pos N X
chr18 3014904 26 2
chr18 3031032 33 12
chr18 3031044 33 13
chr18 3031065 48 24
# 根據(jù)這個(gè)特性翔悠,我們可以將上面的輸出文件通過(guò)Linux或者R進(jìn)行簡(jiǎn)單的處理得到input文件业崖。
# 非常簡(jiǎn)單,所以這里請(qǐng)自行轉(zhuǎn)化蓄愁,凡事都需要自己摸索腻要。
Call DML or DMR
DML是甲基化差異位點(diǎn),DMR為甲基化差異區(qū)域涝登。
使用DSS包自帶的數(shù)據(jù)進(jìn)行演示
1. 載入兩個(gè)包DSS和bsseq(需要該包構(gòu)建obj對(duì)象)
library(DSS)
require(bsseq)
path <- file.path(system.file(package="DSS"), "extdata")
dat1.1 <- read.table(file.path(path, "cond1_1.txt"), header=TRUE)
dat1.2 <- read.table(file.path(path, "cond1_2.txt"), header=TRUE)
dat2.1 <- read.table(file.path(path, "cond2_1.txt"), header=TRUE)
dat2.2 <- read.table(file.path(path, "cond2_2.txt"), header=TRUE)
BSobj <- makeBSseqData( list(dat1.1, dat1.2, dat2.1, dat2.2),
c("C1","C2", "N1", "N2") )[1:1000,]
# 查看BSobj的類型
BSobj
An object of type 'BSseq' with
1000 methylation loci
4 samples
has not been smoothed
All assays are in-memory雄家、
2. 利用DMLtest函數(shù)call DML,分為一下幾個(gè)步驟:
- 預(yù)測(cè)所有CpG位點(diǎn)的平均甲基化水平
- 對(duì)每個(gè)CpG位點(diǎn)估算其甲基化水平的dispersions
- 進(jìn)行沃爾德檢驗(yàn)
在第一步過(guò)程中,smoothing可以更好幫助估算甲基化(針對(duì)whole-genome BS-seq)趟济。
而RRBS不需要smoothing乱投。
根據(jù)甲基化水平進(jìn)行l(wèi)oci的差異分析
dmlTest <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"))
head(dmlTest)
chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr
1 chr18 3014904 0.3817233 0.4624549 -0.08073162 0.24997034 -0.3229648 0.300542998 0.01706260 0.74672190 0.9985094
2 chr18 3031032 0.3380579 0.1417008 0.19635711 0.11086362 1.7711592 0.008911745 0.04783892 0.07653423 0.6792127
3 chr18 3031044 0.3432172 0.3298853 0.01333190 0.12203116 0.1092500 0.010409029 0.01994821 0.91300423 0.9985094
4 chr18 3031065 0.4369377 0.3649218 0.07201587 0.10099395 0.7130711 0.010320888 0.01603200 0.47580174 0.9985094
5 chr18 3031069 0.2933572 0.5387464 -0.24538920 0.13178800 -1.8619996 0.012537553 0.02320887 0.06260315 0.6158797
6 chr18 3031082 0.3526311 0.3905718 -0.03794068 0.07847999 -0.4834440 0.007665696 0.01145531 0.62878051 0.9985094
dmlTest.sm <- DMLtest(BSobj, group1=c("C1", "C2"), group2=c("N1", "N2"), smoothing=TRUE)
head(dmlTest.sm)
chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr
1 chr18 3014904 0.3693669 0.4566563 -0.08728939 0.29967322 -0.2912819 0.30054300 0.01706260 0.7708357 0.9656515
2 chr18 3031032 0.3433882 0.3679732 -0.02458503 0.03970109 -0.6192533 0.03177894 0.28323422 0.5357495 0.8639036
3 chr18 3031044 0.3412867 0.3678807 -0.02659404 0.04032823 -0.6594397 0.02536938 0.02080295 0.5096134 0.8596522
4 chr18 3031065 0.3358830 0.3511983 -0.01531533 0.04799161 -0.3191252 0.01123412 0.01621926 0.7496316 0.9652417
5 chr18 3031069 0.3358830 0.3511983 -0.01531533 0.03205500 -0.4777830 0.02832889 0.05857316 0.6328047 0.8968029
6 chr18 3031082 0.3358830 0.3511983 -0.01531533 0.05846593 -0.2619531 0.01682981 0.01368466 0.7933576 0.9745116
3. 根據(jù)dmlTest來(lái)call DML
dmls <- callDML(dmlTest.sm, p.threshold=0.001)
dmls
chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr postprob.overThreshold
447 chr18 3973699 0.8530694 0.3432547 0.5098147 0.05726793 8.902272 0.03662109 0.98947516 5.471481e-19 4.962634e-16 1.0000000
709 chr18 4564190 0.7773858 0.1977036 0.5796822 0.10294267 5.631116 0.21725415 0.02952656 1.790467e-08 5.413180e-06 0.9999984
710 chr18 4564237 0.7773858 0.1977036 0.5796822 0.15431085 3.756587 0.02931516 0.26238558 1.722462e-04 7.439396e-03 0.9990652
dmls <- callDML(dmlTest, p.threshold=0.001)
dmls
chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr postprob.overThreshold
450 chr18 3976129 0.01027497 0.939033927 -0.9287590 0.06544340 -14.191789 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1.0000000
451 chr18 3976138 0.01027497 0.939033927 -0.9287590 0.06544340 -14.191789 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1.0000000
638 chr18 4431501 0.01331553 0.943056638 -0.9297411 0.09273779 -10.025483 0.053172411 0.07746835 1.177826e-23 1.429096e-21 1.0000000
639 chr18 4431511 0.01327049 0.943056638 -0.9297862 0.09270080 -10.029969 0.053121697 0.07746835 1.125518e-23 1.429096e-21 1.0000000
710 chr18 4564237 0.91454619 0.011930005 0.9026162 0.05260037 17.159883 0.009528898 0.04942849 5.302004e-66 3.859859e-63 1.0000000
782 chr18 4657576 0.98257334 0.067835497 0.9147378 0.06815000 13.422418 0.010424723 0.06755651 4.468885e-41 8.133371e-39 1.0000000
582 chr18 4340682 0.95398081 0.030390730 0.9235901 0.10935874 8.445508 0.085494283 0.04540643 3.027264e-17 2.754810e-15 1.0000000
583 chr18 4340709 0.95398081 0.030390730 0.9235901 0.10935874 8.445508 0.085494283 0.04540643 3.027264e-17 2.754810e-15 1.0000000
340 chr18 3542732 0.95023554 0.034383112 0.9158524 0.11937407 7.672122 0.089137013 0.04474741 1.691739e-14 1.368429e-12 1.0000000
395 chr18 3723448 0.06570765 0.751990744 -0.6862831 0.09825286 -6.984866 0.011958092 0.01646418 2.851278e-12 2.075730e-10 1.0000000
188 chr18 3370113 0.01488553 0.787980174 -0.7730946 0.11380172 -6.793347 0.054190769 0.02752024 1.095611e-11 7.250956e-10 1.0000000
400 chr18 3785543 0.79337804 0.118353679 0.6750244 0.12183251 5.540593 0.017720841 0.02442007 3.014493e-08 1.688116e-06 0.9999988
683 chr18 4494490 0.77615275 0.104235359 0.6719174 0.12407577 5.415380 0.009219023 0.08210742 6.115879e-08 3.180257e-06 0.9999980
783 chr18 4657592 0.96858371 0.266894590 0.7016891 0.13077255 5.365722 0.064228633 0.07721117 8.062618e-08 3.668491e-06 0.9999979
642 chr18 4431618 0.43034706 0.965539700 -0.5351926 0.09578178 -5.587625 0.012064922 0.07497569 2.301966e-08 1.396526e-06 0.9999972
189 chr18 3370141 0.01488553 0.676293642 -0.6614081 0.12554112 -5.268458 0.054190769 0.02138845 1.375746e-07 5.891428e-06 0.9999961
738 chr18 4635185 0.44671015 0.007342128 0.4393680 0.08148745 5.391849 0.010069744 0.05154034 6.973627e-08 3.384534e-06 0.9999844
330 chr18 3542403 0.02875655 0.714168320 -0.6854118 0.15294560 -4.481409 0.041892077 0.03368033 7.415192e-06 2.841190e-04 0.9999354
185 chr18 3347936 0.49560648 0.017291187 0.4783153 0.10184167 4.696656 0.012173893 0.04533502 2.644552e-06 1.069574e-04 0.9998983
92 chr18 3217241 0.03876653 0.767643956 -0.7288774 0.17547605 -4.153714 0.038095316 0.03753383 3.271214e-05 9.602621e-04 0.9998319
396 chr18 3723468 0.23043341 0.951779714 -0.7213463 0.18220738 -3.958930 0.173291593 0.07965750 7.528626e-05 1.838513e-03 0.9996786
40 chr18 3047980 0.07206183 0.658535570 -0.5864737 0.14396467 -4.073734 0.012517547 0.02615351 4.626536e-05 1.247451e-03 0.9996373
614 chr18 4353399 0.01355584 0.547298573 -0.5337427 0.12855458 -4.151876 0.053422013 0.02055735 3.297603e-05 9.602621e-04 0.9996300
99 chr18 3226251 0.97443783 0.217343279 0.7570945 0.19807591 3.822244 0.012911947 0.27481269 1.322425e-04 3.105566e-03 0.9995532
613 chr18 4353379 0.01355584 0.529341634 -0.5157858 0.13033377 -3.957422 0.053422013 0.02080043 7.576289e-05 1.838513e-03 0.9992902
894 chr18 4921874 0.35815864 0.009172558 0.3489861 0.07955244 4.386868 0.009774839 0.05010151 1.149944e-05 3.986472e-04 0.9991255
895 chr18 4921881 0.35815864 0.009172558 0.3489861 0.07955244 4.386868 0.009774839 0.05010151 1.149944e-05 3.986472e-04 0.9991255
注意,這里使用smoothing進(jìn)行操作顷编,事實(shí)上使用的示例數(shù)據(jù)我也不清楚是RRBS還是WGBS戚炫,但是使用dmlTest call DML會(huì)有更多的結(jié)果。
當(dāng)然媳纬,用戶也可以指定差異的閾值双肤,只有差異大于閾值的才會(huì)被call出來(lái)。
dmls2 <- callDML(dmlTest, delta=0.1, p.threshold=0.001)
head(dmls2)
chr pos mu1 mu2 diff diff.se stat phi1 phi2 pval fdr postprob.overThreshold
450 chr18 3976129 0.01027497 0.9390339 -0.9287590 0.06544340 -14.19179 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1
451 chr18 3976138 0.01027497 0.9390339 -0.9287590 0.06544340 -14.19179 0.052591567 0.02428826 1.029974e-45 2.499403e-43 1
638 chr18 4431501 0.01331553 0.9430566 -0.9297411 0.09273779 -10.02548 0.053172411 0.07746835 1.177826e-23 1.429096e-21 1
639 chr18 4431511 0.01327049 0.9430566 -0.9297862 0.09270080 -10.02997 0.053121697 0.07746835 1.125518e-23 1.429096e-21 1
710 chr18 4564237 0.91454619 0.0119300 0.9026162 0.05260037 17.15988 0.009528898 0.04942849 5.302004e-66 3.859859e-63 1
782 chr18 4657576 0.98257334 0.0678355 0.9147378 0.06815000 13.42242 0.010424723 0.06755651 4.468885e-41 8.133371e-39 1
4. 根據(jù)dmlTest Call DMR
dmrs <- callDMR(dmlTest.sm, p.threshold=0.01)
dmrs
chr start end length nCG meanMethy1 meanMethy2 diff.Methy areaStat
23 chr18 4921637 4922059 423 27 0.11002940 0.01809674 0.09193266 86.29588
7 chr18 3507919 3508022 104 10 0.07524915 0.03294316 0.04230600 30.55943
15 chr18 4340682 4340753 72 4 0.89237955 0.35052968 0.54184987 10.83526
dmrs <- callDMR(dmlTest, p.threshold=0.01)
dmrs
chr start end length nCG meanMethy1 meanMethy2 diff.Methy areaStat
27 chr18 4657576 4657639 64 4 0.506453 0.318348 0.188105 14.34236
我們可以發(fā)現(xiàn)钮惠,smoothing前后得到的結(jié)果差異還是很大茅糜,所以針對(duì)不同的實(shí)驗(yàn)類型我們需要注意是否使用smoothing。
同理素挽,也可以使用的delta參數(shù)以及調(diào)整p.threshold得到合適的結(jié)果蔑赘。
dmrs2 <- callDMR(dmlTest, delta=0.1, p.threshold=0.05)
dmrs2
chr start end length nCG meanMethy1 meanMethy2 diff.Methy areaStat
31 chr18 4657576 4657639 64 4 0.5064530 0.3183480 0.188105 14.34236
19 chr18 4222533 4222608 76 4 0.7880276 0.3614195 0.426608 12.91667
5. 可視化
DSS包提供了一個(gè)不是很美觀的可視化函數(shù),用戶其實(shí)可以使用coverage結(jié)果在R里面作圖预明。
showOneDMR(dmrs[1,], BSobj)
結(jié)語(yǔ)
分析到此就告一段落了缩赛,隨后就是自行對(duì)差異甲基化區(qū)域的注釋以及可視化文獻(xiàn)作圖。
Q&A
后續(xù)在處理數(shù)據(jù)的時(shí)候撰糠,發(fā)現(xiàn)有一個(gè)情況酥馍,多核跑DMLtest會(huì)報(bào)錯(cuò),詳情解決方案可見https://github.com/haowulab/DSS/issues/13
以下為簡(jiǎn)單解決方案
single = MulticoreParam(workers=1, progressbar=TRUE)
#dmlTest.sm_1 <- DMLtest(BSobj, group1="mdf", group2="mp", equal.disp = T)
dmlTest.sm_1 <- DMLtest(BSobj, group1="mdf", group2="mp", smoothing=TRUE, BPPARAM=single)