methylkit差異甲基化分析

文件格式

可以是最基本的甲基化調(diào)用文件也可以是bismark下游的BAM/SAM文件。

在這里使用的是經(jīng)過處理得到的文本文件铡原。

典型的文件應(yīng)該是這樣的

##         chrBase   chr    base strand coverage freqC  freqT
## 1 chr21.9764539 chr21 9764539      R       12 25.00  75.00
## 2 chr21.9764513 chr21 9764513      R       12  0.00 100.00
## 3 chr21.9820622 chr21 9820622      F       13  0.00 100.00
## 4 chr21.9837545 chr21 9837545      F       11  0.00 100.00
## 5 chr21.9849022 chr21 9849022      F      124 72.58  27.42

導(dǎo)入文件

library(methylKit)
file.list <- list(
  "brain_1.txt",
  "brain_2.txt",
  "adrenal_1.txt",
 "adrenal_2.txt")

myobj <- methRead(
  file.list,
  sample.id=list(
    "brain_1",
    "brain_2",
    "adrenal_1",
    "adrenal_2"
    ),
  assembly="hg19",
  sep = " ",
  treatment=c(1,1,0,0),
  context="CpG",
  mincov = 10
)

1.樣本描述性統(tǒng)計(jì)

1.1甲基化百分比的分布直方圖

甲基化百分比

一般呈現(xiàn)兩邊高中間低的特點(diǎn)。要么高甲基化瘩燥，要么低甲基化奸例。

1.2甲基化覆蓋率的直方圖

甲基化覆蓋率

如果數(shù)據(jù)受到了PCR擴(kuò)增的影響供填，那么右邊還會(huì)出現(xiàn)一個(gè)峰渣磷。

1.3根據(jù)覆蓋率對(duì)位點(diǎn)進(jìn)行篩選婿着，去除過高或者過低的位點(diǎn)

2比較分析

2.1對(duì)樣本進(jìn)行merge

meth=unite(tile, destrand=FALSE)
head(tile)

得到在所有樣本中都有cover的甲基化位點(diǎn)or甲基化區(qū)域（這里是甲基化區(qū)域）。

methylBase object with 6 rows
--------------
   chr  start    end strand coverage1 numCs1 numTs1 coverage2 numCs2 numTs2 coverage3 numCs3 numTs3 coverage4 numCs4
1 chr1 714501 715000      *       218      3    215       204      6    198       176      5    171       168      3
2 chr1 799001 799500      *        84     78      6       245    217     28        98     68     30       144    106
3 chr1 805001 805500      *       573     26    547      1005     22    983       823     16    807      1219     28
4 chr1 809001 809500      *        76     76      0        56     54      2        46     40      6        77     68
5 chr1 839001 839500      *       132     34     98       161     52    109       157    103     54       254    170
6 chr1 839501 840000      *       281     43    238       374     56    318       327     39    288       508     73
  numTs4
1    165
2     38
3   1191
4      9
5     84
6    435
--------------
sample.ids: brain_1 brain_2 adrenal_1 adrenal_2 
destranded FALSE 
assembly: hg19 
context: CpG 
treament: 1 1 0 0 
resolution: region

2.2樣本相關(guān)性分析

得到相關(guān)系數(shù)圖

樣本相關(guān)系數(shù)圖

2.3樣本聚類信息

樣本聚類圖

2.4 PCA分析（是什么醋界？）

We can also do a PCA analysis on our samples. The following function will plot a scree plot for importance of components.

碎石圖

We can also plot PC1 and PC2 axis and a scatter plot of our samples on those axis which will reveal how they cluster.

image.png

2.5找到差異甲基化區(qū)域

The calculateDiffMeth() function is the main function to calculate differential methylation. Depending on the sample size per each set it will either use Fisher’s exact or logistic regression to calculate P-values. P-values will be adjusted to Q-values using SLIM method (Wang, Tuominen, and Tsai 2011). If you have replicates, the function will automatically use logistic regression.
calculateDiffMeth()函數(shù)是計(jì)算差異甲基化的主要函數(shù)竟宋。根據(jù)每個(gè)組的樣本多少，它將使用Fisher精確檢驗(yàn)或邏輯回歸檢驗(yàn)來計(jì)算p值物独。使用SLIM方法將p值調(diào)整為q值(Wang, Tuominen, and Tsai 2011)袜硫。如果有重復(fù)樣本氯葬，函數(shù)將自動(dòng)使用邏輯回歸檢驗(yàn)挡篓。

原理

這個(gè)原理我也不是很懂。

myDiff=calculateDiffMeth(meth)
# get hyper methylated bases
myDiff25p.hyper=getMethylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")
# get hypo methylated bases
myDiff25p.hypo=getMethylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
# get all differentially methylated bases
myDiff25p=getMethylDiff(myDiff,difference=25,qvalue=0.01)

可視化每條染色體的低甲基化/高甲基化堿基/區(qū)域的分布：

似乎報(bào)錯(cuò)了

為什么只有奇數(shù)的染色體呢

3.注釋差異甲基化位點(diǎn)或者區(qū)域

得到差異甲基化的基因組分布

Summary of target set annotation with genic parts
Rows in target set: 12066
-----------------------
percentage of target features overlapping with annotation:
  promoter       exon     intron intergenic 
     14.63      26.44      60.27      31.86 

percentage of target features overlapping with annotation:
(with promoter > exon > intron precedence):
  promoter       exon     intron intergenic 
     14.63      16.91      36.61      31.86 

percentage of annotation boundaries with feature overlap:
promoter     exon   intron 
    3.91     1.36     3.16 

summary of distances to the nearest TSS:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    2890   11562   36476   36108 1718029

得到差異甲基化位點(diǎn)的CpG島分布：

summary of target set annotation with feature annotation:
Rows in target set: 12066
----------------------------
percentage of target elements overlapping with features:
  CpGi shores  other 
 66.34   7.09  29.15 

percentage of feature elements overlapping with target:
  CpGi shores 
 15.25   1.96

得到距離TSS最近的基因名稱

 target.row dist.to.feature feature.name feature.strand
57             1           -7315   uc001abu.1              +
92             2            3201   uc031pkn.1              +
64             3           -2155   uc031pko.1              +
64.1           4           -1655   uc031pko.1              +
106            5            1126   uc001ace.3              +
106.1          6            1626   uc001ace.3              +

讀取CpG島（CpGi）和CpG海岸注釋時(shí)報(bào)錯(cuò)了帚称，不過好像沒有什么關(guān)系官研，幫助文檔里也這么用。

...被棄用

a GenomicRangesList contatining one GRanges object for flanks and one for GRanges object for the main feature.
NOTE: This can not return a GRangesList at the moment because flanking regions do not have to have the same column name as the feature. GRangesList elements should resemble each other in the column content. We can not satisfy that criteria for the flanks

與內(nèi)含子/外顯子/啟動(dòng)子/CpG島等重疊的差異甲基化區(qū)域的百分比/數(shù)目

image.png

讀入甲基化調(diào)用文件之后會(huì)返回一個(gè)methylRawList對(duì)象闯睹。

但是當(dāng)樣本很多而且樣本的規(guī)格很大的時(shí)候戏羽，可以返回不一樣的對(duì)象
methylRawListDB, methylRawDB, methylBaseDB and methylDiffDB統(tǒng)稱為methylDB objects。

Sometimes, when dealing with multiple samples and increased sample sizes coming from genome wide bisulfite sequencing experiments, the memory of your computer might not be sufficient enough.

Therefore methylKit offers a new group of classes, that are basically pendants to the original methylKit classes with one important difference: The methylation information, which normally is internally stored as data.frame, is stored in an external bgzipped file and is indexed by tabix (Li 2011), to enable fast retrieval of records or regions. This group contains methylRawListDB, methylRawDB, methylBaseDB and methylDiffDB, let us call them methylDB objects.

We can now create a methylRawListDB object, which stores the same content as myobj from above. But the single methylRaw objects retrieve their data from the tabix-file linked under dbpath.

最后編輯于：2020.11.27 10:14:44

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末楼吃，一起剝皮案震驚了整個(gè)濱河市始花，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌孩锡，老刑警劉巖酷宵，帶你破解...
沈念sama閱讀 219,427評(píng)論 6贊 508
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場離奇詭異躬窜，居然都是意外死亡浇垦，警方通過查閱死者的電腦和手機(jī)，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 93,551評(píng)論 3贊 395
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門荣挨，熙熙樓的掌柜王于貴愁眉苦臉地迎上來男韧，“玉大人，你說我怎么就攤上這事默垄〈寺牵” “怎么了？”我有些...
開封第一講書人閱讀 165,747評(píng)論 0贊 356
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵口锭，是天一觀的道長寡壮。經(jīng)常有香客問我，道長，這世上最難降的妖魔是什么况既？我笑而不...
開封第一講書人閱讀 58,939評(píng)論 1贊 295
?港島之戀（遺憾婚禮）
正文為了忘掉前任这溅，我火速辦了婚禮，結(jié)果婚禮上棒仍，老公的妹妹穿的比我還像新娘悲靴。我一直安慰自己，他們只是感情好莫其，可當(dāng)我...
茶點(diǎn)故事閱讀 67,955評(píng)論 6贊 392
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布癞尚。她就那樣靜靜地躺著，像睡著了一般乱陡。火紅的嫁衣襯著肌膚如雪浇揩。梳的紋絲不亂的頭發(fā)上，一...
開封第一講書人閱讀 51,737評(píng)論 1贊 305
城市分裂傳說
那天憨颠，我揣著相機(jī)與錄音胳徽，去河邊找鬼。笑死爽彤，一個(gè)胖子當(dāng)著我的面吹牛养盗，可吹牛的內(nèi)容都是我干的。我是一名探鬼主播适篙，決...
沈念sama閱讀 40,448評(píng)論 3贊 420
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼往核，長吁一口氣：“原來是場噩夢啊……” “哼！你這毒婦竟也來了嚷节？” 一聲冷哼從身側(cè)響起聂儒，我...
開封第一講書人閱讀 39,352評(píng)論 0贊 276
萬榮殺人案實(shí)錄
序言：老撾萬榮一對(duì)情侶失蹤，失蹤者是張志新（化名）和其女友劉穎硫痰，沒想到半個(gè)月后衩婚，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 45,834評(píng)論 1贊 317
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡碍论，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,992評(píng)論 3贊 338
?白月光啟示錄
正文我和宋清朗相戀三年谅猾，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片鳍悠。...
茶點(diǎn)故事閱讀 40,133評(píng)論 1贊 351
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡税娜，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出藏研，到底是詐尸還是另有隱情敬矩，我是刑警寧澤，帶...
沈念sama閱讀 35,815評(píng)論 5贊 346
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布蠢挡，位于F島的核電站弧岳，受9級(jí)特大地震影響凳忙，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜禽炬，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,477評(píng)論 3贊 331
男人毒藥：我在死后第九天來索命
文/蒙蒙一涧卵、第九天我趴在偏房一處隱蔽的房頂上張望。院中可真熱鬧腹尖，春花似錦柳恐、人聲如沸。這莊子的主人今日做“春日...
開封第一講書人閱讀 32,022評(píng)論 0贊 22
一樁弒父案乐设，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽。三九已至绎巨，卻和暖如春近尚，著一層夾襖步出監(jiān)牢的瞬間，已是汗流浹背场勤。一陣腳步聲響...
開封第一講書人閱讀 33,147評(píng)論 1贊 272
情欲美人皮
我被黑心中介騙來泰國打工戈锻，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留，地道東北人却嗡。一個(gè)月前我還...
沈念sama閱讀 48,398評(píng)論 3贊 373
代替公主和親
正文我出身青樓舶沛，卻偏偏與公主長得像嘹承，于是被迫代替她去往敵國和親窗价。傳聞我的和親對(duì)象是個(gè)殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 45,077評(píng)論 2贊 355

methylkit差異甲基化分析

文件格式

可以是最基本的甲基化調(diào)用文件也可以是bismark下游的BAM/SAM文件。

在這里使用的是經(jīng)過處理得到的文本文件铡原。

導(dǎo)入文件

1.樣本描述性統(tǒng)計(jì)

1.1甲基化百分比的分布直方圖

1.2甲基化覆蓋率的直方圖

1.3根據(jù)覆蓋率對(duì)位點(diǎn)進(jìn)行篩選婿着，去除過高或者過低的位點(diǎn)

2比較分析

2.1對(duì)樣本進(jìn)行merge

2.2樣本相關(guān)性分析

2.3樣本聚類信息

2.4 PCA分析（是什么醋界？）

2.5找到差異甲基化區(qū)域

3.注釋差異甲基化位點(diǎn)或者區(qū)域

得到差異甲基化的基因組分布

得到差異甲基化位點(diǎn)的CpG島分布：

得到距離TSS最近的基因名稱

與內(nèi)含子/外顯子/啟動(dòng)子/CpG島等重疊的差異甲基化區(qū)域的百分比/數(shù)目

推薦閱讀更多精彩內(nèi)容