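Chinese Spring Methylation Data
Methylation is an important area of research in epigenetics. The reference genome of the wheat cultivar Chinese Spring was recently published in Science, and the paper includes methylation data. To make these data easy to use in day-to-day research, we invited Guo Weilong's team at China Agricultural University to analyze and process them, and the results are now available on our wheat multi-omics website. The details are described below.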
1 Data source:
NCBI accession: SRP133674
Paper: Shifting the limits in wheat research and breeding using a fully annotated reference genome
Sampling stage
Cytosine methylation was profiled in DNA extracted from two-week old CS leaf tissue in three different contexts: CpG dinucleotides, CHG and CHH (where H corresponds to A, T or C). The frozen leaves from the five samples at 3-leaf stage (Zadok stage 13) were ground and divided as input for the preparation of both RNA-seq libraries (detailed in Chinese Spring tissues study) and whole genome bisulfite sequencing (WGBS) libraries.
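To make the three contexts concrete, here is a minimal Python sketch that classifies a forward-strand cytosine as CG, CHG or CHH from the two bases that follow it. The function name `cytosine_context` is made up for illustration and is not part of any tool mentioned in this post. Cytosines on the reverse strand (a G on the forward strand) would need the reverse complement and are not handled here.

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the forward-strand cytosine at 0-based index i.

    Returns "CG", "CHG" or "CHH" (H = A, C or T), or None when seq[i] is
    not a C, the context runs off the end, or it contains an ambiguous base.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1 : i + 3]  # up to two bases downstream of the C
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or any(base not in "ACGT" for base in nxt):
        return None
    # At this point nxt[0] is A, C or T, i.e. an H.
    return "CHG" if nxt[1] == "G" else "CHH"


# Print the context of every cytosine in a short made-up sequence.
example = "ACGTCAGCTA"
for idx, base in enumerate(example):
    if base == "C":
        print(idx, cytosine_context(example, idx))
```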
2 Description of results
As mentioned above, these data come from the Chinese Spring reference genome paper in Science. Below we summarize the methylation results from that paper.
Wheat DNA methylation frequency of cytosines was profiled in the sequence contexts of CpG (average 92.7%), CHG (average 51.3%) and CHH (average 2.7%). The observed levels of cytosine methylation are among the highest observed in angiosperms (161), likely reflecting the abundance of repetitive elements throughout the wheat genome. Methylation patterns in wheat largely follow those observed in other species, showing enrichment in CpG and CHG sequence contexts at pericentromeric regions (gene poor) and depletion toward the chromosome ends (gene rich).
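A per-context average like the ones above can be summarized in more than one way. The sketch below shows one common choice, the weighted level (methylated reads divided by total reads, pooled over all sites in a context). This is an assumption for illustration; the paper's exact calculation is not described here, and the input tuples are made-up values, not real data.

```python
from collections import defaultdict


def weighted_methylation_by_context(sites):
    """Pool read counts per context and return methylated / total.

    `sites` is an iterable of (context, methylated_reads, total_reads)
    tuples; sites without coverage are skipped.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for context, mc, coverage in sites:
        if coverage <= 0:
            continue
        methylated[context] += mc
        total[context] += coverage
    return {ctx: methylated[ctx] / total[ctx] for ctx in total}


# Hypothetical counts, only to show the call.
sites = [
    ("CG", 9, 10),
    ("CG", 18, 20),
    ("CHG", 5, 10),
    ("CHH", 0, 25),
    ("CHH", 1, 15),
]
levels = weighted_methylation_by_context(sites)
for ctx in ("CG", "CHG", "CHH"):
    print(ctx, round(levels[ctx], 3))
```

Pooling reads gives high-coverage sites more weight than a plain mean of per-site levels would, so the two summaries can differ noticeably when coverage is uneven.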
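First, let's look at the methylation pattern of high confidence genes. As the figure below shows, CpG and CHG methylation is relatively low in the coding region and relatively high in the upstream promoter and the downstream region, while CHH stays fairly flat. When you analyze your own gene, you can check whether it follows this pattern.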
high confidence genes
(TSS = transcription start site; TTS = transcription termination site)
High rates of DNA methylation likely serve to prevent transposition by restricting the expression of transposable elements. However, where repetitive elements are proximal to gene sequences, the enriched methylation can perform a regulatory function, predominantly silencing expression. The distinct and highly conserved methylation patterns observed in regions of HC genes and their regulatory regions showed higher levels of DNA methylation associated with the 5’ regulatory regions in all contexts that diminished rapidly at the transcriptional start site (TSS).
And what about the methylation pattern of low confidence (LC) genes? As the figure below shows, all three contexts are relatively flat.
low confidence (LC) genes
(TSS = transcription start site; TTS = transcription termination site)
DNA methylation increased in the gene body where the CpG methylation formed a peak, whereas gene body methylation levels remained at extremely low levels at CHG and CHH sites. In the 3’ regulatory region after the transcriptional termination site (TTS) methylation rapidly reverted to the levels in 5’ sequences. This contrasted with the pattern observed for LC genes, where a near uniform level of methylation was observed in all sequence contexts. As a conclusion, many of the features included in the LC annotation are either not genes, are truncated or have lost their function through mutation (i.e. pseudogenes).
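Profiles like the ones described above (upstream region, gene body, downstream region) are typically built by binning per-site methylation levels relative to each gene and averaging across genes. The following Python sketch shows the idea for a single chromosome and a single context. The function name, the 1-based inclusive coordinates, and the default flank and bin sizes are all assumptions for illustration, not the settings used in the paper. The nested loop is fine for a toy example but too slow for a whole genome.

```python
def metagene_profile(genes, sites, flank=1000, bins=4):
    """Average methylation in upstream, gene-body and downstream bins.

    genes: list of (start, end, strand), 1-based inclusive coordinates.
    sites: list of (position, methylation_level) on the same chromosome.
    Returns 3 * bins values ordered 5' flank, body, 3' flank relative to
    the gene's strand; bins that received no site are None.
    """
    sums = [0.0] * (3 * bins)
    counts = [0] * (3 * bins)
    for start, end, strand in genes:
        length = end - start + 1
        for pos, level in sites:
            if pos < start - flank or pos > end + flank:
                continue
            if pos < start:  # left flank
                region, frac = 0, (pos - (start - flank)) / flank
            elif pos <= end:  # gene body, scaled to the gene length
                region, frac = 1, (pos - start) / length
            else:  # right flank
                region, frac = 2, (pos - end - 1) / flank
            idx = region * bins + min(int(frac * bins), bins - 1)
            if strand == "-":
                # Mirror so that index 0 is always the far 5' end.
                idx = 3 * bins - 1 - idx
            sums[idx] += level
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up gene and sites, only to show the call.
genes = [(1001, 2000, "+")]
sites = [(500, 0.9), (1500, 0.1), (1501, 0.3), (2500, 0.8)]
print(metagene_profile(genes, sites, flank=1000, bins=2))
```

Scaling the gene body to a fixed number of bins is what lets genes of different lengths be averaged together; the flanks use fixed physical distances instead.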
One important point: methylation is a dynamic process that changes across developmental stages and environments, so some of these conclusions should be read critically.
![Copia repeat elements, and D) Gypsy (RLG) repeat elements.](https://wheat-1252088472.picsh.myqcloud.com/2018-10-12-080817.png)
TE sequences are, by comparison, much more heavily methylated.
3 Methylation analysis
Guo Weilong of China Agricultural University developed the methylation mapping software BS-Seeker2 (BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data) and the downstream methylation analysis software CGmapTools (CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data).
The detailed analysis workflow can be found here.
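Points to note:
1. Each chromosome needs to be split into two parts, i.e. build the genome index from the officially provided 161010_Chinese_Spring_v1.0_pseudomolecules_parts.fasta.
2. When using bs_seeker2-call_methylation.py, do not call methylation on the whole genome in one run. First, it is too slow; second, a whole-genome run hits a bug (I do not yet know whether others have seen it). Briefly, my test went as follows: I ran call methylation on the whole genome and stopped it as soon as the program reported that the 1A part had finished; I extracted the 1A reads into a separate BAM file and called methylation on 1A alone; and I merged 1A and 2A and called methylation on the two together. The whole-genome result differed from the other two, whereas calling 1A alone and calling 1A and 2A together gave identical results.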
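A comparison like the one in point 2 can be scripted. The Python sketch below assumes the call results are tab-separated CGmap-style lines (chromosome, base, position, context, dinucleotide, methylation level, methylated reads, total reads); that layout, the helper names `load_calls` and `diff_calls`, and sequence names such as `chr1A_part1` are assumptions for illustration, so check them against your own output files. The example lines are made up.

```python
def load_calls(lines, chrom_prefix):
    """Read CGmap-style lines and keep sites whose chromosome name starts
    with chrom_prefix. Returns {(chrom, pos): (methylated, total)}."""
    calls = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or not fields[0].startswith(chrom_prefix):
            continue
        calls[(fields[0], int(fields[2]))] = (int(fields[6]), int(fields[7]))
    return calls


def diff_calls(a, b):
    """Return sites only in a, sites only in b, and shared sites whose
    read counts differ between the two runs."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(key for key in set(a) & set(b) if a[key] != b[key])
    return only_a, only_b, changed


# Made-up lines standing in for a whole-genome run and a 1A-only run.
whole_genome = [
    "chr1A_part1\tC\t100\tCG\tCG\t0.80\t8\t10",
    "chr1A_part1\tG\t105\tCHG\tCA\t0.50\t5\t10",
    "chr2A_part1\tC\t7\tCHH\tCT\t0.00\t0\t4",
]
only_1a = [
    "chr1A_part1\tC\t100\tCG\tCG\t0.90\t9\t10",
    "chr1A_part1\tG\t105\tCHG\tCA\t0.50\t5\t10",
    "chr1A_part1\tC\t230\tCHH\tCC\t0.00\t0\t6",
]
run_a = load_calls(whole_genome, "chr1A")
run_b = load_calls(only_1a, "chr1A")
only_a, only_b, changed = diff_calls(run_a, run_b)
print(len(only_a), len(only_b), len(changed))
```

For real output, the same functions can be fed an open file handle (for example from `gzip.open(path, "rt")` if the file is compressed) instead of a list of strings.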
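4 JBrowse display
You can currently look up the methylation level of any gene of interest on our website (http://202.194.139.32).
Let's look at an example. In rice, the GS5 gene controls grain shape and grain weight. In wheat, GS5 (TraesCS3A02G212900LC, TraesCS3B02G277100LC and TraesCS3D02G172900) has also been cloned by homology by several research groups. The 3B copy carries two large insertions that disrupt the gene structure, and both inserted sequences show high methylation levels (see the figure below).
Finally, one more point to emphasize: the methylation shown here is from seedling-stage leaves and does not necessarily reflect methylation levels in other tissues.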