Roadmap paper: an interpretation

Integrative analysis of 111 reference human epigenomes

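Because my descriptions are based on my own understanding and may not be correct, my habit when interpreting a paper is to copy the original text as well. This makes the post longer, but it helps understanding. This is my first article on Jianshu, so please forgive any mistakes.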

Key paper of the Roadmap project: https://www.nature.com/articles/nature14248#Sec41

Computational tools and methods: https://www.nature.com/articles/nature14316

Intro

We integrate information about histone marks, DNA methylation, DNA accessibility and RNA expression to infer high-resolution maps of regulatory elements annotated jointly across a total of 127 reference epigenomes spanning diverse cell and tissue types.

In addition, we study the role of regulatory regions in human disease by relating our epigenomic annotations to genetic variants associated with common traits and disorders.

Specific highlights of our findings are given below.

Histone mark combinations show distinct levels of DNA methylation and accessibility, and predict differences in RNA expression levels that are not reflected in either accessibility or methylation.
Megabase-scale regions with distinct epigenomic signatures show strong differences in activity, gene density and nuclear lamina associations, suggesting distinct chromosomal domains.
Approximately 5% of each reference epigenome shows enhancer and promoter signatures, which are twofold enriched for evolutionarily conserved non-exonic elements on average.
Epigenomic data sets can be imputed at high resolution from existing data, completing missing marks in additional cell types, and providing a more robust signal even for observed data sets.
Dynamics of epigenomic marks in their relevant chromatin states allow a data-driven approach to learn biologically meaningful relationships between cell types, tissues and lineages.
Enhancers with coordinated activity patterns across tissues are enriched for common gene functions and human phenotypes, suggesting that they represent coordinately regulated modules.
Regulatory motifs are enriched in tissue-specific enhancers, enhancer modules and DNA accessibility footprints, providing an important resource for gene-regulatory studies.
Genetic variants associated with diverse traits show epigenomic enrichments in trait-relevant tissues, providing an important resource for understanding the molecular basis of human disease.

Reference epigenome mapping across tissues and cell types

The REMCs generated a total of 2,805 genome-wide data sets, including 1,821 histone modification data sets, 360 DNA accessibility data sets, 277 DNA methylation data sets, and 166 RNA-seq data sets, encompassing a total of 150.21 billion mapped sequencing reads corresponding to 3,174-fold coverage of the human genome.

Here, we focus on a subset of 1,936 data sets comprising 111 reference epigenomes, which we define as having a core set of five histone modification marks.

The five marks consist of: histone H3 lysine 4 trimethylation (H3K4me3), associated with promoter regions; H3K4me1, associated with enhancer regions; H3 lysine 36 trimethylation (H3K36me3), associated with transcribed regions; H3 lysine 27 trimethylation (H3K27me3), associated with Polycomb repression; and H3 lysine 9 trimethylation (H3K9me3), associated with heterochromatin regions.

Selected epigenomes also contain a subset of additional epigenomic marks, including the acetylation marks H3K27ac and H3K9ac, associated with increased activation of enhancer and promoter regions (ref. 2).

We computed several quality control measures for the data sets:

the number of distinct uniquely mapped reads;
the fraction of mapped reads overlapping areas of enrichment, i.e. the classic FRiP (fraction of reads in peaks);
genome-wide strand cross-correlation, a quality metric based on cross-correlation (related concepts: https://www.baidu.com/link?url=xS6iz8-hJ0J6sNoxpxVgRsPeU52ZMXCGanmaWmyjM04Kw1xxiwqiAhpIwvNJS910&wd=&eqid=c0e6730200024bea000000065e7dd6f5). The larger the NSC (normalized strand cross-correlation coefficient), the stronger the enrichment: an NSC below 1.1 indicates weak enrichment, and an NSC of 1 means no enrichment (NSC cannot be lower than 1, because it is the cross-correlation at the fragment length divided by the minimum cross-correlation). An NSC somewhat below 1.05 indicates a low signal-to-noise ratio or few peaks. This may be a genuine biological phenomenon (for example, some factors have only a few binding sites in a particular tissue type), or it may simply reflect poor data quality;
inter-replicate correlation;
multidimensional scaling of data sets from different production centres (comparing data produced by different institutions);
correlation across pairs of data sets;
consistency between assays carried out in multiple mapping centres;
read mapping quality for bisulfite-treated reads;
agreement with imputed data.
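To make the NSC thresholds concrete, here is a minimal pure-Python sketch. It assumes you already have the 5' end positions of reads on each strand of a toy chromosome. It uses the ENCODE-style definitions NSC = CC(fragment length) / min(CC) and RSC = (CC(fragment length) - min(CC)) / (CC(read length) - min(CC)). The function names and the toy profile are hypothetical. Production tools such as phantompeakqualtools compute the profile genome-wide and choose the fragment-length peak more carefully.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def strand_cross_correlation(plus_ends, minus_ends, length, max_shift):
    """CC(s) = corr(plus coverage at i, minus coverage at i + s), s = 0..max_shift.

    plus_ends / minus_ends: 0-based 5' end positions of reads on each strand.
    """
    plus = [0] * length
    minus = [0] * length
    for p in plus_ends:
        plus[p] += 1
    for p in minus_ends:
        minus[p] += 1
    return [pearson(plus[: length - s], minus[s:]) for s in range(max_shift + 1)]

def nsc_rsc(cc, read_len):
    """NSC and RSC from a cross-correlation profile indexed by shift.

    The fragment-length peak is searched only at shifts beyond read_len,
    so the 'phantom' peak at the read length is not mistaken for it.
    """
    cc_min = min(cc)
    frag_shift = max(range(read_len + 1, len(cc)), key=lambda s: cc[s])
    nsc = cc[frag_shift] / cc_min
    rsc = (cc[frag_shift] - cc_min) / (cc[read_len] - cc_min)
    return frag_shift, nsc, rsc

# Hypothetical profile (shift in arbitrary units): phantom peak at 3, fragment peak at 7.
toy_cc = [0.10, 0.11, 0.12, 0.20, 0.12, 0.13, 0.16, 0.25, 0.16, 0.11, 0.10]
print(nsc_rsc(toy_cc, read_len=3))
```

Because the numerator is never smaller than the minimum, NSC is at least 1 whenever the cross-correlations are positive. This is why 1 marks "no enrichment".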

Outlier data sets were flagged, removed or replaced, and lower-coverage data sets were combined where possible (see Methods).

Roadmap_ref_ChromStates.jpg

Chromatin states, DNA methylation and DNA accessibility

15-state model

As a foundation for integrative analysis, we used a common set of combinatorial chromatin states across all 111 epigenomes, plus 16 additional epigenomes generated by the ENCODE project (127 epigenomes in total), using the core set of five histone modification marks that were common to all.

We trained a 15-state model consisting of 8 active states and 7 repressed states that were recurrently recovered and showed distinct levels of DNA methylation, DNA accessibility, regulator binding and evolutionary conservation.

The authors built chromatin-state models for the 127 epigenomes and tuned parameters such as the shift size (see Methods). They used 60 high-quality epigenomes as the training set to build the 15-state model, then applied it to the remaining data sets. There is also an expanded 18-state model.
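ChromHMM works on binarized input: each mark in each 200-bp bin is called either present or absent before the model is trained. The sketch below follows the idea behind ChromHMM's Poisson-based binarization. It assumes a single genome-wide background rate λ (reads per bin) and a one-sided threshold of 1e-4, which I believe matches ChromHMM's default. The function names are hypothetical, and ChromHMM's own BinarizeBam command has more options, such as control-based backgrounds.

```python
import math

def poisson_tail(c, lam):
    """P(X >= c) for X ~ Poisson(lam), computed as 1 - P(X <= c - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for k in range(c):
        cdf += term
        term *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - cdf

def poisson_threshold(lam, p=1e-4):
    """Smallest read count c whose upper-tail probability is below p."""
    c = 0
    while poisson_tail(c, lam) >= p:
        c += 1
    return c

def binarize(counts, lam, p=1e-4):
    """1 if a bin's read count reaches the Poisson threshold, else 0."""
    t = poisson_threshold(lam, p)
    return [1 if x >= t else 0 for x in counts]

# With a background of 1 read per bin, a bin needs at least 7 reads.
print(poisson_threshold(1.0))
print(binarize([0, 3, 6, 7, 10], 1.0))
```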

multiple-statesDynamic.jpg

Detailed description of the states:

15-states_model.jpg

Enhancer and promoter states are enriched in evolutionarily conserved non-exonic regions (panel f of the figure above).

Enhancer and promoter states covered approximately 5% of each reference epigenome on average, and showed enrichment for evolutionarily conserved non-exonic regions.

Evolutionary conservation analysis of chromatin states in each cell type for conserved elements (GERP), using all conserved elements (a, b), or only non-exonic conserved elements (c, d), for both the 15-state model (a, c) and the 18-state model (b, d).

GERP_overlap.jpg

Robustness of the 15-state model:

model_robust.jpg

The 15-state model above was learned jointly across the 111 reference epigenomes. To assess its robustness, the authors also applied ChromHMM to each of the 111 reference epigenomes separately, learning a 15-state model for each. They then clustered the resulting 1,680 state emission probability vectors (presumably 111 × 15 + 15 = 1,680). The states learned on each data set separately clustered very well, still falling into the main 15 states, with some variation among data sets within the same state. For details, see the Methods:

The trained model was then used to compute the posterior probability of each state for each genomic bin in each reference epigenome. The regions were labelled using the state with the maximum posterior probability.
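(Maximum posterior: for each genomic bin, the model computes the posterior probability of each of the 15 states given the observed marks. In an HMM these posteriors come from the forward-backward algorithm. The state with the highest posterior probability is taken as the predicted state of that bin.)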
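Posterior decoding can be illustrated with a tiny HMM. This is a minimal pure-Python sketch that assumes ChromHMM-style emissions, where each state has an independent Bernoulli probability for each binary mark. It uses made-up parameters for two hypothetical states and two marks. The sketch runs forward-backward, renormalizing at every bin so that long sequences do not underflow; these per-bin constants cancel when the posteriors are normalized. It then labels each bin with its maximum-posterior state.

```python
def emission(state_probs, obs):
    """P(binary mark vector | state) with independent Bernoulli marks."""
    p = 1.0
    for q, x in zip(state_probs, obs):
        p *= q if x else (1.0 - q)
    return p

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def posterior_decode(obs_seq, init, trans, emit):
    """Forward-backward posteriors and max-posterior state labels per bin."""
    n = len(init)
    T = len(obs_seq)
    # Forward pass (scaled).
    alpha = [normalize([init[i] * emission(emit[i], obs_seq[0]) for i in range(n)])]
    for t in range(1, T):
        e = [emission(emit[j], obs_seq[t]) for j in range(n)]
        alpha.append(normalize([
            e[j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]))
    # Backward pass (scaled; per-bin scaling cancels in the posterior).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        e = [emission(emit[j], obs_seq[t + 1]) for j in range(n)]
        beta[t] = normalize([
            sum(trans[i][j] * e[j] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ])
    posteriors = [normalize([alpha[t][i] * beta[t][i] for i in range(n)]) for t in range(T)]
    labels = [max(range(n), key=lambda i: post[i]) for post in posteriors]
    return posteriors, labels

# Hypothetical states over two marks (H3K4me3, H3K27me3).
emit = [[0.9, 0.05],   # state 0: promoter-like
        [0.05, 0.9]]   # state 1: Polycomb-like
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
bins = [(1, 0), (1, 0), (0, 1), (0, 1)]
post, labels = posterior_decode(bins, init, trans, emit)
print(labels)  # [0, 0, 1, 1]
```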

In addition, two new clusters emerged:

This analysis revealed two new clusters (red crosses) which are not represented in the 15 states of the jointly learned model: ‘HetWk’, a cluster showing weak enrichment for H3K9me3; and ‘Rpts’, a cluster showing H3K9me3 along with a diversity of other marks, and enriched in specific types of repetitive elements.

Relationship between different modalities

We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation and DNA accessibility.

We found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states.

relationship.jpg

For highly expressed genes, differences in DNA methylation are more pronounced (panel c). Highly expressed genes are also more often located near strong enhancers (H3K27ac + H3K4me1).

Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes. In other words, chromatin states can capture finer differences in RNA expression that DNA methylation or accessibility alone would miss. Two states can have similar methylation levels yet differ greatly in accessibility and expression.

enrichment.jpg

In addition, the authors found that intermediate DNA methylation may represent a distinct chromatin state:

Intermediate methylation signatures were equally strong within tissue samples, peripheral blood and purified cell types, suggesting that intermediate methylation is not simply reflecting differential methylation between cell types, but probably reflects a stable state of cell-to-cell variability within a population of cells of the same type.

Epigenomic differences during lineage specification

Next, the authors examine how DNA methylation changes across cell lineages.

We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages.

distribution_me.jpg

We also studied DNA methylation changes in three different systems.

First, we studied DNA methylation changes during embryonic stem (ES) cell differentiation. We identified regions that lost methylation (differentially methylated regions (DMRs)) upon differentiation of ES cells (E003) to mesodermal (E013), endodermal (E011) and ectodermal (E012) lineages (Fig. 4h). Each lineage showed a largely distinct set of 2,200–4,400 DMRs that are enriched for distinct transcription factor binding events (Fig. 4h, right column), consistent with their distinct developmental regulation. Upon further differentiation, ectodermal DMRs remained hypomethylated in three neural progenitor populations, despite the usage of distinct human ES cell (hESC) lines, and mesodermal and endodermal DMRs remained highly methylated (Fig. 4h), highlighting the lineage-specific nature of changes in DNA methylation during early differentiation.
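As a rough illustration of what "regions that lost methylation" means computationally, the following sketch scans CpG methylation levels in a parent cell type (e.g. ES cells) and a derived cell type. It reports runs of at least `min_cpgs` consecutive CpGs whose methylation drops by at least `min_drop`. This is a simple heuristic, not the DMR-calling method used in the paper. The thresholds, the `max_gap` rule and the toy values are all hypothetical.

```python
def call_hypo_dmrs(positions, parent, child, min_drop=0.3, min_cpgs=3, max_gap=500):
    """Return (start, end, n_cpgs) for runs of CpGs that lose methylation.

    positions: sorted CpG coordinates; parent/child: methylation fractions (0-1).
    A run breaks when a CpG misses the drop threshold or lies more than
    max_gap bp from the previous CpG in the run.
    """
    dmrs = []
    run = []  # positions in the current run of hypomethylated CpGs
    for pos, m_parent, m_child in zip(positions, parent, child):
        hypo = (m_parent - m_child) >= min_drop
        if hypo and run and pos - run[-1] <= max_gap:
            run.append(pos)  # extend the current run
            continue
        # The current run ends here; keep it if it is long enough.
        if len(run) >= min_cpgs:
            dmrs.append((run[0], run[-1], len(run)))
        run = [pos] if hypo else []
    if len(run) >= min_cpgs:
        dmrs.append((run[0], run[-1], len(run)))
    return dmrs

positions = [100, 150, 200, 260, 900, 950, 1000, 1050]
es_cells = [0.9, 0.9, 0.9, 0.9, 0.8, 0.9, 0.9, 0.9]
derived = [0.1, 0.2, 0.1, 0.8, 0.1, 0.1, 0.2, 0.1]
print(call_hypo_dmrs(positions, es_cells, derived))
```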

DMRs.jpg

Panel h shows the enrichment of specific transcription factors in specific DMRs at specific developmental stages.

Second, we studied DNA methylation changes associated with breast epithelial differentiation.

We found differences in nearest-gene enrichments, and differences in motif density (luminal DMRs show greater motif density for 51 transcription factors and lower density for 0 transcription factors).

Having examined the dynamics of DMRs, the authors then ask what drives these dynamics and the differential methylation: the tissue environment or developmental origin?

Third, we asked whether tissue environment or developmental origin is the primary driving factor in DNA methylation differences observed in more differentiated cell types using epigenomes from skin cell types (keratinocytes E057/E058, melanocytes E059/E061 and fibroblasts E055/E056) that share a common tissue environment but possess distinct embryonic origins (surface ectoderm, neural crest and mesoderm, respectively). In other words, the chosen skin cell types share the same tissue environment but differ in developmental origin.

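The authors found that these cells, despite sharing a tissue environment, overlap little in their methylation and histone-modification profiles. Instead, each is more similar to cells of the same developmental origin. For example, keratinocytes and breast cells both derive from surface ectoderm, and the DMRs they share suggest a common regulatory network, as well as shared signalling pathways and structural components.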

Keratinocytes shared 1,392 (18%) of DMRs with surface ectoderm derived breast cell types (hypergeometric P value < 10^-26), and 97% of these were hypomethylated. These shared DMRs were enriched for regulatory elements and cell-type-relevant genes, suggesting a common gene-regulatory network and shared signalling pathways and structural components. These results suggest that common developmental origin can be a primary determinant of global DNA methylation patterns, and sometimes supersedes the immediate tissue environment in which they are found.
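The overlap P value above comes from a hypergeometric test. Here is a minimal sketch of the upper-tail probability using Python's exact `math.comb`. The population sizes in the usage line are made up, because the background set used in the paper is not given here.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N, K of which are 'successes'.

    One possible reading: N candidate regions, K are breast-cell DMRs,
    n keratinocyte DMRs are drawn, and k of them are shared.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division

# Hypothetical sizes: 30 shared out of 100 drawn, 100 "successes" among 1,000 regions.
print(hypergeom_sf(30, 1000, 100, 100))
```

Exact integer sums avoid the rounding problems that can appear with very small tail probabilities such as 10^-26.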

Most variable states and distinct chromosomal domains

Next, the authors examine how variable each chromatin state is across cells and tissues.

We next sought to characterize the overall variability of each chromatin state across the full range of cell and tissue types.

coverage.jpg

Quies is the most constitutive state, whereas states such as EnhG and TxFlnk are relatively tissue-specific.

Matrix of switching frequencies between states:

We next studied the relative frequency with which different chromatin states switch to other states across different tissues and cell types.

relative_frequency.jpg

This revealed a relative switching enrichment between active states and repressed states, consistent with activation and repression of regulatory regions. The only exception was significant switching between transcribed states and active promoter and enhancer states, possibly due to alternative usage of promoters and enhancers embedded within transcribed elements.

We found that enhancers and promoters maintained their identity, except for a small subset of regions switching between enhancer signatures and promoter signatures. Luciferase assays showed that these regions indeed possess both enhancer and promoter activity, consistent with their epigenomic marks.
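The switching matrix can be approximated by counting, for every genomic bin and every pair of epigenomes, which state the bin is in for each one. The sketch below is a simplified version. It assumes each epigenome is a list of state labels over the same bins, counts each switch in both directions, and reports each pair's share of all switches. The relative enrichment shown in the paper is normalized further for state frequencies, and that step is omitted here.

```python
from collections import Counter
from itertools import combinations

def switch_frequencies(epigenomes):
    """Fraction of all switches made up by each ordered pair of distinct states.

    epigenomes: dict name -> list of state labels (same bins, same order).
    A bin that differs between two epigenomes adds one count to (a, b) and
    one to (b, a), so the result is symmetric.
    """
    counts = Counter()
    for e1, e2 in combinations(epigenomes.values(), 2):
        for s1, s2 in zip(e1, e2):
            if s1 != s2:
                counts[(s1, s2)] += 1
                counts[(s2, s1)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

toy = {
    "A": ["TssA", "Enh", "Quies"],
    "B": ["TssA", "ReprPC", "Quies"],
    "C": ["Enh", "ReprPC", "Quies"],
}
print(switch_frequencies(toy))
```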

The authors found that switches between active and repressed states are clearly enriched. There are also switches between transcribed states and active promoter and enhancer states, possibly because some promoters and enhancers are embedded within transcribed regions.

For details, see: Conserved role of intragenic DNA methylation in regulating alternative promoters: https://doi.org/10.1038/nature09165 (a Nature paper, well worth reading).

Moreover, some regions switch between promoter activity and enhancer activity.

For details, see: Integrative analysis of haplotype-resolved epigenomes across human tissues. (I have read it; notes will follow later.)

https://www.nature.com/articles/nature14217#article-info

The highlight of this paper is allele-biased enhancer-gene pairs.

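By integrating enhancer and mRNA expression data, co-expression analysis can yield candidate target genes for enhancers. For a given co-expressed enhancer-gene pair, there are at least three possible relationship models:
(1) causal: changes in enhancer activity cause differential expression of the gene;
(2) reactive: the gene lies upstream of the enhancer;
(3) co-responsive: both the enhancer and the gene respond to some other molecular change.

The study focuses on the first model and uses eQTL analysis. The basic principle is that a single-nucleotide polymorphism (SNP) affecting enhancer activity will also affect expression of the enhancer's downstream target gene. That SNP, or a nearby SNP in linkage with it, therefore becomes an eQTL for the target gene. For such co-expressed enhancer-gene pairs, Hi-C data are then used to assess whether the causal relationship is direct regulation.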

Source: http://www.360doc.com/content/19/0821/17/65172408_856276298.shtml
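The eQTL idea in the excerpt comes down to regressing a gene's expression on the genotype at a candidate SNP across individuals. Genotype is coded as 0, 1 or 2 copies of the alternative allele. A minimal sketch of that single test follows, with invented data. Tools such as Matrix eQTL, mentioned in the appendix, fit the same kind of linear model over very large numbers of SNP-gene pairs.

```python
import math

def eqtl_test(genotypes, expression):
    """Simple linear regression expression ~ genotype dosage (0/1/2).

    Returns (slope, t statistic) with n - 2 degrees of freedom.
    Raises ZeroDivisionError if the fit is perfect (zero residual variance).
    """
    n = len(genotypes)
    mg = sum(genotypes) / n
    me = sum(expression) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    sge = sum((g - mg) * (e - me) for g, e in zip(genotypes, expression))
    slope = sge / sgg
    intercept = me - slope * mg
    residuals = [e - (intercept + slope * g) for g, e in zip(genotypes, expression)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    se = math.sqrt(sigma2 / sgg)
    return slope, slope / se

# Hypothetical individuals: genotype dosage and expression of one target gene.
genotypes = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.0, 3.0, 1.5, 2.0, 2.5]
print(eqtl_test(genotypes, expression))
```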

On enhancers and their clinical value, see: http://www.360doc.com/content/18/0413/16/45954995_745357589.shtml

A related data-mining article: https://www.sohu.com/a/230491180_177233

Motif clustering: this is also mentioned in the Methods of this paper. It is similar to this question: https://www.biostars.org/p/140532/ , where the goal is to cluster sequences first and then find characteristic motifs.

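HaploSeq lets clinicians determine whether two mutations lie on the same chromosome copy or on different copies, which helps with risk assessment. It quickly shows which genetic variants co-occur on the same chromosomal segment and therefore come from the same parent.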

Reference: "A Hi-C phasing trick: HapCUT": http://wap.sciencenet.cn/blog-2970729-1175790.html?mobile=1 . Traditional phasing methods can phase only a subset of heterozygous variants and cannot build genome-scale haplotype blocks: http://www.360doc.com/content/19/0423/15/52645714_830825887.shtml

Reference: Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing: https://www.nature.com/articles/nbt.2728#article-info

Compartment clusters

The authors used 2-Mb bins to examine the distribution and co-occurrence of states at this resolution. The resulting clusters are shown in panel d of the figure above.

While chromatin states were defined at nucleosome resolution (200 bp), we also studied the overall co-occurrence of chromatin states across tissues at a larger resolution (2 Mb) to recognize higher-order properties.

The active enhancer region clusters (c1-c6) separate clearly from the remaining clusters. This is consistent with the identification of two large chromatin conformation compartments in earlier work. Each compartment can be further split into several subdivisions according to its states.

These subdivisions were based on average state density across a large diversity of cell types and showed strong differences in gene density, CpG island occupancy, lamina association and cytogenetic bands (Fig. 5d), suggesting that they represent stable chromosomal features.

How the heatmap is computed: it shows state scores averaged over all samples, with each column a bin and each row a state.
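The averaging can be sketched as follows. With 200-bp bins, each 2-Mb window holds 10,000 bins, so the density of a state in a window is the fraction of those bins carrying that label, averaged over epigenomes. This is a minimal sketch that assumes each epigenome is a list of 200-bp state labels for one chromosome. The window size is a parameter, so the toy example can use tiny windows.

```python
def state_density(labels_by_epigenome, states, window_bins=10_000):
    """Average fraction of bins in each state, per window.

    labels_by_epigenome: list of label lists (200-bp bins, same chromosome).
    Returns a matrix with rows = states and columns = windows, as in the heatmap.
    """
    n_bins = len(labels_by_epigenome[0])
    n_windows = (n_bins + window_bins - 1) // window_bins
    matrix = [[0.0] * n_windows for _ in states]
    for labels in labels_by_epigenome:
        for w in range(n_windows):
            window = labels[w * window_bins:(w + 1) * window_bins]
            for r, state in enumerate(states):
                matrix[r][w] += window.count(state) / len(window)
    k = len(labels_by_epigenome)
    return [[v / k for v in row] for row in matrix]

states = ["Quies", "TssA", "Enh", "ReprPC"]
epis = [["Quies", "Quies", "TssA", "Enh"],
        ["Quies", "ReprPC", "TssA", "TssA"]]
for state, row in zip(states, state_density(epis, states, window_bins=2)):
    print(state, row)
```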

Relationships between marks and lineages

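Next, the authors hierarchically cluster tissues and cell types by their histone marks. One interesting observation is that ES-derived cells still cluster largely with ES and iPS cells, rather than with the tissues they will differentiate into. This suggests that, compared with somatic tissues, they remain closer to the pluripotent state.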

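Besides trees, the authors also measured similarity between cells and tissues in other ways, such as similarity matrices and MDS plots. MDS is a dimensionality-reduction method similar to PCA, except that MDS starts from distances while PCA starts from correlations/covariances; here Euclidean distance is a suitable measure of similarity. They also compared the results computed from the signals of different marks: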

similarity_matrix.jpg

MDS_similarity.jpg

To save space, only part of the figures is shown here.

With these methods, different marks differ in which similarities they capture. For example, similarities among immune cells and among pluripotent cells are each captured by different marks.
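For reference, classical (Torgerson) MDS takes a distance matrix, double-centres the squared distances, and uses the top eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. A small numpy sketch follows; the toy points are hypothetical. A side note on the comparison above: when the distances are Euclidean, classical MDS recovers the same configuration as PCA on the centred data, up to sign flips of the axes. The two methods therefore differ mainly in which distance or similarity you start from.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n items in k dimensions from an n x n distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    b = -0.5 * j @ d2 @ j                         # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]         # keep the k largest
    vals = np.clip(eigvals[order], 0, None)       # clip tiny negative noise
    return eigvecs[:, order] * np.sqrt(vals)

# Hypothetical 2-D points; MDS on their distances should reproduce those distances.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(dist, k=2)
print(np.round(coords, 3))
```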

Imputation and completion of epigenomic data sets

Imputation and completion: not every epigenome has data for every mark. The authors presumably predict the missing marks from two sources: the correlations between different marks within each cell type, and the distribution of the same mark across cell types. This completes the signal tracks.
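The intuition can be illustrated with a deliberately simplified stand-in. On bins where the target mark was observed, fit a linear regression from features that are available everywhere, such as other marks in the same epigenome or the target mark in other epigenomes. Then apply the fitted model where the mark is missing. The Roadmap imputation method (ChromImpute) uses regression trees over such features, so this least-squares sketch with synthetic data only shows the general shape of the idea. The feature choice is hypothetical.

```python
import numpy as np

def fit_imputer(features, target):
    """Least-squares weights (with intercept) mapping feature signals to a mark.

    features: bins x features array, e.g. other marks in the same epigenome
    plus the target mark in related epigenomes (hypothetical choice).
    """
    x = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(x, target, rcond=None)
    return weights

def impute(features, weights):
    """Predict the target mark's signal from the same kind of features."""
    x = np.column_stack([np.ones(len(features)), features])
    return x @ weights

rng = np.random.default_rng(0)
train_x = rng.random((50, 3))                  # bins where the mark was observed
true_w = np.array([1.0, -2.0, 0.5])
train_y = 0.5 + train_x @ true_w
w = fit_imputer(train_x, train_y)
new_x = rng.random((5, 3))                     # bins in an epigenome lacking the mark
print(impute(new_x, w))
```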

The imputed and observed data were also compared in terms of their annotations and the cell-type relationships they capture. The two agree well, indicating that imputation and completion are reliable.

Speaking of chromatin states: a 25-state model can further subdivide the enhancer states and thereby reveal more about gene regulation and human disease.

Enhancer modules and their putative regulators

We clustered enhancer-only elements (Enh, EnhBiv, EnhG) into 226 enhancer modules of coordinated activity, promoter-only elements into 82 promoter modules and promoter/enhancer ‘dyadic’ elements into 129 modules, enabling us to distinguish ubiquitously active, lineage-restricted and tissue-specific modules for each group.

Module analysis of regulatory elements is common in bioinformatics. In cancer studies, for example, one often works with signatures, which can be gene sets or mutation sets depending on the biological context. Members of the same module or signature usually share a biological function. Here the authors cluster enhancer and promoter elements into modules according to where they are present and active across cell types and tissues.
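To make "clustering elements into modules by their activity" concrete: represent each element by a 0/1 activity vector over epigenomes, then group similar vectors. The sketch below uses plain k-means with deterministic initialization on toy vectors. It is an illustrative stand-in, not necessarily the clustering procedure used in the paper, and the epigenome columns are hypothetical.

```python
def kmeans(vectors, k, n_iter=20):
    """Plain k-means with squared Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign each element to its nearest centroid.
        for idx, v in enumerate(vectors):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Activity (1 = active) in 4 hypothetical epigenomes: two immune, two brain.
activity = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
modules, centres = kmeans(activity, k=2)
print(modules)
```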

This step builds on the completed (imputed) data from the previous step. More complete data may give the GO-term statistical tests more power.

regulatory_modules.jpg

The figure above shows proximal gene enrichments for each module using gene ontology (GO) biological process (b) and human phenotypes (c); that is, a GO analysis of the genes near each module.

The genome sequence of enhancers in the same module showed substantial enrichment for sequence motifs associated with diverse transcription factors. Motif analysis of the enhancers in each module reveals enrichment for many TF motifs. This implies that each module is a co-regulated set, and it may also help identify upstream regulators.
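A crude version of the motif-enrichment idea is to count occurrences of a consensus motif on both strands, per kb, in a module's enhancer sequences, then compare against background sequences. Real analyses use position weight matrices and matched backgrounds. The consensus string and the sequences below are hypothetical toy inputs.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def count_sites(seq, motif):
    """Overlapping matches of a consensus motif on both strands."""
    n = len(re.findall("(?=" + motif + ")", seq))
    rc = revcomp(motif)
    if rc != motif:  # do not double-count palindromic motifs
        n += len(re.findall("(?=" + rc + ")", seq))
    return n

def motif_density(seqs, motif):
    """Motif sites per kb across a set of sequences."""
    sites = sum(count_sites(s.upper(), motif) for s in seqs)
    total_bp = sum(len(s) for s in seqs)
    return 1000.0 * sites / total_bp

def fold_enrichment(module_seqs, background_seqs, motif):
    return motif_density(module_seqs, motif) / motif_density(background_seqs, motif)

module = ["GATAGATA"]               # toy module enhancer sequence
background = ["ACGTACGTACGTGATA"]   # toy background sequence
print(fold_enrichment(module, background, "GATA"))
```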

The next step is to work out which of these motifs correspond to activating TFs and which to repressive TFs. Doing this well requires integrating gene expression data to find patterns in enhancer-gene pairs. If the regulators of a module turn out to be tissue-restricted, those regulators can be used to define the module.
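One simple way to put this into practice is to correlate a candidate TF's expression with a module's enhancer activity across tissues. A positive correlation is consistent with an activator, and a negative one with a repressor. The sketch below uses invented numbers and hypothetical TF names. It is only a heuristic of that kind and makes no claim about the exact statistic used in the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation (assumes neither vector is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def classify_regulators(tf_expression, module_activity, cutoff=0.5):
    """Label each TF as a putative activator or repressor of one module."""
    calls = {}
    for tf, expr in tf_expression.items():
        r = pearson(expr, module_activity)
        if r >= cutoff:
            calls[tf] = ("activator", r)
        elif r <= -cutoff:
            calls[tf] = ("repressor", r)
        else:
            calls[tf] = ("unclear", r)
    return calls

# Module activity and TF expression in 4 hypothetical tissues.
module_activity = [0.9, 0.8, 0.1, 0.0]
tfs = {"TF_A": [5, 4, 1, 0], "TF_B": [0, 1, 4, 5], "TF_C": [1, 2, 1, 2]}
for tf, (call, r) in classify_regulators(tfs, module_activity).items():
    print(tf, call, round(r, 3))
```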

Linking-regulators-tissue.jpg

Impact of DNA sequence and genetic variation

Nearing the end, the analysis moves down to the finer level of DNA sequence: which variants (SNPs) and alleles are associated with disease.

Sequence motifs can be used to predict the marks:

Using the area under the receiver operating curve (AUROC), we found between 71% predictive power for H3K4me1 peaks and 98% for H3K4me3 peaks (average of 85% across six marks and methylation-depleted regions). (Prediction performance is measured by the ROC curve and its AUC.)
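AUROC has a convenient rank interpretation. It is the probability that a randomly chosen positive (for example, a true H3K4me3 peak) receives a higher prediction score than a randomly chosen negative, with ties counted as one half. A small sketch that computes it this way follows; the scores are made up.

```python
def auroc(scores_pos, scores_neg):
    """AUROC via the Mann-Whitney pair count (ties count 0.5). O(n * m)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical motif-based scores for peak and non-peak regions.
print(auroc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

An AUROC of 0.5 is chance level, so the 71% and 98% figures above correspond to AUROC values of 0.71 and 0.98.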

As an example of a boundary enrichment, H3K4me3 peaks were flanked by motifs consisting of a continuous stretch of A and T followed by a G and C, which may have a role in nucleosome positioning or recruiting promoter-associated transcription factors, such as nuclear receptors. Enhancer and promoter predictive motifs were enriched in high-resolution DNase hypersensitive sites. (This illustrates the motif features found at the boundaries of H3K4me3 peaks.)

Second, we studied how sequence variants between the two alleles of the same individual can lead to allelic biases in histone modifications, DNA methylation and transcript levels. For allelic bias, see the companion paper https://www.nature.com/articles/nature14217#article-info . I took fairly detailed notes on its Methods, and a more detailed methods paper on haplotyping is also cited in this post.

Among allele-biased genes, allelic epigenomic modifications were found in promoters (71%) and in Hi-C-linked enhancers (69%).
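Allelic bias at a heterozygous site is commonly assessed by testing whether reads from the two alleles depart from a 50:50 split. The exact two-sided binomial test below is a minimal sketch of that idea, computed in log space so it stays stable at higher read depths. The significance threshold and the read counts are hypothetical. The companion paper's own procedure may differ in details such as multiple-testing correction.

```python
import math

def binom_pmf(i, n, p):
    """Binomial probability P(X = i), computed via log-gamma for numerical stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided P value: total probability of outcomes no likelier than k."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes with equal probability are included.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-7)))

def allelic_bias(ref_reads, alt_reads, alpha=0.01):
    """Flag a heterozygous site whose allele read counts depart from 50:50."""
    pval = binom_two_sided(ref_reads, ref_reads + alt_reads)
    return pval < alpha, pval

print(allelic_bias(10, 0))
print(allelic_bias(6, 4))
```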

Trait-associated variants enrich in tissue-specific marks

This section uses typical GWAS results. Previous studies have shown that many disease-associated SNPs fall within regulatory elements.

For example, variants associated with metabolic diseases are enriched in liver enhancer marks.

trait.jpg

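In the figure above, each row is a trait with its PubMed ID and each column is an epigenome (a cell or tissue type). The colour presumably encodes the enrichment of the associated variants.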
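One way to read such an enrichment score is as observed over expected. The observed value is the fraction of trait-associated SNPs that fall in a tissue's enhancer regions, and the expected value is the fraction of the genome those regions cover. The minimal sketch below assumes non-overlapping, half-open intervals and uses `bisect` for lookups. The coordinates are toy values, and the paper's statistical test is not reproduced here.

```python
from bisect import bisect_right

def in_intervals(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def snp_enrichment(snps, intervals, genome_size):
    """Observed fraction of SNPs inside the annotation / fraction of genome covered."""
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    covered = sum(e - s for s, e in intervals)
    hits = sum(in_intervals(p, starts, ends) for p in snps)
    observed = hits / len(snps)
    expected = covered / genome_size
    return observed / expected

# Toy "liver enhancers" covering 10% of a 2,000-bp genome; 3 of 5 SNPs fall inside.
enhancers = [(500, 600), (100, 200)]
gwas_snps = [150, 550, 599, 700, 50]
print(snp_enrichment(gwas_snps, enhancers, genome_size=2000))
```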

Appendix: enhancer and promoter databases

Many related databases came up while I was reading the reference articles, so I have collected them below for now.

FANTOM：https://fantom.gsc.riken.jp/

FANTOM stands for Functional Annotation of the Mammalian Genome. It is an international research project founded in 2000, originally to functionally annotate full-length mouse cDNA sequences. Over time its scope has expanded further into transcriptomics. The main technology used in the project is Cap Analysis of Gene Expression (CAGE), developed at RIKEN, whose advantage is higher sensitivity in measuring gene expression levels.

Question: FANTOM5 Promoter Atlas https://www.biostars.org/p/101956/

Locating enhancers with FANTOM5 technology: http://www.360doc.com/content/19/0821/17/65172408_856276298.shtml

CAGE-TSSchip: promoter-based expression profiling using the 5'-leading label of capped transcripts：https://genomebiology.biomedcentral.com/articles/10.1186/gb-2007-8-3-r42

Elements with clear enhancer marks sometimes also show enriched CAGE signal, indicating that they may also have promoter activity. An example is cis-regulatory elements with dynamic signatures (cREDS); for details, see the paper Integrative analysis of haplotype-resolved epigenomes across human tissues.

Most genes have two or more transcription start sites (TSSs). Different TSSs place the gene under the control of different upstream untranslated regions (5'UTRs).

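Different 5'UTR sequences may contain entirely different regulatory elements, so different start sites make the gene's expression respond to completely different signals. The same gene may be controlled by different promoters, producing differences in expression that may contribute to some diseases.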

CAGE-seq (Cap Analysis of Gene Expression and deep sequencing) can identify all TSSs in mRNAs. It does this by capturing the 5' cap, which pinpoints the start sites of capped transcripts.
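Because each CAGE read begins at the capped 5' end, its first base marks a TSS. A common first processing step is to merge nearby 5' ends on the same strand into tag clusters. The sketch below uses a simple distance-based merge, which is a generic approach rather than any specific FANTOM pipeline. The 20-bp gap and the toy positions are assumptions.

```python
from collections import Counter

def tag_clusters(tss_positions, max_gap=20):
    """Merge CAGE 5'-end positions (one strand) into clusters.

    Returns (start, end, total_tags, dominant_tss) for each cluster.
    """
    counts = Counter(tss_positions)
    clusters = []
    current = []
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    result = []
    for members in clusters:
        total = sum(counts[p] for p in members)
        dominant = max(members, key=lambda p: counts[p])  # most-used TSS in the cluster
        result.append((members[0], members[-1], total, dominant))
    return result

tags = [1000, 1000, 1003, 1005, 1005, 1005, 1100, 1102]
print(tag_clusters(tags))
```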

VISTA enhancer browser: https://enhancer.lbl.gov/ (a data set of enhancers whose activity has been validated in vivo)

Reference: https://www.cnblogs.com/yahengwang/p/11228108.html

Multi-omics integration with Matrix eQTL: http://www.reibang.com/p/6e6d54d7483e (an R package worth exploring)

RepeatMasker：http://www.reibang.com/p/50ce4bcd1972

A promoter-level mammalian expression atlas:https://www.nature.com/articles/nature13182#article-info

A map of the cis-regulatory sequences in the mouse genome: https://www.nature.com/articles/nature11243#additional-information . This paper uses a Shannon-entropy-based analysis, which I will look into later.
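For the Shannon-entropy analysis mentioned above, the usual idea is to treat a region's signal across N tissues as a probability distribution. Low entropy means the region is tissue-specific, and high entropy, up to log2 N, means it is ubiquitous. A per-tissue variant, the Q statistic H - log2(p_tissue), is lower when the region is specific to that tissue. This is a minimal sketch with invented signals; the exact formulation in that paper may differ.

```python
import math

def shannon_entropy(signal):
    """Entropy (bits) of a non-negative signal vector across tissues."""
    total = sum(signal)
    probs = [x / total for x in signal if x > 0]
    return -sum(p * math.log2(p) for p in probs)

def q_statistic(signal, tissue):
    """Tissue-specificity score H - log2(p_tissue); undefined if the tissue's signal is 0."""
    p = signal[tissue] / sum(signal)
    return shannon_entropy(signal) - math.log2(p)

print(shannon_entropy([1, 1, 1, 1]))  # 2.0 bits: uniform across 4 tissues
print(shannon_entropy([5, 0, 0, 0]))  # 0 bits: active in a single tissue
```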

Last edited: 2020.04.09 16:42:29

© Copyright belongs to the author. Please contact the author for reprints or content collaboration.
