Integrative analysis of 111 reference human epigenomes
因?yàn)槲业拿枋龆际腔谖易约旱睦斫饷泄矗灰欢ㄕ_濒蒋,所以我的習(xí)慣是解析的時(shí)候?qū)⒃牡膬?nèi)容也拷下來(lái)蓝翰,雖然增加了篇幅,但有利于理解偶宫。頭一次在簡(jiǎn)書(shū)上寫(xiě)文章谐檀,若有錯(cuò)誤還請(qǐng)諒解。
Key paper of the Roadmap project: https://www.nature.com/articles/nature14248#Sec41
Computational tools and methods: https://www.nature.com/articles/nature14316
Intro
We integrate information about histone marks, DNA methylation, DNA accessibility and RNA expression to infer high-resolution maps of regulatory elements annotated jointly across a total of 127 reference epigenomes spanning diverse cell and tissue types.
In addition, we study the role of regulatory regions in human disease by relating our epigenomic annotations to genetic variants associated with common traits and disorders.
Specific highlights of our findings are given below.
- Histone mark combinations show distinct levels of DNA methylation and accessibility, and predict differences in RNA expression levels that are not reflected in either accessibility or methylation.
- Megabase-scale regions with distinct epigenomic signatures show strong differences in activity, gene density and nuclear lamina associations, suggesting distinct chromosomal domains.
- Approximately 5% of each reference epigenome shows enhancer and promoter signatures, which are twofold enriched for evolutionarily conserved non-exonic elements on average.
- Epigenomic data sets can be imputed at high resolution from existing data, completing missing marks in additional cell types, and providing a more robust signal even for observed data sets.
- Dynamics of epigenomic marks in their relevant chromatin states allow a data-driven approach to learn biologically meaningful relationships between cell types, tissues and lineages.
- Enhancers with coordinated activity patterns across tissues are enriched for common gene functions and human phenotypes, suggesting that they represent coordinately regulated modules.
- Regulatory motifs are enriched in tissue-specific enhancers, enhancer modules and DNA accessibility footprints, providing an important resource for gene-regulatory studies.
- Genetic variants associated with diverse traits show epigenomic enrichments in trait-relevant tissues, providing an important resource for understanding the molecular basis of human disease.
Reference epigenome mapping across tissues and cell types
The REMCs generated a total of 2,805 genome-wide data sets, including 1,821 histone modification data sets, 360 DNA accessibility data sets, 277 DNA methylation data sets and 166 RNA-seq data sets, encompassing a total of 150.21 billion mapped sequencing reads corresponding to 3,174-fold coverage of the human genome.
Here, we focus on a subset of 1,936 data sets comprising 111 reference epigenomes, which we define as having a core set of five histone modification marks.
The five marks consist of: histone H3 lysine 4 trimethylation (H3K4me3), associated with promoter regions; H3K4me1, associated with enhancer regions; H3 lysine 36 trimethylation (H3K36me3), associated with transcribed regions; H3 lysine 27 trimethylation (H3K27me3), associated with Polycomb repression; and H3 lysine 9 trimethylation (H3K9me3), associated with heterochromatin regions.
Selected epigenomes also contain a subset of additional epigenomic marks, including the acetylation marks H3K27ac and H3K9ac, associated with increased activation of enhancer and promoter regions (ref. 2).
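We computed several quality control measures (i.e. the QC metrics for the data sets):
- the number of distinct uniquely mapped reads (the total number of uniquely aligned reads);
- the fraction of mapped reads overlapping areas of enrichment (the classic FRiP, the fraction of reads falling in peaks);
- genome-wide strand cross-correlation (a cross-correlation quality metric; for the related concepts see: https://www.baidu.com/link?url=xS6iz8-hJ0J6sNoxpxVgRsPeU52ZMXCGanmaWmyjM04Kw1xxiwqiAhpIwvNJS910&wd=&eqid=c0e6730200024bea000000065e7dd6f5). A larger NSC (normalized strand cross-correlation coefficient) indicates stronger enrichment: NSC values below 1.1 indicate weak enrichment, and a value of 1 (the minimum, since NSC is a ratio to the background minimum of the cross-correlation profile) indicates no enrichment. NSC values below 1.05 suggest a low signal-to-noise ratio or few peaks; this may be a genuine biological effect (for example, a factor that truly has only a few binding sites in a particular tissue type), or it may reflect poor data quality;
- inter-replicate correlation (correlation between replicates);
- multidimensional scaling of data sets from different production centres (comparing data produced by different centres);
- correlation across pairs of data sets (correlation between different data sets);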
- consistency between assays carried out in multiple mapping centres
- read mapping quality for bisulfite-treated reads
- agreement with imputed data
Outlier data sets were flagged, removed or replaced, and lower-coverage data sets were combined where possible (see Methods).
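The FRiP and strand cross-correlation metrics in the QC list can be illustrated with a minimal Python sketch. It assumes reads have already been reduced to per-base 5'-end counts on one chromosome and that peaks are sorted, non-overlapping, half-open intervals; all function names are hypothetical. This is not the pipeline the Roadmap centres used (ENCODE-style pipelines compute NSC/RSC with tools such as phantompeakqualtools).

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of reads whose 5' position falls inside any peak.

    peaks: sorted, non-overlapping, half-open (start, end) intervals.
    """
    if not read_positions:
        return 0.0
    starts = [s for s, _ in peaks]
    in_peaks = 0
    for pos in read_positions:
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos < peaks[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def strand_cross_correlation(plus, minus, max_shift):
    """Correlation of + and - strand read-start profiles at shifts 0..max_shift.

    plus/minus: per-base read-start counts of equal length. At shift d the
    + strand profile is compared with the - strand profile d bases downstream.
    """
    n = len(plus)
    return [pearson(plus[: n - d], minus[d:]) for d in range(max_shift + 1)]


def nsc_rsc(cc, fragment_len, read_len):
    """NSC = CC(fragment) / min(CC); RSC = (CC(fragment) - min) / (CC(read length) - min)."""
    cc_min = min(cc)
    nsc = cc[fragment_len] / cc_min
    rsc = (cc[fragment_len] - cc_min) / (cc[read_len] - cc_min)
    return nsc, rsc
```

On real data the cross-correlation profile peaks at the fragment length and shows a smaller "phantom" peak at the read length; an NSC close to 1 means the fragment-length peak barely rises above the background minimum.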
Chromatin states, DNA methylation and DNA accessibility
15-state model
As a foundation for integrative analysis, we used a common set of combinatorial chromatin states across all 111 epigenomes, plus 16 additional epigenomes generated by the ENCODE project (127 epigenomes in total), using the core set of five histone modification marks that were common to all.
We trained a 15-state model consisting of 8 active states and 7 repressed states that were recurrently recovered and showed distinct levels of DNA methylation, DNA accessibility, regulator binding and evolutionary conservation.
The authors modelled chromatin states across the 127 epigenomes, tuning parameters such as the shift size (see the Methods for details). They used 60 high-quality epigenomes as the training set to build the 15-state model and then applied it to the remaining data sets (there is also an expanded 18-state model).
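Before learning states, ChromHMM binarizes each mark in 200-bp bins. The sketch below shows a minimal Poisson-based binarization. It assumes a uniform genome-wide background rate and a p-value threshold of 1e-4, which is taken here as ChromHMM's default. The real tool can also use input/control tracks, and this sketch ignores them. Function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(counts, p_threshold=1e-4):
    """Call a bin 1 if its read count is unlikely under a uniform Poisson background.

    counts: reads per 200-bp bin for one mark on one chromosome (hypothetical input).
    The background rate is the mean count per bin.
    """
    lam = sum(counts) / len(counts)
    cutoff = 0
    # smallest count whose upper-tail probability falls below the threshold
    while poisson_sf(cutoff, lam) >= p_threshold:
        cutoff += 1
    return [1 if c >= cutoff else 0 for c in counts]
```

Running this on each mark separately gives one binary vector per bin, which is the input the state model is learned from.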
Detailed description of the states:
Enhancer and promoter states are enriched in evolutionarily conserved non-exonic regions (panel f of the figure above).
Enhancer and promoter states covered approximately 5% of each reference epigenome on average, and showed enrichment for evolutionarily conserved non-exonic regions.
Evolutionary conservation analysis of chromatin states in each cell type for conserved elements (GERP), using all conserved elements (a,b), or only non-exonic conserved elements (c,d) for both the 15-state model (a,c) and the 18-state model (b,d).
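On the robustness of the 15-state model:
The 15-state model above was learned jointly across the 111 reference epigenomes. To assess its robustness, the authors also ran ChromHMM separately on each of the 111 reference epigenomes to learn a 15-state model for each one. They then clustered the resulting 1,680 state emission probability vectors. These are presumably the 111 × 15 per-epigenome states plus the 15 states of the joint model, since 111 × 15 + 15 = 1,680. The separately learned states clustered well and largely recovered the 15 main states, with some variation among epigenomes within the same state. For details, see the Methods: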
The trained model was then used to compute the posterior probability of each state for each genomic bin in each reference epigenome. The regions were labelled using the state with the maximum posterior probability.
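(Maximum posterior probability: for each genomic bin, the model computes the posterior probability of each of the 15 states given the observed marks along the whole sequence of bins, using the HMM forward-backward algorithm. The bin is then labelled with the state that has the highest posterior probability.)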
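Posterior decoding can be sketched with a toy two-state HMM whose emissions are independent Bernoulli variables, one per binarized mark. This is the emission model ChromHMM uses. The parameters and function names below are hypothetical. ChromHMM learns its parameters from the data, and the paper's model has 15 states.

```python
def bernoulli_emission(obs, probs):
    """P(binary mark vector | state), with marks independent given the state."""
    p = 1.0
    for present, q in zip(obs, probs):
        p *= q if present else 1.0 - q
    return p


def posterior_decode(observations, init, trans, emit):
    """Scaled forward-backward; returns per-bin state posteriors and argmax labels."""
    n_states = len(init)
    T = len(observations)
    e = [[bernoulli_emission(o, emit[s]) for s in range(n_states)] for o in observations]

    # forward pass, normalising each step to avoid underflow
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [init[s] * e[0][s] for s in range(n_states)]
        else:
            prev = alpha[-1]
            a = [sum(prev[r] * trans[r][s] for r in range(n_states)) * e[t][s]
                 for s in range(n_states)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)

    # backward pass, scaled with the same constants
    beta = [[1.0] * n_states for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(trans[s][r] * e[t + 1][r] * beta[t + 1][r] for r in range(n_states))
             for s in range(n_states)]
        beta[t] = [x / scale[t + 1] for x in b]

    posteriors = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        z = sum(g)
        posteriors.append([x / z for x in g])
    labels = [max(range(n_states), key=p.__getitem__) for p in posteriors]
    return posteriors, labels
```

Example: use state 0 as a hypothetical "active" state with emission probabilities [0.9, 0.8] and state 1 as "quiescent" with [0.05, 0.05]. Set self-transition probabilities to 0.9 and use the observations [(1,1), (1,1), (1,0), (0,0), (0,0), (0,0)]. The bins are then labelled [0, 0, 0, 1, 1, 1]. The ambiguous third bin is pulled into the active state by its neighbours.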
Two new clusters also emerged:
This analysis revealed two new clusters (red crosses) which are not represented in the 15 states of the jointly learned model: 'HetWk', a cluster showing weak enrichment for H3K9me3; and 'Rpts', a cluster showing H3K9me3 along with a diversity of other marks, and enriched in specific types of repetitive elements.
Relationship between different modalities
We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation and DNA accessibility.
We found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states.
The figure shows that differences in DNA methylation are more pronounced for highly expressed genes (panel c). Highly expressed genes are also more often located near strong enhancers (H3K27ac + H3K4me1).
Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes. In other words, chromatin states can capture finer expression differences that DNA methylation or accessibility alone would miss. Two states may have comparable methylation or accessibility levels yet differ widely in expression.
In addition, the authors found that intermediate methylation may itself reflect a distinct, stable state:
Intermediate methylation signatures were equally strong within tissue samples, peripheral blood and purified cell types, suggesting that intermediate methylation is not simply reflecting differential methylation between cell types, but probably reflects a stable state of cell-to-cell variability within a population of cells of the same type.
Epigenomic differences during lineage specification
Next, the authors explore how DNA methylation changes across cell lineages.
We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages.
We also studied DNA methylation changes in three different systems.
First, we studied DNA methylation changes during embryonic stem (ES) cell differentiation. We identified regions that lost methylation (differentially methylated regions (DMRs)) upon differentiation of ES cells (E003) to mesodermal (E013), endodermal (E011) and ectodermal (E012) lineages (Fig. 4h). Each lineage showed a largely distinct set of 2,200–4,400 DMRs that are enriched for distinct transcription factor binding events (Fig. 4h, right column), consistent with their distinct developmental regulation. Upon further differentiation, ectodermal DMRs remained hypomethylated in three neural progenitor populations, despite the usage of distinct human ES cell (hESC) lines, and mesodermal and endodermal DMRs remained highly methylated (Fig. 4h), highlighting the lineage-specific nature of changes in DNA methylation during early differentiation.
Panel h shows the enrichment of specific transcription factors in specific DMRs and at specific developmental stages.
Second, we studied DNA methylation changes associated with breast epithelia differentiation.
We found differences in nearest-gene enrichments and differences in motif density (luminal DMRs show greater motif density for 51 transcription factors and lower density for 0 transcription factors).
Having examined how DMRs change, the authors next ask what drives the differential methylation: the tissue environment or developmental origin.
Third, we asked whether tissue environment or developmental origin is the primary driving factor in DNA methylation differences observed in more differentiated cell types using epigenomes from skin cell types (keratinocytes E057/058, melanocytes E059/E061 and fibroblasts E055/056) that share a common tissue environment but possess distinct embryonic origins (surface ectoderm, neural crest and mesoderm, respectively). In other words, the authors chose skin cell types that share a tissue environment but differ in developmental origin.
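The authors found that these cell types from the same tissue environment overlap little in their methylation and histone modification profiles. Instead, each is more similar to cells of the same developmental origin. For example, keratinocytes and breast cells both derive from surface ectoderm. The DMRs they share point to a common regulatory network with shared signalling pathways and structural components.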
Keratinocytes shared 1,392 (18%) of DMRs with surface-ectoderm-derived breast cell types (hypergeometric P value < 10^-26), and 97% of these were hypomethylated. These shared DMRs were enriched for regulatory elements and cell-type-relevant genes, suggesting a common gene-regulatory network and shared signalling pathways and structural components. These results suggest that common developmental origin can be a primary determinant of global DNA methylation patterns, and sometimes supersedes the immediate tissue environment in which they are found.
Most variable states and distinct chromosomal domains
Next, the authors examine how variable each chromatin state is across cells and tissues.
We next sought to characterize the overall variability of each chromatin state across the full range of cell and tissue types.
Quies is the most constitutive state, whereas EnhG, TxFlnk and similar states are relatively tissue-specific.
Matrix of switching frequencies between states:
We next studied the relative frequency with which different chromatin states switch to other states across different tissues and cell types.
This revealed a relative switching enrichment between active states and repressed states, consistent with activation and repression of regulatory regions. The only exception was significant switching between transcribed states and active promoter and enhancer states, possibly due to alternative usage of promoters and enhancers embedded within transcribed elements.
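The switching analysis compares, bin by bin, the state assigned in one epigenome with the state assigned in another. Below is a minimal sketch of an observed/expected enrichment matrix for a single pair of epigenomes, using independence of the two labelings as the null. The analysis in the text spans many tissues and cell types, and the function name is hypothetical.

```python
from collections import Counter


def switch_enrichment(states_a, states_b, labels):
    """Observed/expected ratio for each (state in A, state in B) pair over the same bins.

    Expected count = (bins in state s in A) * (bins in state t in B) / total bins,
    i.e. what we would see if the two labelings were independent.
    """
    n = len(states_a)
    observed = Counter(zip(states_a, states_b))
    in_a = Counter(states_a)
    in_b = Counter(states_b)
    matrix = {}
    for s in labels:
        for t in labels:
            expected = in_a[s] * in_b[t] / n
            matrix[(s, t)] = observed[(s, t)] / expected if expected else float("nan")
    return matrix
```

Ratios above 1 mark state pairs that switch into each other more often than chance. Examples are active/repressed pairs, or transcribed states with promoter and enhancer states.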
We found that enhancers and promoters maintained their identity, except for a small subset of regions switching between enhancer signatures and promoter signatures. Luciferase assays showed that these regions indeed possess both enhancer and promoter activity, consistent with their epigenomic marks.
The authors found that switching between active and repressed regulatory regions is clearly enriched. There is also switching from transcribed regions to active promoter and enhancer states, which may be because some promoters and enhancers are embedded within transcribed regions.
For details, see this Nature paper, which is well worth reading: Conserved role of intragenic DNA methylation in regulating alternative promoters: https://doi.org/10.1038/nature09165
Moreover, some regions switch between promoter activity and enhancer activity.
For details, see: Integrative analysis of haplotype-resolved epigenomes across human tissues (already read; notes to follow): https://www.nature.com/articles/nature14217#article-info
The highlight of this paper is its allele-biased enhancer-gene pairs.
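Integrating enhancer and mRNA expression data in a co-expression analysis yields candidate target genes for enhancers. For a given co-expressed enhancer-gene pair there are at least three possible relationship models:
(1) causal, where changes in enhancer expression cause differential expression of the gene;
(2) reactive, where the gene acts upstream of the enhancer;
(3) co-responsive, where the enhancer and the gene both respond to some other molecular change.
The source article focuses on the first model and analyses it with eQTLs. The rationale is as follows. A single-nucleotide polymorphism (SNP) that affects enhancer activity will affect the expression of the enhancer's downstream target gene. That SNP, or a nearby SNP in linkage disequilibrium, therefore becomes an eQTL of the target gene. For such co-expressed enhancer-gene pairs, Hi-C data are then used to assess whether the causal relationship reflects direct regulation.
Source: http://www.360doc.com/content/19/0821/17/65172408_856276298.shtml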
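The eQTL logic above tests, for each SNP-gene pair, whether expression changes linearly with allele dosage (0, 1 or 2 copies of the alternative allele). Matrix eQTL, listed in the appendix, fits this kind of linear model at scale. Below is a minimal single-pair sketch with hypothetical data that computes only the slope and t-statistic. A P value would come from a t distribution with n - 2 degrees of freedom, which is omitted to keep the sketch dependency-free.

```python
import math


def eqtl_linear(genotypes, expression):
    """Additive eQTL test: regress expression on allele dosage (0/1/2).

    Returns (slope, t_statistic) from ordinary least squares.
    """
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((g - mx) ** 2 for g in genotypes)
    sxy = sum((g - mx) * (y - my) for g, y in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    # residual sum of squares -> standard error of the slope
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se
```

With dosages [0, 0, 1, 1, 2, 2] and expression [1.0, 1.2, 2.1, 1.9, 3.0, 3.2], the slope is 1.0: each extra copy of the allele adds about one expression unit.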
On enhancers and their clinical value, see: http://www.360doc.com/content/18/0413/16/45954995_745357589.shtml
A related data-mining article: https://www.sohu.com/a/230491180_177233
motif clustering: also mentioned in this paper's Methods. It resembles this question: https://www.biostars.org/p/140532/, where the poster wants to cluster sequences first and then find characteristic motifs.
HaploSeq lets clinicians determine whether two mutations lie on the same chromosome or on different chromosomes, which helps with risk assessment. It can also quickly determine which genetic variants co-occur on the same chromosome segment and therefore come from the same parent.
References: "HapCUT, a key technique for Hi-C phasing": http://wap.sciencenet.cn/blog-2970729-1175790.html?mobile=1. Traditional phasing methods can phase only part of the heterozygous variants and cannot build genome-scale haplotype blocks: http://www.360doc.com/content/19/0423/15/52645714_830825887.shtml
See also: Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing: https://www.nature.com/articles/nbt.2728#article-info
Compartment clusters
Using 2-Mb bins, the authors examine the distribution and co-occurrence of the states at this coarser resolution. The resulting clusters are shown in panel d of the figure above.
While chromatin states were defined at nucleosome resolution (200 bp), we also studied the overall co-occurrence of chromatin states across tissues at a larger resolution (2 Mb) to recognize higher-order properties.
The active enhancer regions (c1-c6) are clearly separated from the remaining clusters. This is consistent with the identification of two large chromatin conformation compartments. Each compartment can be further divided into several subdivisions according to its states.
These subdivisions were based on average state density across a large diversity of cell types and showed strong differences in gene density, CpG island occupancy, lamina association and cytogenetic bands (Fig. 5d), suggesting that they represent stable chromosomal features. The heatmap shows each state's average fraction across all samples (each column is a bin, each row a state).
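The heatmap quantity (the average state fraction per large window) can be sketched as follows. The sketch assumes each sample is given as a list of per-200-bp state labels covering the same chromosome. The input format and function name are hypothetical, and the subsequent clustering of windows into compartment clusters is omitted.

```python
from collections import Counter


def state_density(sample_labels, states, bin_bp=200, window_bp=2_000_000):
    """Rows = states, columns = large windows; each cell is the fraction of the
    window's bins in that state, averaged over samples.

    sample_labels: one list of per-bin state labels per sample, all covering the
    same chromosome with the same bin size.
    """
    per_window = window_bp // bin_bp
    n_bins = len(sample_labels[0])
    n_windows = (n_bins + per_window - 1) // per_window
    density = [[0.0] * n_windows for _ in states]
    for labels in sample_labels:
        for w in range(n_windows):
            chunk = labels[w * per_window:(w + 1) * per_window]
            counts = Counter(chunk)
            for i, state in enumerate(states):
                # running average over samples
                density[i][w] += counts[state] / len(chunk) / len(sample_labels)
    return density
```

Each column of the result is a feature vector for one window, and these vectors would then be clustered to obtain the compartment-level clusters.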
Relationships between marks and lineages
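Next, the authors hierarchically cluster the tissues and cell types based on histone marks. An interesting observation is that ES-derived cells still cluster mostly with ES and iPS cells rather than with the tissues they will differentiate into. This suggests that, relative to somatic tissues, they remain closer to a pluripotent state.
Besides trees, the authors measure the similarity of cells and tissues in other ways, such as similarity matrices and MDS plots. MDS is a dimensionality-reduction method similar to PCA. The difference is that MDS starts from a matrix of pairwise distances, whereas PCA starts from the covariance or correlation matrix of the features. With Euclidean distances, classical MDS gives the same configuration as PCA on the centred data, so Euclidean distance is a natural similarity measure here. The authors also compare the results obtained with the signals of different marks:
To save space, only some of the figures are reproduced here.
With these methods, different marks differ in which similarities they capture. For example, immune cell similarities and pluripotent cell similarities are captured by different marks.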
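Classical (Torgerson) MDS double-centres the squared distance matrix and takes its top eigenvectors as coordinates. The minimal pure-Python sketch below finds them by power iteration with deflation. In practice one would call `numpy.linalg.eigh`. The input could be distances between epigenomes' signal vectors for one mark, and the function name is hypothetical. The sketch assumes (near-)Euclidean distances, so that the centred matrix has no large negative eigenvalues.

```python
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Embed points in k dimensions from a symmetric distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand = sum(row_mean) / n
    # B = -1/2 * J D^2 J (double centring); its top eigenvectors give the coordinates
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for comp in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]  # random start avoids orthogonality
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no remaining positive eigenvalue; leave zeros
        for i in range(n):
            coords[i][comp] = v[i] * lam ** 0.5
        # deflate so the next power iteration finds the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

The embedding is unique only up to sign and rotation, so compare plots by relative positions rather than by axis direction.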
Imputation and completion of epigenomic data sets
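imputation and completion: not every epigenome has data for every mark. Here the authors presumably predict the missing marks from two sources: the correlations between different marks within each cell type, and the distribution patterns of the same mark across cell types. This completes the signal tracks.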
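The imputation idea can be illustrated with a deliberately simple stand-in. The paper's method (ChromImpute) trains regression trees on features drawn from other marks and other cell types. The sketch below instead takes a correlation-weighted average of the missing mark in reference cell types. Each reference is weighted by how similar it is to the target on the marks the target does have. All names and inputs are hypothetical.

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def impute_mark(target_shared, others_shared, others_missing_mark):
    """Hypothetical imputation by correlation-weighted averaging (not ChromImpute).

    target_shared: target cell type's signal for the marks it has (flattened bins).
    others_shared: dict cell_type -> the same marks/bins in a reference cell type.
    others_missing_mark: dict cell_type -> signal of the mark missing in the target.
    """
    weights = {}
    for cell_type, signal in others_shared.items():
        r = pearson(target_shared, signal)
        if r > 0:  # only positively correlated references contribute
            weights[cell_type] = r
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no positively correlated reference cell types")
    n_bins = len(next(iter(others_missing_mark.values())))
    return [sum(w * others_missing_mark[ct][i] for ct, w in weights.items()) / total
            for i in range(n_bins)]
```

A learned model such as ChromImpute can weigh features non-linearly and per region, which this averaging cannot. The sketch only shows why information from similar cell types helps.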
The authors also compared imputed and observed data in terms of annotations and captured cell-type relationships. The agreement is good, indicating that imputation and completion are reliable.
Speaking of chromatin states, a 25-state model can subdivide the enhancer states further and reveal more information about gene regulation and human disease.
Enhancer modules and their putative regulators
We clustered enhancer-only elements (Enh, EnhBiv, EnhG) into 226 enhancer modules of coordinated activity, promoter-only elements into 82 promoter modules and promoter/enhancer 'dyadic' elements into 129 modules, enabling us to distinguish ubiquitously active, lineage-restricted and tissue-specific modules for each group. Module analysis of regulatory elements is common in bioinformatics. In cancer research, for example, one often works with signatures, which can be gene sets or mutation sets depending on the biological context. Members of the same module or signature usually share a common biological function. Here the authors group enhancer and promoter elements into modules based on their distribution and activity across cell lines and tissues.
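The module clustering can be sketched as follows. The sketch assumes elements are represented as binary activity vectors across epigenomes (1 = the element is in an enhancer state in that epigenome) and clustered with plain k-means. The paper's exact clustering procedure is described in its Methods and is not reproduced here. The inputs and function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100):
    """Plain k-means (squared Euclidean) with deterministic init: first k distinct rows."""
    centroids = []
    for v in vectors:
        if list(v) not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    assignment = None
    for _ in range(iters):
        new = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
               for v in vectors]
        if new == assignment:  # converged
            break
        assignment = new
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

Each centroid is the module's average activity profile across epigenomes, which is how ubiquitous, lineage-restricted and tissue-specific modules can be told apart.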
This step builds on the completed (imputed) data from the previous step; more complete data may give the GO-term tests more statistical power.
The figure above shows the proximal gene enrichments for each module using gene ontology (GO) biological process (b) and human phenotypes (c), i.e. a functional analysis of the genes near each module.
The genome sequence of enhancers in the same module showed substantial enrichment for sequence motifs associated with diverse transcription factors. Motif analysis of the enhancers within each module reveals enrichment of many TF motifs. This suggests that the enhancers form co-regulated sets and may help identify their upstream regulators.
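Motif enrichment analyses like this one start by scanning sequences with a position weight matrix (PWM). Below is a minimal log-odds scan on the forward strand only. It assumes a uniform 0.25 background and a pseudocount of 0.5, and the four-position motif is hypothetical. Enrichment would then be assessed by comparing hit rates in module enhancers with those in background sequences.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert a position count matrix (list of {base: count}) to log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for each forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(b not in BASES for b in window):  # skip windows with N etc.
            continue
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

A full scan would also score the reverse complement and set the threshold from a target false-positive rate rather than a fixed number.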
The next question is which of these motifs correspond to activating TFs and which to repressive TFs. Answering it well requires integrating gene expression data to find patterns in enhancer-gene pairs. If a module's regulators turn out to be tissue-restricted, these regulators can be used to define the module.
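One simple way to act on this idea is to correlate a TF's expression with a module's activity across tissues. A positive correlation suggests an activator, and a negative one suggests a repressor. The sketch below assumes matched per-tissue vectors and a hypothetical cut-off of |r| ≥ 0.5. It is an illustration, not the paper's procedure.

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def classify_regulator(tf_expression, module_activity, min_abs_r=0.5):
    """Label a TF as a putative activator/repressor of a module by correlation sign.

    tf_expression, module_activity: values for the same tissues, in the same order.
    """
    r = pearson(tf_expression, module_activity)
    if r >= min_abs_r:
        return "activator", r
    if r <= -min_abs_r:
        return "repressor", r
    return "unclear", r
```

Only TFs whose motifs are enriched in the module would be worth testing this way. Otherwise the correlation says nothing about direct regulation.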
Impact of DNA sequence and genetic variation
Nearing the end, the analysis drills down to the finer sequence level: which variants (SNPs) and alleles are associated with disease.
Motifs in the sequence can be used to predict marks:
Using the area under the receiver operating characteristic curve (AUROC), we found predictive power ranging from 71% for H3K4me1 peaks to 98% for H3K4me3 peaks (average of 85% across six marks and methylation-depleted regions). Prediction performance is measured with the ROC curve and its AUC.
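The AUROC values can be made concrete. AUROC equals the probability that a randomly chosen positive region (for example, a true H3K4me3 peak) scores higher than a randomly chosen negative region, with ties counting one half. A value of 0.5 is chance and 1.0 is perfect separation. Below is a minimal sketch of this Mann-Whitney formulation with hypothetical scores. It uses O(n·m) pairwise comparisons, which is fine for illustration.

```python
def auroc(scores_pos, scores_neg):
    """AUROC = P(random positive outscores random negative), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, positives scored [0.9, 0.3] and negatives [0.5, 0.1] win 3 of 4 pairs, giving 0.75.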
As an example of a boundary enrichment, H3K4me3 peaks were flanked by motifs consisting of a continuous stretch of A and T followed by a G and C, which may have a role in nucleosome positioning or recruiting promoter-associated transcription factors, such as nuclear receptors. Enhancer and promoter predictive motifs were enriched in high-resolution DNase hypersensitive sites. (This example describes the boundary motif features of H3K4me3 peaks.)
Second, we studied how sequence variants between the two alleles of the same individual can lead to allelic biases in histone modifications, DNA methylation and transcript levels. For allelic bias, see the corresponding paper: https://www.nature.com/articles/nature14217#article-info. I took notes on the Methods of that part, and a more detailed haplotype methods paper is also referenced in these notes.
For allele-biased genes, the authors found corresponding allelic epigenomic modifications in promoters (71%) and Hi-C-linked enhancers (69%).
Trait-associated variants enrich in tissue-specific marks
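This section uses a typical GWAS-style analysis. Previous studies have shown that many disease-associated SNPs fall within regulatory elements.
For example, variants associated with metabolic diseases are enriched in liver enhancer marks.
In the figure above, each row is a disease with its PubMed ID and each column is a cell line (epigenome). The colour presumably shows the enrichment of the associated variants.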
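The enrichment in the heatmap can be approximated very simply. Take the fraction of trait-associated SNPs that fall inside a tissue's marked regions, and divide it by the fraction of the genome those regions cover. Below is a minimal sketch that assumes half-open intervals on one chromosome. It computes a plain fold enrichment, not necessarily the statistic the paper used, and the function names are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def snp_fold_enrichment(snp_positions, intervals, genome_length):
    """(fraction of SNPs inside the intervals) / (fraction of genome covered)."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    covered = sum(e - s for s, e in merged)
    hits = 0
    for pos in snp_positions:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return (hits / len(snp_positions)) / (covered / genome_length)
```

A fold enrichment above 1 means the trait's SNPs land in that tissue's marks more often than the marks' genome coverage alone would predict. Significance would still need a test, e.g. binomial or permutation-based.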
Appendix: enhancer and promoter databases
Many related databases came up while I was reading the references, so I have collected them here for now.
FANTOM: https://fantom.gsc.riken.jp/
FANTOM stands for Functional Annotation of the Mammalian Genome. It is an international research project founded in 2000, originally to functionally annotate full-length mouse cDNA sequences, and its scope has since expanded across transcriptomics. Its main technology is Cap Analysis of Gene Expression (CAGE), developed at RIKEN, which measures gene expression levels with higher sensitivity.
Question: FANTOM5 Promoter Atlas: https://www.biostars.org/p/101956/
Locating enhancers with FANTOM5: http://www.360doc.com/content/19/0821/17/65172408_856276298.shtml
CAGE-TSSchip: promoter-based expression profiling using the 5'-leading label of capped transcripts: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2007-8-3-r42
Elements with clear enhancer marks sometimes also show enriched CAGE signal, which indicates that they may also have promoter activity. An example is cis-regulatory elements with dynamic signatures (cREDS); for details, see Integrative analysis of haplotype-resolved epigenomes across human tissues.
Most genes have two or more transcription start sites (TSSs), and different TSSs subject a gene to regulation through different upstream untranslated regions (5' UTRs). Different 5' UTR sequences may contain entirely different regulatory elements, so different start sites make a gene's expression respond to completely different signals. The same gene may be regulated by different promoters, leading to differences in expression that can contribute to some diseases.
CAGE-seq (Cap Analysis of Gene Expression and deep sequencing) identifies the TSSs of capped transcripts genome-wide by capturing the 5' cap site.
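Each CAGE read starts at a capped 5' end, so TSSs are typically called by grouping nearby tag start positions into clusters. Below is a minimal distance-based sketch for one strand of one chromosome. `max_gap` and `min_tags` are hypothetical parameters, not FANTOM's settings.

```python
def cluster_tss(tag_counts, max_gap=20, min_tags=5):
    """Group CAGE tag 5'-end positions into TSS clusters.

    tag_counts: dict position -> number of tags starting there (one strand/chromosome).
    Consecutive positions at most max_gap apart join the same cluster.
    Returns (start, end, total_tags, dominant_position) for clusters with >= min_tags.
    """
    clusters, current = [], []
    for pos in sorted(tag_counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    result = []
    for positions in clusters:
        total = sum(tag_counts[p] for p in positions)
        if total >= min_tags:
            # dominant TSS = position with the most tags (first one on ties)
            dominant = max(positions, key=lambda p: tag_counts[p])
            result.append((positions[0], positions[-1], total, dominant))
    return result
```

Genes with several such clusters upstream of different first exons are candidates for the alternative-promoter usage described above.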
VISTA enhancer browser: https://enhancer.lbl.gov/ (a data set of enhancers with in vivo validated activity)
See: https://www.cnblogs.com/yahengwang/p/11228108.html
Multi-omics integration with Matrix eQTL: http://www.reibang.com/p/6e6d54d7483e (an R package worth exploring)
RepeatMasker: http://www.reibang.com/p/50ce4bcd1972
A promoter-level mammalian expression atlas: https://www.nature.com/articles/nature13182#article-info
A map of the cis-regulatory sequences in the mouse genome: https://www.nature.com/articles/nature11243#additional-information (this paper uses a Shannon-entropy-based analysis, which I plan to check out later)