Key Hi-C literature
First Hi-C paper: Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome; doi: 10.1126/science.1181369
TADs first described: Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions; doi: 10.1038/nature11082
High-resolution Hi-C: A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping; https://doi.org/10.1016/j.cell.2014.11.021
Single-cell Hi-C: Single-cell Hi-C reveals cell-to-cell variability in chromosome structure, doi: 10.1038/nature12593; 3D structures of individual mammalian genomes studied by single-cell Hi-C, doi: 10.1038/nature21429
Review: the Hi-C background covered in the lecture comes mainly from the review Organization and function of the 3D genome, doi: 10.1038/nrg.2016.112
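Chromatin interactions at different resolutions
Hi-C reveals different features at different resolutions
At 5 kb, individual loops are visible
At 10 kb, TADs are visible
At 50 kb, associations between TADs are visible
At the whole-chromosome level, the spatial distribution of chromatin is visible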
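Moving between these resolutions usually means re-binning the same contact counts into larger bins. A minimal sketch, assuming a square list-of-lists contact map and a hypothetical helper name `coarsen` (neither comes from the lecture):

```python
def coarsen(contact_map, factor):
    """Merge every `factor` adjacent bins by summing their contact counts."""
    n = len(contact_map)
    m = (n + factor - 1) // factor  # ceiling division: a partial last bin is kept
    coarse = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            coarse[i // factor][j // factor] += contact_map[i][j]
    return coarse

# Toy 4-bin map, imagined at 5 kb resolution (made-up counts).
map_5kb = [
    [4, 2, 1, 0],
    [2, 5, 2, 1],
    [1, 2, 6, 3],
    [0, 1, 3, 7],
]
map_10kb = coarsen(map_5kb, factor=2)  # two 10 kb bins
```

Summing keeps the total number of contacts unchanged, so coarser maps trade positional detail for higher counts per bin.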
What gives rise to TADs
cohesin complex
Cohesin is a protein complex that regulates the separation of sister chromatids during cell division, either mitosis or meiosis.
Cohesins hold sister chromatids together after DNA replication until anaphase when removal of cohesin leads to separation of sister chromatids.
CTCF proteins
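CTCF, a transcriptional repressor
Binding of CTCF to its target sequence elements can block enhancer-promoter interactions, confining enhancer activity to a defined functional domain
Besides blocking enhancers, CTCF can also act as a chromatin barrier that prevents the spread of heterochromatin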
Predicting enhancer-promoter loops (EPLs)
Two similar algorithms
TargetFinder (Whalen et al., Nat Genet 2016): an algorithm that uses many functional genomic datasets, including DNase-seq, histone marks, transcription factor (TF) ChIP-seq, gene expression, and DNA methylation data.
Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin, doi: 10.1038/ng.3539
pipeline
RIPPLE (Roy et al. NAR 2016): also uses functional genomic datasets for feature extraction.
- A predictive modeling approach for cell line-specific long-range regulatory interactions, https://doi.org/10.1093/nar/gkv865
Finding shared by both
- Signals from these functional genomic data are informative for computationally distinguishing enhancer-promoter interactions from non-interacting enhancer-promoter pairs.
PEP: uses sequence information only (Jian Ma lab)
Hi-C analysis workflow
Analysis methods for studying the 3D architecture of the genome, https://doi.org/10.1186/s13059-015-0745-7
Workflow
contact map
Definition: A contact map is a matrix with rows and columns representing non-overlapping 'bins' across the genome.
Each entry in the matrix contains a count of read pairs that connect the corresponding bin pair in a Hi-C experiment.
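The definition can be sketched in a few lines. This is an illustration only: it assumes a single chromosome and read pairs given as 0-based `(pos1, pos2)` coordinates, and the function name `build_contact_map` is hypothetical.

```python
def build_contact_map(read_pairs, chrom_length, bin_size):
    """Count read pairs per bin pair on one chromosome.

    read_pairs: iterable of (pos1, pos2) 0-based positions (assumed input format).
    Returns a symmetric list-of-lists matrix of counts.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal counts to keep the map symmetric
    return matrix

# Four made-up read pairs on a 10 kb chromosome, binned at 5 kb.
pairs = [(100, 250), (120, 9_800), (5_100, 9_900), (9_000, 200)]
contact_map = build_contact_map(pairs, chrom_length=10_000, bin_size=5_000)
```

Real pipelines work genome-wide and store the matrix sparsely; the dense list-of-lists here only keeps the example dependency-free.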
How to determine bin size
- No standard rule. Rao et al. 2014 suggest using a bin size at which at least 80% of all possible bins have >1000 contacts.
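A rough way to apply this rule of thumb is sketched below. It assumes that a bin's "contacts" are the sum of its row in the contact map, which is one reading of the criterion rather than something stated in these notes. The helper names and the toy counts are made up.

```python
def fraction_well_covered(contact_map, min_contacts=1000):
    """Fraction of bins whose total contact count exceeds min_contacts."""
    row_totals = [sum(row) for row in contact_map]
    covered = sum(1 for total in row_totals if total > min_contacts)
    return covered / len(row_totals)

def bin_size_is_acceptable(contact_map, min_contacts=1000, min_fraction=0.8):
    """True if at least min_fraction of bins exceed min_contacts."""
    return fraction_well_covered(contact_map, min_contacts) >= min_fraction

# Toy 5-bin map; row totals are 1165, 1200, 1050, 870, 445.
toy = [
    [900, 200, 50, 10, 5],
    [200, 800, 150, 40, 10],
    [50, 150, 700, 120, 30],
    [10, 40, 120, 600, 100],
    [5, 10, 30, 100, 300],
]
coverage = fraction_well_covered(toy)  # 3 of 5 bins pass
acceptable = bin_size_is_acceptable(toy)
```

If the check fails, the usual response is to try a larger bin size and test again.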
Two types of approaches to correct bias in the contact map
- Explicit approach: assumes some known sources of bias
  - Restriction enzyme fragment lengths, GC content, and sequence mappability are three major sources of bias in Hi-C data (Yaffe and Tanay, Nat Genet 2011)
  - HiCNorm: simpler and faster (Hu et al., Bioinformatics 2012)
- Implicit approach: assumes no known source of bias, and that each locus receives equal sequence coverage after biases are removed
  - In other words, if there is no bias, the genome-wide sum of contacts for each locus is a constant, i.e., each locus has 'equal visibility'
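The 'equal visibility' idea can be illustrated with a small matrix-balancing loop. This is a sketch in the spirit of iterative correction, not any published tool's exact procedure. Taking the square root of the per-bin factor is a damping choice made here so that the toy example converges smoothly. The function name and the counts are hypothetical.

```python
import math

def iterative_correction(matrix, n_iter=50):
    """Rescale a symmetric contact matrix so all non-empty rows have equal sums.

    Returns (balanced, bias) with balanced[i][j] * bias[i] * bias[j] == matrix[i][j].
    """
    n = len(matrix)
    balanced = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in balanced]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:
            break  # nothing to balance in an all-zero matrix
        mean_sum = sum(nonzero) / len(nonzero)
        # Damped per-bin factor (square root is this sketch's assumption);
        # empty bins are left untouched.
        factors = [math.sqrt(s / mean_sum) if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
    return balanced, bias

raw = [
    [10, 4, 2],
    [4, 20, 6],
    [2, 6, 30],
]
balanced, bias = iterative_correction(raw)
row_sums = [sum(row) for row in balanced]  # approximately equal after balancing
```

The returned `bias` vector plays the role of the per-locus visibility that the implicit approach estimates without naming its source.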
Contact matrix normalization
How to normalize
Algorithms for identifying TADs
HMM (Bing Ren lab)
Arrowhead
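As a rough illustration of the signal behind the HMM approach: the TAD paper cited at the top of these notes computes a directionality index (DI) per bin and then feeds it to an HMM. The sketch below computes only the DI, on a made-up map, and omits the HMM step. The window is given in bins, and the function name is hypothetical.

```python
def directionality_index(contact_map, window):
    """Per-bin upstream/downstream contact bias within `window` bins."""
    n = len(contact_map)
    di = []
    for i in range(n):
        upstream = sum(contact_map[i][max(0, i - window):i])    # A
        downstream = sum(contact_map[i][i + 1:i + 1 + window])  # B
        expected = (upstream + downstream) / 2                  # E
        if expected == 0 or upstream == downstream:
            di.append(0.0)
            continue
        sign = 1.0 if downstream > upstream else -1.0
        chi = ((upstream - expected) ** 2 + (downstream - expected) ** 2) / expected
        di.append(sign * chi)
    return di

# Toy map with two 3-bin domains: 10 contacts within a domain, 1 between domains.
toy_map = [[10 if i // 3 == j // 3 else 1 for j in range(6)] for i in range(6)]
di = directionality_index(toy_map, window=2)
```

In this toy map the DI is strongly negative at bin 2 and strongly positive at bin 3, which is the sign flip expected at a domain boundary. Bins near the chromosome ends see truncated windows, so their values should be read with care.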