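Genes repressed or activated by PRC complexes perturb single-cell gene expression
This is probably the first single-cell transcriptome study of PRC complexes. The paper was accepted in October 2016: Flipping between Polycomb repressed and active transcriptional states introduces noise in gene expression
Background
Polycomb repressive complexes (PRCs) are well-known histone-modifying complexes that generally repress gene expression. However, a gene bound by both PRCs and RNA polymerase II (RNAPII) can still be actively transcribed. The mechanism behind this is unclear.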
- PRC1, which monoubiquitinylates histone H2A lysine 119 (H2Aub1) via the ubiquitin ligase RING1A/B;
- PRC2, which catalyzes dimethylation and trimethylation of H3K27 (H3K27me2/3) via the histone methyltransferase (HMT) EZH1/2.
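Embryonic stem cells (ESCs) can self-renew and have the potential to differentiate into other cell types, and their stem-cell identity is thought to be maintained by epigenetic regulation. They are usually selected with stem-cell markers such as Oct4, and it must be clearly shown that they do not express differentiation markers such as Gata4 and Gata6. A heatmap of these two marker gene sets can be used to show this.
RNAPII modifications determine how transcription proceeds, mainly through phosphorylation of its C-terminal domain. The known findings are: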
- Phosphorylation of S5 residues (S5p) correlates with initiation, capping, and H3K4 HMT recruitment.
- S2 phosphorylation (S2p) correlates with elongation, splicing, polyadenylation, and H3K36 HMT recruitment.
- Phosphorylation of RNAPII on S5, but not on S2, is associated with Polycomb repression and poised transcription factories, while active factories are associated with phosphorylation on both residues.
- S7 phosphorylation (S7p) marks the transition between S5p and S2p, but its mechanistic role is currently unclear.
Combining PRC occupancy with the RNAPII modification state, genes can be divided into two classes:
- (1) repressed genes associated with PRCs and unproductive RNAPII (phosphorylated at S5 but lacking S2p; PRC-repressed)
- (2) expressed genes bound by PRCs and active RNAPII (both S5p and S2p; PRC-active)
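These genes are all marked by both H3K4me3 and H3K27me3, which is known as the bivalent state.
Single-cell RNA sequencing
The cells sequenced were mouse OS25 ESCs cultured in serum + leukemia-inhibitory factor (LIF). Single cells were captured with the Fluidigm C1, and libraries were prepared with the SMARTer kit.
Cell filtering (a cell was removed if):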
- (1) the total number of reads mapping to exons for the cell was lower than half a million
- (2) the percentage of reads mapping to mitochondrial-encoded RNAs was higher than 10%.
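In the end, 90 single cells remained for downstream analysis. All of these cells had a mapping rate above 80%, and more than 60% of their reads fell in exonic regions.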
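The two cell-level filters above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' code: the function name, the dictionary layout, and the choice of total mapped reads as the denominator for the mitochondrial fraction are all assumptions, and the three cells are made-up toy values.

```python
def passes_qc(exon_reads, mito_reads, mapped_reads,
              min_exon_reads=500_000, max_mito_frac=0.10):
    """Return True if a cell survives both filters.

    Assumption: the mitochondrial percentage is computed relative to
    all mapped reads of that cell.
    """
    mito_frac = mito_reads / mapped_reads
    # A cell is removed if it has too few exonic reads
    # or too large a mitochondrial fraction.
    return exon_reads >= min_exon_reads and mito_frac <= max_mito_frac


# Toy, hypothetical cells (not data from the paper).
cells = {
    "cell_A": dict(exon_reads=800_000, mito_reads=50_000, mapped_reads=1_200_000),
    "cell_B": dict(exon_reads=300_000, mito_reads=20_000, mapped_reads=600_000),
    "cell_C": dict(exon_reads=900_000, mito_reads=200_000, mapped_reads=1_500_000),
}

kept = [name for name, stats in cells.items() if passes_qc(**stats)]
print(kept)
```

Here cell_B fails the exonic read threshold and cell_C fails the mitochondrial threshold, so only cell_A is kept.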
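Gene filtering:
- Genes with low expression (RPM below 10) were removed, because for these genes biological variation cannot be told apart from technical variation.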
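As a rough sketch of the gene filter, the snippet below converts raw counts to reads per million (RPM) per cell and keeps genes whose mean RPM across cells reaches 10. Whether the threshold was applied to the mean across cells is an assumption here, and the function names, gene names, and counts are hypothetical.

```python
def rpm_matrix(counts):
    """counts: one list per gene, holding one raw read count per cell."""
    n_cells = len(counts[0])
    # Library size = total reads per cell (column sums).
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_cells)]
    return [[gene[j] / lib_sizes[j] * 1e6 for j in range(n_cells)]
            for gene in counts]


def expressed_genes(counts, gene_names, min_mean_rpm=10.0):
    """Keep genes whose mean RPM across cells is at least min_mean_rpm."""
    rpm = rpm_matrix(counts)
    return [name for name, row in zip(gene_names, rpm)
            if sum(row) / len(row) >= min_mean_rpm]


# Toy counts for three hypothetical genes in two cells
# (each cell has exactly one million reads in total).
gene_names = ["gene_hi_1", "gene_hi_2", "gene_low"]
counts = [
    [500_000, 400_000],
    [499_997, 599_995],
    [3, 5],
]

kept_genes = expressed_genes(counts, gene_names)
print(kept_genes)
```

The third gene has a mean RPM of 4 in this toy matrix, so it is dropped.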
The data are available at: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5661/
The per-gene mean expression from the single-cell matrix was compared with the bulk RNA-seq data of Brookes et al. (2012), and the two correlate very well.
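A comparison like this can be sketched as follows. The notes do not say which correlation measure or transformation was used, so Pearson correlation on log10(x + 1) values is an assumption, and all names and numbers below are invented for illustration.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sc_mean_vs_bulk(sc_expr, bulk_expr):
    """Correlate per-gene single-cell means with bulk values on a log scale."""
    sc_mean = [sum(row) / len(row) for row in sc_expr]
    x = [math.log10(v + 1) for v in sc_mean]
    y = [math.log10(v + 1) for v in bulk_expr]
    return pearson(x, y)


# Toy data: four hypothetical genes, three cells each, plus bulk values.
sc_expr = [[0, 2, 1], [10, 14, 12], [95, 110, 100], [900, 1100, 1000]]
bulk_expr = [0.8, 11.0, 120.0, 950.0]

r = sc_mean_vs_bulk(sc_expr, bulk_expr)
print(round(r, 3))
```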
Analyzing cell-to-cell variation
Differences in gene expression between cells come from three main sources:
- stochastic gene expression itself
- technical noise
- confounding expression heterogeneity due to biological processes such as the cell cycle.
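The authors first reconstructed the cell-cycle state of these cells, found that the cell cycle accounts for only 1.2% of the expression variation between cells, and corrected for this effect.
To correct for technical noise, the authors removed lowly expressed genes (those with an average read count below 10), which left a little over eleven thousand genes.
Finally, the authors used DM (distance to median) to measure how variable each gene's expression is across the cell population.
DM is a very useful metric because it is independent of both gene length and expression level.
Combining public ChIP-seq data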
Using the public dataset GSE34520, genes were classified by their PRC marks and RNAPII states:
- (1) “Active” genes (n = 4483) without PRC marks (H3K27me3 or H2Aub1) but with active RNAPII (S5pS7pS2p). Most of these are housekeeping genes, so their expression is relatively stable.
- (2) “PRC-active” genes (labeled as “PRCa”; n = 945) with PRC marks (H3K27me3 or H3K27me3 plus H2Aub1), and active RNAPII. Most of these are signaling pathway genes.
- (3) “PRCr” genes (n = 954) have both PRC marks (H3K27me3 and H2Aub1) and unproductive RNAPII (S5p only and not recognized by antibody 8WG16), and are not expressed in the bulk mRNA data of Brookes et al. (bulk mRNA FPKM <1).
A two-tailed Wilcoxon rank sum test shows that the “PRC-active” gene set has significantly higher expression variation at the single-cell level than the “Active” genes.
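For readers who want to see what such a comparison involves, here is a minimal sketch of a two-sided Wilcoxon rank sum test using the normal approximation. It is a simplified textbook version (mid-ranks for ties, no tie correction of the variance), not the authors' code, and the two groups of DM values are simulated rather than taken from the paper.

```python
import math
import random


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p_value); z > 0 means x tends to be larger than y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values and give them the mid-rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Simulated DM values (hypothetical): the "PRCa" group is shifted upward.
rng = random.Random(1)
dm_active = [rng.gauss(0.0, 0.3) for _ in range(500)]
dm_prca = [rng.gauss(0.2, 0.3) for _ in range(200)]

z, p_value = rank_sum_test(dm_prca, dm_active)
print(z > 0, p_value < 0.001)
```

In practice a library routine such as `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` would normally be preferred over a hand-written test.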
The GSE34520 dataset includes:
GEO accession | Sample |
---|---|
GSM850467 | RNAPII S5P ChIPSeq |
GSM850468 | RNAPII S7P ChIPSeq |
GSM850469 | RNAPII 8WG16 ChIPSeq |
GSM850470 | RNAPII S2P ChIPSeq |
GSM850471 | H2Aub1 ChIPSeq |
GSM850472 | H3K36me3 ChIPSeq |
GSM850473 | Control MockIP |
GSM850474 | Ring1B ChIPSeq |
GSM850475 | RNAPII S5P Repeat ChIPSeq |
GSM850476 | OS25 cells mRNA-Seq |
Reference: Brookes, E. et al. Polycomb associates genome-wide with a specific RNA polymerase II variant, and regulates metabolic genes in ESCs. Cell Stem Cell 10, 157–170 (2012).
The authors did not release their peak files, so you need to download the raw data and run the pipeline yourself. The data are at: https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP009883
Combining public single-cell transcriptome data
Every aspect of the single-cell transcriptome data processing in this paper follows a 2015 paper.
Reference: Kolodziejczyk, A. A. et al. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation. Cell Stem Cell 17, 471–485 (2015).
Combining Hi-C 3D genome data
One public resource is available: the GOTHiC (Genome Organization Through Hi-C) Bioconductor package.
Reference: Schoenfelder, S. et al. Polycomb repressive complex PRC1 spatially constrains the mouse embryonic stem cell genome. Nat. Genet. 47, 1179–1186 (2015).
Defining whether a gene is positive for a given histone modification mark
Genes were defined as positive for H3K9me3 at their promoter or gene body when an enriched region was overlapping with a 2 kb window around the TSS or between the TSS and TES, respectively.
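The overlap rule can be sketched as a simple interval test. The gene and peak records below are hypothetical, and reading "a 2 kb window around the TSS" as TSS ± 1 kb is an assumption (the `window` parameter can be changed if the intended window is ± 2 kb).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def mark_status(gene, peaks, window=2000):
    """Flag a gene as positive at its promoter and/or gene body.

    gene:  dict with 'chrom', 'tss', 'tes' (tes < tss on the minus strand)
    peaks: list of (chrom, start, end) enriched regions
    """
    half = window // 2
    promoter = (gene["tss"] - half, gene["tss"] + half)
    body = (min(gene["tss"], gene["tes"]), max(gene["tss"], gene["tes"]))
    same_chrom = [(s, e) for c, s, e in peaks if c == gene["chrom"]]
    return {
        "promoter": any(overlaps(s, e, *promoter) for s, e in same_chrom),
        "gene_body": any(overlaps(s, e, *body) for s, e in same_chrom),
    }


# Hypothetical gene and enriched regions, for illustration only.
gene = {"chrom": "chr1", "tss": 10_000, "tes": 15_000}
peaks = [("chr1", 9_500, 9_800), ("chr2", 12_000, 12_500)]

status = mark_status(gene, peaks)
print(status)
```

The chr1 peak lies upstream of the TSS but inside the promoter window, so the gene is promoter-positive and gene-body-negative; the chr2 peak is ignored.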
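Measuring gene expression variation
Gene expression variation can be quantified by CV or by DM. DM is a measure of noise that is independent of gene expression level and gene length.
coefficient of variation (CV)
CV measures how variable a gene's expression is within a cell population. It is the most widely used measure, but it is affected by gene length and expression level. CV is a normalized measure of the dispersion of a probability distribution, defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$: $\mathrm{CV} = \sigma / \mu$. The coefficient of variation is only defined when the mean is non-zero, and it is generally applied only when the mean is greater than zero.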
DM (distance to median)
First, compute a mean-corrected residual of variation: the difference between the observed squared CV (log10-transformed) of a gene and its expected squared CV.
Then correct for the effect of gene length on the mean-corrected residual of variation.
The difference between the mean-corrected residual of the gene and its expected residual is the DM.
After ranking genes by DM, the top 20% are defined as “noisy” genes and the bottom 20% as “stable” genes.
The expected squared CV or the expected residual was approximated by using a running median.
For the formula, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595712/#mmc1
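The DM steps above can be sketched as follows. This is a simplified illustration rather than the published implementation: the running median is taken over a fixed number of neighbouring genes (`window`, an assumed parameter), every gene is assumed to have non-zero mean and variance, and the expression values and gene lengths are simulated.

```python
import math
import random
import statistics


def running_median(x, y, window=11):
    """For each gene, the median of y over the genes closest to it in rank of x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = window // 2
    out = [0.0] * len(x)
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half), min(len(order), pos + half + 1)
        out[i] = statistics.median([y[j] for j in order[lo:hi]])
    return out


def distance_to_median(expr, gene_lengths, window=11):
    """expr: one list of normalized expression values (across cells) per gene."""
    log_mean = [math.log10(statistics.fmean(row)) for row in expr]
    log_cv2 = [math.log10((statistics.stdev(row) / statistics.fmean(row)) ** 2)
               for row in expr]
    # Step 1: observed log10(CV^2) minus its expectation given mean expression.
    expected_cv2 = running_median(log_mean, log_cv2, window)
    residual = [obs - exp for obs, exp in zip(log_cv2, expected_cv2)]
    # Step 2: remove the remaining dependence on gene length.
    log_len = [math.log10(n) for n in gene_lengths]
    expected_res = running_median(log_len, residual, window)
    return [res - exp for res, exp in zip(residual, expected_res)]


def classify(dm, frac=0.2):
    """Return (noisy, stable): index sets of the top and bottom `frac` by DM."""
    order = sorted(range(len(dm)), key=lambda i: dm[i])
    k = round(len(dm) * frac)
    return set(order[-k:]), set(order[:k])


# Simulated data: 50 genes x 30 cells; gene 7 is made deliberately noisy.
rng = random.Random(0)
noisy_idx = 7
expr = []
for g in range(50):
    sigma = 1.0 if g == noisy_idx else 0.2
    expr.append([rng.lognormvariate(math.log(20 + 10 * g), sigma) for _ in range(30)])
gene_lengths = [rng.randint(500, 10_000) for _ in range(50)]

dm = distance_to_median(expr, gene_lengths)
noisy, stable = classify(dm)
print(noisy_idx in noisy, len(noisy), len(stable))
```

Because both corrections use a running median, a gene is only called noisy relative to genes with similar mean expression and similar length, which is what makes DM comparable across genes.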
Poisson-beta distribution model for single-cell expression
Reference: Beta-Poisson model for single-cell RNA-seq data analyses
burst frequencies
The beta-Poisson model captures the burst frequency and burst size through the shape and scale parameters α and β, respectively. Large α indicates high burst frequency; large β means large burst size
Using scLVM to remove the effect of the cell cycle on expression
Removing cell cycle variation and technical noise allowed us to focus on stochastic gene expression.
Interpreting the analysis results
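Clearly, the Active genes and the PRCa genes differ significantly on every metric.
The paper ends with a batch of experimental validation, which I did not bother to read.
(Reposted from jimmy's 2018 literature reading notes)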