Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear.
Results: Here, we present an Asymmetric Winsorization per-Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications.
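Data preprocessing (normalization, standardization) => clustering
Preprocessing: applying a log transformation during normalization reduces the influence of abnormally large values, but it also amplifies the statistical variation of lowly expressed genes.
Clustering: the key step in clustering is feature selection, and this step is subject to subjective choices.
Preprocessing: apply a two-step statistical transformation: 1) per-sample normalization; 2) winsorization of the normalized data.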
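For each sample, the right-hand side of the gene expression distribution is known to be approximately log-normal, so the part of a sample's expression distribution to the right of its mode is assumed to follow this form. For a single sample, μ is estimated from the mode of the log expression and the variance σ^2 is estimated by maximum likelihood. The resulting standardized values are the z-counts.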
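The per-sample standardization described above can be sketched as follows. This is a minimal illustration, not the AWST implementation: the function name `z_counts`, the `log1p` transform, the histogram-based mode estimate, and the half-normal maximum likelihood estimate of σ on the values to the right of the mode are all assumptions made for the example.

```python
import numpy as np

def z_counts(counts, n_bins=50):
    """Standardize one sample's counts with a mode-based location and a
    right-tail scale. Hypothetical helper, not the AWST package API."""
    # Assumption: log(1 + count) as the log transform.
    x = np.log1p(np.asarray(counts, dtype=float))

    # Location: midpoint of the fullest histogram bin of the non-zero values.
    x_pos = x[x > 0]
    hist, edges = np.histogram(x_pos, bins=n_bins)
    k = np.argmax(hist)
    mu = 0.5 * (edges[k] + edges[k + 1])

    # Scale: maximum likelihood estimate of sigma for a half-normal
    # distribution with known location mu, using only values right of the mode.
    right = x[x > mu]
    sigma = np.sqrt(np.mean((right - mu) ** 2))

    return (x - mu) / sigma, mu, sigma

# Usage on simulated log-normal counts (simulated data, not real RNA-seq).
rng = np.random.default_rng(0)
counts = np.exp(rng.normal(5.0, 1.0, size=5000))
z, mu, sigma = z_counts(counts)
```

Only the right-hand side of the distribution enters the scale estimate, so an excess of low counts to the left of the mode does not inflate σ.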
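Smoothing: the z-counts are passed through the smoothing function T(x; σ, λ), where Φ(z) is the cumulative distribution function of the z-counts.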
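Clustering: variable genes are selected by Shannon entropy. Because T(x; σ, λ) is bounded, its range can be divided into K intervals; p_jk is the probability that the expression of gene j, across all samples, falls into the k-th interval (Σ_k p_jk = 1). From these probabilities a heterogeneity index h_j is computed for gene j (the more heterogeneous the gene, the larger h_j).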
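The entropy-based heterogeneity index can be illustrated with a short sketch. It assumes the standard Shannon entropy h_j = -Σ_k p_jk log p_jk with natural logarithm and K equal-width intervals over the observed range. The function name `gene_entropy` and the binning choices are hypothetical, and the exact form used by AWST may differ.

```python
import numpy as np

def gene_entropy(values, n_bins=20):
    """Shannon entropy per gene for a (genes x samples) matrix of
    transformed expression values. Hypothetical helper for illustration."""
    values = np.asarray(values, dtype=float)
    # Assumption: K equal-width intervals spanning the observed range,
    # which requires the matrix to contain at least two distinct values.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)

    h = np.empty(values.shape[0])
    for j, row in enumerate(values):
        counts, _ = np.histogram(row, bins=edges)
        p = counts / counts.sum()       # p_jk, sums to 1 over k
        p = p[p > 0]                    # treat 0 * log(0) as 0
        h[j] = -np.sum(p * np.log(p))
    return h

# Usage: a constant gene has zero entropy, a spread-out gene has more.
toy = np.array([[1.0, 1.0, 1.0, 1.0],
                [0.0, 1.0, 2.0, 3.0]])
h = gene_entropy(toy, n_bins=4)
```

Genes could then be ranked by h_j and low-entropy genes dropped before clustering; any threshold would be a further assumption.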
Motivation: Recent technological advances enable the profiling of spatial single-cell expression data. Such data present a unique opportunity to study cell–cell interactions and the signaling genes that mediate them. However, most current methods for the analysis of these data focus on unsupervised descriptive modeling, making it hard to identify key signaling genes and quantitatively assess their impact.
Results: We developed a Mixture of Experts for Spatial Signaling genes Identification (MESSI) method to identify active signaling genes within and between cells. The mixture of experts strategy enables MESSI to subdivide cells into subtypes. MESSI relies on multi-task learning using information from neighboring cells to improve the prediction of response genes within a cell. Applying the method to three spatial single-cell expression datasets, we show that MESSI accurately predicts the levels of response genes, improves upon prior methods and provides useful biological insights about key signaling genes and subtypes of excitatory neuron cells.
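At present, inference of cell-cell interactions from scRNA-seq data focuses mainly on ligand and receptor genes, but without spatial information it is hard to be confident that the interacting cells are actually neighbors in space. Meanwhile, most analysis methods for spatial transcriptomics focus on clustering and spatial patterns, and few use spatial transcriptomic data to study cell-cell interactions.
This paper builds a framework (MESSI) that assigns cells to subtypes based on their signaling gene expression and spatial information. Using a Mixture of Experts (MoE) model, it takes as input the expression of a cell's receptors and ligands and of the ligands in its neighboring cells, and predicts the expression of response genes.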
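To make the mixture-of-experts idea concrete, here is a minimal forward pass of a generic MoE with a softmax gate and linear experts. It is only a sketch of the general technique: the names `moe_predict`, `gate_W` and `expert_W` are hypothetical, and MESSI's actual gating, expert models, multi-task learning and training procedure are not reproduced here.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def moe_predict(X, gate_W, expert_W):
    """Generic mixture-of-experts prediction (illustrative, not MESSI itself).

    X        : (n_cells, n_features) signaling features per cell, e.g. the
               cell's receptor/ligand expression and neighbors' ligand expression
    gate_W   : (n_features, n_experts) gating weights
    expert_W : (n_experts, n_features, n_responses) one linear model per expert
    """
    gates = softmax(X @ gate_W)                          # soft subtype assignment
    expert_out = np.einsum("nf,efr->ner", X, expert_W)   # each expert's prediction
    y_hat = np.einsum("ne,ner->nr", gates, expert_out)   # gate-weighted average
    return y_hat, gates

# Usage with toy weights: a zero gate gives equal weight to both experts.
X = np.ones((3, 2))
gate_W = np.zeros((2, 2))
expert_W = np.stack([np.ones((2, 1)), np.zeros((2, 1))])
y_hat, gates = moe_predict(X, gate_W, expert_W)
```

The row of `gates` with the largest entry can be read as a cell's subtype, which is how a mixture of experts yields a subdivision of cells as a by-product of prediction.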
Motivation: Single-cell gene expression distributions measured by single-cell RNA-sequencing (scRNA-seq) often display complex differences between samples. These differences are biologically meaningful but cannot be identified using standard methods for differential expression.
Results: Here, we derive and implement a flexible and fast differential distribution testing procedure based on the 2-Wasserstein distance. Our method is able to detect any type of difference in distribution between conditions. To interpret distributional differences, we decompose the 2-Wasserstein distance into terms that capture the relative contribution of changes in mean, variance and shape to the overall difference. Finally, we derive mathematical generalisations that allow our method to be used in a broad range of disciplines other than scRNA-seq or bioinformatics.
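Single-cell gene expression distributions differ between samples in complex ways that current methods cannot identify. To address this, a differential distribution analysis method based on the 2-Wasserstein distance was developed.
The test is semi-parametric, and a Pareto distribution is used to compute the p-values.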
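The decomposition of the squared 2-Wasserstein distance into mean, variance and shape terms can be checked numerically. The sketch below approximates both distributions by their empirical quantile functions on a common grid and uses the algebraic identity d² = (μx - μy)² + (σx - σy)² + 2(σx σy - cov(Qx, Qy)), where Qx and Qy are the quantile functions. The function name and the grid size are assumptions for illustration, and the permutation-based p-value procedure is not shown.

```python
import numpy as np

def wasserstein2_decomposition(x, y, n_quantiles=1000):
    """Squared 2-Wasserstein distance between two 1-D samples and its
    split into mean, variance and shape terms (illustrative sketch)."""
    # Evaluate both empirical quantile functions on the same probability grid.
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    qx = np.quantile(x, probs)
    qy = np.quantile(y, probs)

    d2 = np.mean((qx - qy) ** 2)                    # squared distance
    mean_term = (qx.mean() - qy.mean()) ** 2        # difference in means
    sx, sy = qx.std(), qy.std()
    var_term = (sx - sy) ** 2                       # difference in spread
    cov = np.mean((qx - qx.mean()) * (qy - qy.mean()))
    shape_term = 2.0 * (sx * sy - cov)              # what is left: shape
    return d2, mean_term, var_term, shape_term

# Usage: a pure shift by 2 puts the whole distance into the mean term.
x = np.array([0.0, 1.0, 2.0, 3.0])
d2, mean_term, var_term, shape_term = wasserstein2_decomposition(x, x + 2.0)
```

Dividing each term by d² gives the relative contribution of each kind of change, which is what makes a detected difference interpretable.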
Abstract
Transposable elements (TEs) make up a majority of a typical eukaryote’s genome, and contribute to cell heterogeneity in unclear ways. Single-cell sequencing technologies are powerful tools to explore cells; however, analysis is typically gene-centric and TE expression has not been addressed. Here, we develop a single-cell TE processing pipeline, scTE, and report the expression of TEs in single cells in a range of biological contexts. Specific TE types are expressed in subpopulations of embryonic stem cells and are dynamically regulated during pluripotency reprogramming, differentiation, and embryogenesis. Unexpectedly, TEs are expressed in somatic cells, including human disease-specific TEs that are undetectable in bulk analyses. Finally, we apply scTE to single-cell ATAC-seq data, and demonstrate that scTE can discriminate cell type using chromatin accessibility of TEs alone. Overall, our results classify the dynamic patterns of TEs in single cells and their contributions to cell heterogeneity.
Abstract
Recent developments in spatial transcriptomics (ST) make it possible to associate spatial information at different spots in the tissue section with RNA abundance of cells within each spot, which is particularly important to understand tissue cytoarchitectures and functions. However, for such ST data, since a spot is usually larger than an individual cell, gene expressions measured at each spot are from a mixture of cells with heterogeneous cell types. Therefore, ST data at each spot needs to be disentangled so as to reveal the cell compositions at that spatial spot. In this study, we propose a novel method, named deconvoluting spatial transcriptomics data through graph-based convolutional networks (DSTG), to accurately deconvolute the observed gene expressions at each spot and recover its cell constitutions, thus achieving high-level segmentation and revealing spatial architecture of cellular heterogeneity within tissues. DSTG not only demonstrates superior performance on synthetic spatial data generated from different protocols, but also effectively identifies spatial compositions of cells in mouse cortex layer, hippocampus slice and pancreatic tumor tissues. In conclusion, DSTG accurately uncovers the cell states and subpopulations based on spatial localization. DSTG is available as a ready-to-use open source software (https://github.com/Su-informatics-lab/DSTG) for precise interrogation of spatial organizations and functions in tissues.
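A spot in spatial transcriptomics data is usually larger than a single cell, so the expression measured at a spot mixes in other cells. To resolve the cell composition of each spot, the authors propose DSTG, which is based on graph convolutional networks.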
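For readers unfamiliar with graph convolutional networks, a single layer of the commonly used propagation rule, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), can be sketched as below. This is the generic textbook rule only; whether DSTG uses exactly this normalization, how its spot graph is built, and how its outputs are turned into cell-type proportions are not taken from the notes above, and the function name `gcn_layer` is hypothetical.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer with symmetric normalization and ReLU.

    A : (n, n) adjacency matrix of the spot graph (assumed symmetric, 0/1)
    H : (n, f_in) node features, e.g. expression per spot
    W : (f_in, f_out) weight matrix
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    # D^{-1/2} (A + I) D^{-1/2}, written with broadcasting
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

# Usage on a two-node graph with identity features and weights.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
out = gcn_layer(A, np.eye(2), np.eye(2))
```

Each layer averages a node's features with those of its neighbors, so information from similar spots is shared before the final prediction.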
Abstract
Single-cell RNA sequencing (scRNA-seq) enables the systematic identification of cell populations in a tissue, but characterizing their spatial organization remains challenging. We combine a microarray-based spatial transcriptomics method that reveals spatial patterns of gene expression using an array of spots, each capturing the transcriptomes of multiple adjacent cells, with scRNA-Seq generated from the same sample. To annotate the precise cellular composition of distinct tissue regions, we introduce a method for multimodal intersection analysis. Applying multimodal intersection analysis to primary pancreatic tumors, we find that subpopulations of ductal cells, macrophages, dendritic cells and cancer cells have spatially restricted enrichments, as well as distinct coenrichments with other cell types. Furthermore, we identify colocalization of inflammatory fibroblasts and cancer cells expressing a stress-response gene module. Our approach for mapping the architecture of scRNA-seq-defined subpopulations can be applied to reveal the interactions inherent to complex tissues.