2021-04-14

https://doi.org/10.1093/bioinformatics/btab091

Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear.

Results: Here, we present an Asymmetric Winsorization per-Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications.

Data preprocessing (normalization, standardization) => clustering

Preprocessing: applying a log transformation during normalization reduces the influence of extremely large values, but it also amplifies the statistical variability of lowly expressed genes.
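A small worked example of this trade-off. It assumes the common log(1 + x) transform; the notes do not state the pseudo-count.

```python
import math


def log1p_gap(a: float, b: float) -> float:
    """Absolute difference between two counts after the log(1 + x) transform."""
    return abs(math.log1p(b) - math.log1p(a))


# A one-count change in a lowly expressed gene ...
low_gap = log1p_gap(0, 1)
# ... versus the same one-count change in a highly expressed gene ...
high_gap = log1p_gap(1000, 1001)
# ... and a 100-fold gap between two large counts (raw difference 99,000).
outlier_gap = log1p_gap(1000, 100_000)

print(f"0 -> 1:       {low_gap:.4f}")
print(f"1000 -> 1001: {high_gap:.6f}")
print(f"1000 -> 1e5:  {outlier_gap:.4f}")
```

A shift of one count near zero moves the log value roughly 700 times more than the same shift at 1000 counts. That is why sampling noise in low counts is magnified, while the huge raw gap between large counts shrinks to under 5 units.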

Clustering: the key step is feature (gene) selection, and this step is prone to subjective choices.

Preprocessing: a two-step statistical transformation is applied: 1) per-sample normalization; 2) winsorization of the normalized data.

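For each sample, the right-hand side of the gene expression distribution is known to be approximately log-normal. The part of a single sample's expression distribution to the right of the mode is therefore assumed to follow X_i = e^{μ_i + σ_i Z}, with Z ~ N(0, 1). For each sample, μ_i is estimated from the mode of the log expression and the variance σ_i^2 by maximum likelihood. This gives the estimated z-counts z = (log X_i − μ_i)/σ_i.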
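A minimal sketch of the per-sample z-count step, under stated assumptions:

- The mode of the log expression is located with a histogram (the notes do not say how).
- Zero counts are simply skipped.
- σ is the maximum-likelihood estimate for a half-normal fitted to the values right of the mode, which is what the log-normal model X_i = e^{μ_i + σ_i Z} implies on that side.

The function names and the synthetic sample are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def log_mode(log_values, n_bins=50):
    """Mode of the log expression, taken as the centre of the fullest histogram bin.

    Histogram-based mode finding is an assumption of this sketch.
    """
    lo, hi = min(log_values), max(log_values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant sample
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in log_values)
    best_bin = max(counts, key=counts.get)
    return lo + (best_bin + 0.5) * width


def z_counts(expression, n_bins=50):
    """Return (mu, sigma, z) for one sample, with z = (log x - mu) / sigma.

    Zero counts have no logarithm and are skipped in this sketch.
    """
    logs = [math.log(x) for x in expression if x > 0]
    mu = log_mode(logs, n_bins)
    # Right of the mode the model is log X = mu + sigma * Z with Z ~ N(0, 1),
    # i.e. a half-normal around mu; the maximum-likelihood estimate of sigma^2
    # is the mean squared deviation of those log values from mu.
    right = [v - mu for v in logs if v > mu]
    sigma = math.sqrt(sum(d * d for d in right) / len(right))
    return mu, sigma, [(v - mu) / sigma for v in logs]


# Hypothetical sample: log-normal with mu = 2, sigma = 0.5, built from normal quantiles.
n = 2001
sample = [math.exp(2.0 + 0.5 * NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]
mu_hat, sigma_hat, z = z_counts(sample)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

The estimates land close to the generating values. The remaining offset comes from the histogram bin width used to locate the mode.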

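Smoothing: T(z; σ_0, λ) = 2\frac{Φ(z/σ(z; σ_0, λ)) − c}{c}, z ∈ ℝ, where σ(z; σ_0, λ) = σ_0 · {1 + 2λ(Φ(z) − 0.5) · I(z > 0)}. Here I(·) is the indicator function and Φ(z) is the cumulative distribution function of the z-counts (the standard normal CDF).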
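The smoothing formula can be evaluated directly. This sketch assumes Φ is the standard normal CDF. Because the notes never define c, it is passed in as an argument. The values σ_0 = 0.1, λ = 10 and c = 0.5 are illustrative only, not the paper's defaults.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def sigma_z(z, sigma0, lam):
    """sigma(z; sigma0, lambda) = sigma0 * {1 + 2*lambda*(Phi(z) - 0.5) * I(z > 0)}."""
    indicator = 1.0 if z > 0 else 0.0
    return sigma0 * (1.0 + 2.0 * lam * (PHI(z) - 0.5) * indicator)


def smooth_transform(z, sigma0, lam, c):
    """T(z; sigma0, lambda) = 2 * (Phi(z / sigma(z; sigma0, lambda)) - c) / c.

    c is an explicit argument because the notes do not define it.
    """
    return 2.0 * (PHI(z / sigma_z(z, sigma0, lam)) - c) / c


# Illustrative (hypothetical) parameter values.
SIGMA0, LAM, C = 0.1, 10.0, 0.5
for zc in (-2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    print(f"z = {zc:+.1f}  ->  T = {smooth_transform(zc, SIGMA0, LAM, C):+.4f}")
```

With these values, negative z-counts are squeezed almost immediately to the lower bound −2. Positive ones rise more gradually because σ(z) grows with z, and this asymmetry is what makes the winsorization asymmetric.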

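Clustering: variable genes are selected by Shannon entropy, h_j = −\frac{1}{\log K} ∑^K_{k=1} p_{jk} \log p_{jk}. Because T converges (its values are bounded), its range can be split into K intervals. Here p_{jk} is the probability that gene j's expression across all samples falls in the k-th interval (∑_k p_{jk} = 1). This gives a heterogeneity index h_j for gene j: the more heterogeneous the gene's expression, the larger h_j.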
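A sketch of the entropy filter. It assumes the bounded range of T is cut into K equal-width intervals and that p_{jk} is estimated as the fraction of samples falling in interval k. The range [−2, 2], K = 4 and the toy matrix are illustrative.

```python
import math


def entropy_heterogeneity(values, lower, upper, k):
    """Normalised Shannon entropy h_j = -(1 / log K) * sum_k p_jk * log(p_jk) for one gene.

    `values` are the gene's transformed expression levels across all samples and
    [lower, upper] is the bounded range of the transformation, split into k
    equal-width intervals.
    """
    width = (upper - lower) / k
    counts = [0] * k
    for v in values:
        # Clamp so boundary values land in the first or last interval.
        idx = max(0, min(int((v - lower) / width), k - 1))
        counts[idx] += 1
    n = len(values)
    # p_jk = fraction of samples in interval k; empty intervals contribute 0 (0 * log 0 := 0).
    h = sum(-(cnt / n) * math.log(cnt / n) for cnt in counts if cnt > 0)
    return h / math.log(k)


def select_variable_genes(matrix, lower, upper, k, top):
    """Indices of the `top` genes (rows of `matrix`) with the largest h_j."""
    scores = [entropy_heterogeneity(row, lower, upper, k) for row in matrix]
    return sorted(range(len(matrix)), key=lambda j: scores[j], reverse=True)[:top]


# Hypothetical transformed matrix (genes x samples) on the range [-2, 2].
genes = [
    [0.1, 0.1, 0.1, 0.1],    # same value in every sample -> h = 0
    [-1.5, -0.5, 0.5, 1.5],  # one sample per interval    -> h = 1
    [-1.5, -1.5, 1.5, 1.5],  # two groups                 -> h = 0.5
]
print([round(entropy_heterogeneity(g, -2.0, 2.0, 4), 3) for g in genes])
print(select_variable_genes(genes, -2.0, 2.0, 4, top=2))
```

Dividing by log K keeps h_j between 0 (all samples in one interval) and 1 (samples spread evenly), so scores are comparable whatever K is chosen.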

https://doi.org/10.1093/bioinformatics/btaa769

Motivation: Recent technological advances enable the profiling of spatial single-cell expression data. Such data present a unique opportunity to study cell–cell interactions and the signaling genes that mediate them. However, most current methods for the analysis of these data focus on unsupervised descriptive modeling, making it hard to identify key signaling genes and quantitatively assess their impact.

Results: We developed a Mixture of Experts for Spatial Signaling genes Identification (MESSI) method to identify active signaling genes within and between cells. The mixture of experts strategy enables MESSI to subdivide cells into subtypes. MESSI relies on multi-task learning using information from neighboring cells to improve the prediction of response genes within a cell. Applying the methods to three spatial single-cell expression datasets, we show that MESSI accurately predicts the levels of response genes, improving upon prior methods, and that it provides useful biological insights about key signaling genes and subtypes of excitatory neuron cells.

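Currently, inferring cell-cell interactions from scRNA-seq data focuses mainly on ligand and receptor genes. Without spatial information, however, it is hard to be confident that the interacting cells are actually adjacent in space. Meanwhile, most analysis methods for spatial transcriptomics focus on clustering and spatial structure, and few use spatial transcriptomic data to study cell-cell interactions.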


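This paper builds a framework, MESSI, that assigns cells to subtypes based on their signaling-gene expression and spatial information. It uses a Mixture of Experts (MoE) model whose inputs are the expression of a cell's receptor and ligand genes plus the ligand expression of its neighboring cells, and it predicts the expression of response genes.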
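A minimal sketch of the idea in the note above, not MESSI's implementation. The sketch makes these assumptions:

- Neighbours are the cells within a fixed radius (a hypothetical rule).
- Each cell's feature vector is its own receptor and ligand expression plus its neighbours' mean ligand expression.
- One forward pass of a mixture of experts uses a softmax gate and linear experts.

The weights are made up, and MESSI's training, multi-task setup and subtype hierarchy are not shown.

```python
import math


def neighbor_ligand_mean(coords, ligands, radius):
    """Average ligand expression of each cell's neighbours within `radius` (hypothetical rule)."""
    n_cells, n_lig = len(coords), len(ligands[0])
    means = []
    for i in range(n_cells):
        nbrs = [j for j in range(n_cells)
                if j != i and math.dist(coords[i], coords[j]) <= radius]
        if nbrs:
            means.append([sum(ligands[j][t] for j in nbrs) / len(nbrs) for t in range(n_lig)])
        else:
            means.append([0.0] * n_lig)  # isolated cell: no neighbour signal
    return means


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(x, gate_weights, expert_weights):
    """Mixture-of-experts prediction for one response gene.

    The gate turns the feature vector into soft subtype weights (softmax of linear
    scores); each expert is a linear regressor; the prediction is their weighted sum.
    """
    def dot(w):
        return sum(wi * xi for wi, xi in zip(w, x))

    gates = softmax([dot(g) for g in gate_weights])
    experts = [dot(w) for w in expert_weights]
    return sum(g * y for g, y in zip(gates, experts)), gates


# Hypothetical toy data: 3 cells, 1 receptor gene, 1 ligand gene.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
receptors = [[0.5], [1.0], [0.2]]
ligands = [[1.0], [3.0], [5.0]]
nbr = neighbor_ligand_mean(coords, ligands, radius=2.0)
# Feature vector per cell: own receptors + own ligands + neighbours' mean ligands.
features = [rec + lig + nb for rec, lig, nb in zip(receptors, ligands, nbr)]
# Two experts with fixed, made-up weights (training is not shown).
gate_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
expert_w = [[2.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
for f in features:
    pred, gates = moe_predict(f, gate_w, expert_w)
    print(f, [round(g, 3) for g in gates], round(pred, 3))
```

The soft gate is what lets a mixture of experts split cells into subtypes: each cell's gate weights say how strongly it belongs to each expert.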

Motivation: Single-cell gene expression distributions measured by single-cell RNA-sequencing (scRNA-seq) often display complex differences between samples. These differences are biologically meaningful but cannot be identified using standard methods for differential expression.

Results: Here, we derive and implement a flexible and fast differential distribution testing procedure based on the 2-Wasserstein distance. Our method is able to detect any type of difference in distribution between conditions. To interpret distributional differences, we decompose the 2-Wasserstein distance into terms that capture the relative contribution of changes in mean, variance and shape to the overall difference. Finally, we derive mathematical generalisations that allow our method to be used in a broad range of disciplines other than scRNA-seq or bioinformatics.

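Single-cell gene expression distributions differ between samples in complex ways, but these differences cannot currently be identified. To address this, a differential distribution testing method based on the 2-Wasserstein distance was developed.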
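The abstract's mean/variance/shape decomposition can be illustrated for two equal-size samples, where the optimal coupling simply pairs sorted values. The sketch uses the standard decomposition W2² = (μ_x − μ_y)² + (σ_x − σ_y)² + 2σ_xσ_y(1 − ρ), with ρ the correlation between the two quantile functions; that this matches the paper's exact formulation is an assumption. The equal-size restriction and the data are also assumptions made for brevity.

```python
import math


def wasserstein2_decomposition(x, y):
    """Squared 2-Wasserstein distance between two equal-size samples and its split

        W2^2 = (mean_x - mean_y)^2 + (sd_x - sd_y)^2 + 2 * sd_x * sd_y * (1 - rho),

    where rho is the correlation between the sorted samples (empirical quantile
    functions). Returns (w2_squared, location, size, shape).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    a, b = sorted(x), sorted(y)
    n = len(a)
    # For equal-size samples the optimal coupling pairs the i-th smallest values.
    w2_sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / n
    ma, mb = sum(a) / n, sum(b) / n
    sa = math.sqrt(sum((v - ma) ** 2 for v in a) / n)  # population standard deviations
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / n)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    location = (ma - mb) ** 2
    size = (sa - sb) ** 2
    shape = 2.0 * (sa * sb - cov)  # equals 2 * sa * sb * (1 - rho), without dividing by sa * sb
    return w2_sq, location, size, shape


# Hypothetical expression values for one gene in two conditions.
ctrl = [0.0, 1.0, 1.0, 2.0, 10.0]
treat = [3.0, 4.0, 5.0, 6.0, 7.0]
w2_sq, loc, size, shape = wasserstein2_decomposition(ctrl, treat)
print(f"W2^2 = {w2_sq:.3f}; location {loc:.3f}, size {size:.3f}, shape {shape:.3f}")
print(f"relative contributions: {loc / w2_sq:.2f}, {size / w2_sq:.2f}, {shape / w2_sq:.2f}")
```

With population (1/n) moments the three terms add up exactly to W2². A pure shift gives only a location term, a pure rescaling adds a size term, and only a change in the distribution's form produces a shape term.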

半?yún)?shù)檢驗(yàn)的方法,柏拉圖分布計(jì)算p值


https://doi.org/10.1038/s41467-021-21808-x

Abstract

Transposable elements (TEs, transposons) make up a majority of a typical eukaryote’s genome, and contribute to cell heterogeneity in unclear ways. Single-cell sequencing technologies are powerful tools to explore cells; however, analysis is typically gene-centric and TE expression has not been addressed. Here, we develop a single-cell TE processing pipeline, scTE, and report the expression of TEs in single cells in a range of biological contexts. Specific TE types are expressed in subpopulations of embryonic stem cells and are dynamically regulated during pluripotency reprogramming, differentiation, and embryogenesis. Unexpectedly, TEs are expressed in somatic cells, including human disease-specific TEs that are undetectable in bulk analyses. Finally, we apply scTE to single-cell ATAC-seq data, and demonstrate that scTE can discriminate cell type using chromatin accessibility of TEs alone. Overall, our results classify the dynamic patterns of TEs in single cells and their contributions to cell heterogeneity.


https://doi.org/10.1093/bib/bbaa414

Abstract

Recent development of spatial transcriptomics (ST) is capable of associating spatial information at different spots in the tissue section with RNA abundance of cells within each spot, which is particularly important to understand tissue cytoarchitectures and functions. However, for such ST data, since a spot is usually larger than an individual cell, gene expressions measured at each spot are from a mixture of cells with heterogeneous cell types. Therefore, ST data at each spot needs to be disentangled so as to reveal the cell compositions at that spatial spot. In this study, we propose a novel method, named deconvoluting spatial transcriptomics data through graph-based convolutional networks (DSTG), to accurately deconvolute the observed gene expressions at each spot and recover its cell constitutions, thus achieving high-level segmentation and revealing spatial architecture of cellular heterogeneity within tissues. DSTG not only demonstrates superior performance on synthetic spatial data generated from different protocols, but also effectively identifies spatial compositions of cells in mouse cortex layer, hippocampus slice and pancreatic tumor tissues. In conclusion, DSTG accurately uncovers the cell states and subpopulations based on spatial localization. DSTG is available as a ready-to-use open source software (https://github.com/Su-informatics-lab/DSTG) for precise interrogation of spatial organizations and functions in tissues.

In spatial transcriptomics data, a single spot is often larger than a single cell, so the sequenced signal also includes other cells. To resolve the cell composition of each spot, DSTG uses a graph convolutional network.
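DSTG's own graph construction, training procedure and architecture details are not in these notes. As an assumption, the sketch below shows only a standard two-layer GCN forward pass using the Kipf and Welling propagation rule D^{-1/2}(A + I)D^{-1/2}. A softmax output gives each spot a row of non-negative cell-type proportions summing to 1. The graph, features and weights are made up.

```python
import math


def normalized_adjacency(adj):
    """Propagation matrix D^{-1/2} (A + I) D^{-1/2} of a GCN layer (Kipf & Welling)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_proportions(adj, features, w1, w2):
    """Two-layer GCN, softmax(P relu(P X W1) W2): one row of cell-type proportions per spot."""
    p = normalized_adjacency(adj)
    hidden = relu(matmul(matmul(p, features), w1))
    return softmax_rows(matmul(matmul(p, hidden), w2))


# Hypothetical toy graph: 3 spots in a chain, 2 genes, 2 cell types, made-up weights.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, -1.0], [0.5, 0.5]]
for spot, props in enumerate(gcn_proportions(adj, x, w1, w2)):
    print(spot, [round(v, 3) for v in props])
```

Each propagation step mixes a spot's features with those of its graph neighbours, which is how graph structure informs the per-spot composition estimate.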



https://doi.org/10.1038/s41587-019-0392-8

Abstract

Single-cell RNA sequencing (scRNA-seq) enables the systematic identification of cell populations in a tissue, but characterizing their spatial organization remains challenging. We combine a microarray-based spatial transcriptomics method that reveals spatial patterns of gene expression using an array of spots, each capturing the transcriptomes of multiple adjacent cells, with scRNA-Seq generated from the same sample. To annotate the precise cellular composition of distinct tissue regions, we introduce a method for multimodal intersection analysis. Applying multimodal intersection analysis to primary pancreatic tumors, we find that subpopulations of ductal cells, macrophages, dendritic cells and cancer cells have spatially restricted enrichments, as well as distinct coenrichments with other cell types. Furthermore, we identify colocalization of inflammatory fibroblasts and cancer cells expressing a stress-response gene module. Our approach for mapping the architecture of scRNA-seq-defined subpopulations can be applied to reveal the interactions inherent to complex tissues.
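The abstract does not say how multimodal intersection analysis scores the overlap between a scRNA-seq subpopulation and a tissue region. The sketch below assumes a hypergeometric test on the overlap between cell-type-specific genes and region-specific genes, and shows only the enrichment (upper-tail) p-value. The function name and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_region, n_celltype, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_genes:    size of the background gene set
    n_region:   genes specific to one tissue region (from the spatial data)
    n_celltype: genes specific to one cell type (from scRNA-seq)
    n_overlap:  genes in both sets
    """
    total = comb(n_genes, n_celltype)
    hits = sum(comb(n_region, k) * comb(n_genes - n_region, n_celltype - k)
               for k in range(n_overlap, min(n_region, n_celltype) + 1))
    return hits / total


# Hypothetical numbers: 10,000 background genes, 200 region genes, 100 cell-type
# genes, 12 shared (about 2 would be expected by chance).
print(hypergeom_enrichment_p(10_000, 200, 100, 12))
```

Exact integer binomials from `math.comb` avoid rounding in the sum; only the final division produces a float.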

