Today we look at another method for the joint analysis of 10X single-cell and spatial data: DSTG (Deconvoluting Spatial Transcriptomics Data). Before getting into the method itself, let's go over some background.
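Background
Graph convolutional networks (GCN)
Before this concept, you need to understand CNNs (Convolutional Neural Networks). I covered them in an earlier post, "10X Spatial Transcriptomics and Convolutional Neural Networks (CNNs)", so I won't explain them again here.
Next comes the GCN itself. See the article "A Rising Star in Deep Learning | How Powerful Are Graph Convolutional Networks (GCN)". If you don't care about the algorithm, you can skip this part.
With that in place, let's turn to the paper, DSTG: Deconvoluting Spatial Transcriptomics Data through Graph-based Artificial Intelligence. It has been published in a journal with an impact factor of about 11 (quite high, and the authors are Chinese).
The paper is not hard to follow, so we focus only on the key points.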
In this work, we have developed a novel graph-based artificial intelligence model, Deconvoluting Spatial Transcriptomics data through Graph-based convolutional networks (DSTG), for reliable and accurate decomposition of cell mixtures in the spatially resolved transcriptomics data. Based on the well-characterized scRNA-seq dataset (that is, well-annotated single-cell data is required), DSTG is able to learn the precise composition of spatial transcriptomics data using semi-supervised graph convolutional network. (A graph convolutional network deconvolutes the spatial data.)
The performance of DSTG has been validated on synthetic ST data (validation on synthetic data), as well as on different experimental ST datasets with well-defined structures including mouse cortex layer, hippocampus tissue, and pancreatic tumor tissues (validation on real spatial data).
First point: the principle
Our hypothesis is that the captured gene expression on a spot is contributed by a mixture of cells located on that spot. (Note this: each spatial spot is a mixture of several cells.) Our strategy is to use the scRNA-seq-derived synthetic spatial transcriptomics data called "pseudo-ST", to predict cell compositions in real-ST data through semi-supervised learning. (Several cells from the single-cell data are mixed at random to "fake" spatial data, which is then used to predict the real spatial transcriptomics data.)
One issue needs attention here:
If the single-cell data and the spatial data do not fully match, for example if the single-cell data is missing a cell type or contains an extra one, the predictions will be unreliable.
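The pseudo-ST idea described above can be sketched in a few lines. This is a minimal illustration, not the DSTG implementation. The function name `make_pseudo_spots`, its parameters, and the choice to sum the expression of the selected cells are all assumptions made for this example. The 2 to 8 cells per spot comes from the paper's methods. Plain Python lists are used so the sketch has no dependencies; real data would normally live in NumPy or sparse matrices.

```python
import random
from collections import Counter


def make_pseudo_spots(cells, labels, n_spots, min_cells=2, max_cells=8, seed=0):
    """Build pseudo-spots by mixing randomly chosen single cells.

    cells  : list of per-cell expression vectors (lists of equal length)
    labels : cell type label of each cell, same order as `cells`
    Returns (expression, composition), one entry per pseudo-spot.
    """
    rng = random.Random(seed)
    cell_types = sorted(set(labels))
    n_genes = len(cells[0])
    expression, composition = [], []
    for _ in range(n_spots):
        k = rng.randint(min_cells, max_cells)        # bounds are inclusive
        picked = rng.sample(range(len(cells)), k)    # k distinct cells
        # Assumption: "combining" cells means summing their expression.
        spot = [sum(cells[i][g] for i in picked) for g in range(n_genes)]
        counts = Counter(labels[i] for i in picked)
        expression.append(spot)
        # Known composition = fraction of each cell type in the mixture.
        composition.append({ct: counts[ct] / k for ct in cell_types})
    return expression, composition
```

Because every pseudo-spot carries a known composition, it can serve as a labeled example. The sketch also makes the matching problem concrete: a cell type absent from `labels` can never appear in any composition.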
Let's look at the steps:
(1) DSTG constructs the synthetic pseudo-ST data from scRNA-seq data as the learning basis of our method. (Information from a few randomly chosen single cells is combined into pseudo-ST data; this is where the cell type issue mentioned above matters.)
(2) DSTG learns a link graph of spot mapping across the pseudo-ST data and real-ST data using shared nearest neighbors. The link graph captures the intrinsic topological similarity between spots and incorporates the pseudo-ST and real-ST data into the same graph for learning. (Neighbors are found between the two datasets, similar to Seurat's FindAnchors.)
(3) Based on the link graph, semi-supervised GCN is used to learn a latent representation of both local graph structure and gene expression patterns that can explain the various cell compositions at spots. (The GCN searches for the best "composition".)
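To make step (3) more concrete, here is a minimal sketch of a two-layer GCN forward pass in the standard form, softmax(Â · ReLU(Â X W0) · W1), where Â is the symmetrically normalized adjacency matrix with self-loops. Whether DSTG uses exactly this architecture is an assumption. All function names are hypothetical, and the weights would be learned by gradient descent, which is not shown. The semi-supervised part is the masked loss: only pseudo-ST spots, whose compositions are known, contribute to it.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def gcn_forward(a_norm, x, w0, w1):
    """Two graph convolutions; each output row is a predicted composition."""
    hidden = [[max(v, 0.0) for v in row] for row in matmul(matmul(a_norm, x), w0)]
    logits = matmul(matmul(a_norm, hidden), w1)
    return [softmax(row) for row in logits]


def masked_cross_entropy(pred, target, labeled):
    """Average cross-entropy over labeled (pseudo-ST) nodes only."""
    losses = [
        -sum(t * math.log(p + 1e-12) for t, p in zip(target[i], pred[i]))
        for i in labeled
    ]
    return sum(losses) / len(losses)
```

Since `w0` and `w1` are shared by every node, what is learned on pseudo-ST spots transfers to real-ST spots through the same graph.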
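The steps are rigorously designed, although the method needs quite a lot of tuning in practice.
Advantages of the method
(1) Sensitive and efficient, since for each spot, only the features of similar spots (i.e., neighbor nodes) are used.
(2) Acquiring generalizable knowledge about the association between gene expression patterns and cell compositions across spots in both pseudo- and real-ST, since the weight parameters in the convolution kernel are shared by all spots.
The paper does not discuss the method's drawbacks, but we can summarize a few:
(1) The single-cell and spatial data must match.
(2) The "faked" spatial data has to account for heterogeneity within a cell type. Using features extracted for a cell type to represent every cell of that type is somewhat problematic. From this angle, the more finely the cells are subdivided, the better for the joint analysis, but this places much higher demands on the single-cell analysis.
Next come some validation examples. As usual, the results are good; otherwise the paper would not have been published.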
Next, let's look at the software's algorithm:
First, how the single-cell data is processed
Variable gene selection
For the scRNA-seq data, we first identify genes that exhibit the most variability across different cell types using the analysis of variance (ANOVA). The top 2,000 most variable gene features in the scRNA-seq data are selected according to adjusted P values with Bonferroni correction. Using the scRNA-seq data of the top variable genes, we then generate the pseudo-ST data (note that the ST data is "faked" from the top 2,000 variable genes) with synthetic mixtures of cells with known cell compositions. The gene expression at each pseudo-spot of the pseudo-ST data is generated by combining the randomly selected 2 to 8 cells from the scRNA-seq data. (Take care here: a single cell type is itself heterogeneous. Even among T cells, random combinations over the variable genes can differ widely.) For simplicity and illustration, we consistently use the term "spot" to represent the synthetic cell mixture of the pseudo-ST data as well as a spot or a bead of real-ST data.
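The gene selection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code, and the function names `anova_f` and `top_variable_genes` are hypothetical. The sketch ranks genes by the one-way ANOVA F statistic rather than by Bonferroni-adjusted P values. Every gene shares the same degrees of freedom, and Bonferroni multiplies all P values by the same constant, so the two rankings agree except for ties among adjusted P values capped at 1. To get actual P values, `scipy.stats.f_oneway` could be used instead.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one gene across cell-type groups.

    Assumes at least two groups and more cells than groups.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of group means around the grand mean.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of cells around their own group mean.
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def top_variable_genes(cells, labels, n_top=2000):
    """Indices of the n_top genes with the largest F statistic."""
    n_genes = len(cells[0])
    scores = [anova_f([c[g] for c in cells], labels) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:n_top]
```

A gene whose mean differs strongly between cell types gets a large F and is kept. A gene with similar means in every type scores near zero and is dropped, however noisy it is within a type.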
Link graph
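Two points need attention here:
(1) How this link is established. The algorithm is in …
(2) Analysis in the low-dimensional space: Second, in the low dimension space, we identify the mutual nearest neighbors among spots from pseudo-ST and real-ST data.
The algorithm is fairly complex; readers with a strong math background are welcome to explain it.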
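The mutual nearest neighbor idea in point (2) can be sketched like this. It is a brute-force illustration with hypothetical function names, Euclidean distance, and a neighborhood size `k`, all chosen for this example. It assumes the spots have already been projected into a shared low-dimensional space; how DSTG computes that projection is not shown.

```python
import math


def k_nearest(queries, references, k):
    """For each query point, the set of indices of its k nearest references."""
    result = []
    for q in queries:
        order = sorted(range(len(references)), key=lambda j: math.dist(q, references[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(pseudo, real, k=2):
    """Pairs (i, j) where pseudo spot i and real spot j pick each other."""
    pseudo_to_real = k_nearest(pseudo, real, k)
    real_to_pseudo = k_nearest(real, pseudo, k)
    return [
        (i, j)
        for i in range(len(pseudo))
        for j in sorted(pseudo_to_real[i])
        if i in real_to_pseudo[j]   # keep the pair only if the choice is mutual
    ]
```

Each returned pair would become an edge between a pseudo-ST spot and a real-ST spot in the link graph. Requiring the match to hold in both directions filters out one-sided neighbors, such as an outlying real spot whose nearest pseudo-spot is still far away.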
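The code for this method is available at DSTG. I won't walk through it here. What matters is understanding how the software is used and what its parameters mean. Wrapping it into a script is simple, so try it yourself.
Life is good, and even better with you.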