10X Single-Cell and 10X Spatial Transcriptomics Joint Analysis, Part 7: CellDART

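Hello everyone. Today I'm bringing you another new method for joint analysis of 10X single-cell and spatial data: CellDART. I have already shared quite a few posts on 10X single-cell and spatial joint analysis; they are listed here for anyone who wants to study them.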

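Methods for 10X single-cell and spatial joint analysis: cell2location

Summary of joint analysis methods for 10X spatial transcriptomics and 10X single-cell data

10X single-cell and spatial joint analysis, Part 3: SPOTlight

10X single-cell and spatial joint analysis, Part 4: DSTG

10X single-cell and spatial joint analysis, Part 5: spatialDWLS

10X single-cell and spatial joint analysis, Part 6 (joint analysis based on the number of cells in each spot): Tangram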

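There are many methods, but you need your own judgment to work out which one suits your data. The paper behind today's method is "CellDART: Cell type inference by domain adaptation of single-cell and spatial transcriptomic data". Let's see what is special about this method and what kind of data analysis it suits. We'll go through the paper first and look at the example code at the end.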

Abstract

Deciphering the cellular composition in genome-wide spatially resolved transcriptomic data is a critical task to clarify the spatial context of cells in a tissue (this really is important). The authors developed a new method, CellDART, which estimates the spatial distribution of cells defined by single-cell level data using domain adaptation of neural networks (what this means becomes clear further down) and applied it to the spatial mapping of human lung tissue. The neural network that predicts the cell proportion in a pseudospot, a virtual mixture of cells from single-cell data, is translated to decompose the cell types in each spatial barcoded region (this is the usual idea behind deconvolution methods). The tool was then applied to two datasets, mouse brain and human dorsolateral prefrontal cortex tissue, and of course it performed well, as such papers usually report. CellDART is expected to help to elucidate the spatial heterogeneity of cells and their close interactions in various tissues.

Main (Introduction): a summary

Breakthrough technologies enabled capturing genome-wide spatial gene expression at a resolution of several cells (10X spatial transcriptomics works at this resolution) to the single-cell (spatial transcriptomics at single-cell resolution is still a major challenge) and even subcellular levels (BGI appears to have achieved subcellular resolution).
Furthermore, emerging computational approaches facilitated the spatiotemporal tracking of specific cells and elucidated cell-to-cell interactions by preserving the spatial context (this kind of analysis is quite difficult).

  • The one limiting factor of spatial transcriptomics right now is that a single spot contains multiple cells. In particular, a tissue with a high level of heterogeneity, such as cancer, consists of a variety of cells in each small domain of the tissue (this limitation really has a big impact). Thus, the identification of different cell types in each spot is a crucial task to understand the spatial context of pathophysiology using a spatially resolved transcriptome.
    Current methods for joint analysis of 10X spatial transcriptomics and 10X single-cell data fall into two camps. One finds anchors for mapping, typified by Seurat and scanpy; the other is deconvolution, typified by SPOTlight and cell2location. Deconvolution methods are the majority. Among them, calculating the proportion of cell types defined by scRNA-seq data from spots of spatially resolved transcriptomic data can be considered a domain adaptation task (the idea is still the same as deconvolution). A model that predicts cell fractions from the gene expression profile of a group of cells can be transferred to predict the spatial cell-type distribution (single-cell and spatial joint analysis really is important).
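To make the deconvolution idea concrete, here is a minimal sketch that treats a spot as a fraction-weighted sum of cell-type expression signatures and recovers the fractions by least squares. The signature values and cell-type labels are made up for illustration, and this linear model is only a generic picture of deconvolution; it is not how CellDART, SPOTlight, or cell2location are implemented.

```python
import numpy as np

# Hypothetical reference signatures: rows are cell types, columns are genes.
signatures = np.array([
    [10.0, 0.0, 1.0, 2.0],   # cell type A
    [0.0, 8.0, 1.0, 0.0],    # cell type B
    [1.0, 1.0, 9.0, 4.0],    # cell type C
])

# Known composition used to simulate one spot.
true_fractions = np.array([0.5, 0.3, 0.2])

# The spot's expression is modeled as a fraction-weighted sum of signatures.
spot = true_fractions @ signatures

# Solve signatures.T @ f = spot for f by ordinary least squares.
estimated, *_ = np.linalg.lstsq(signatures.T, spot, rcond=None)

# Clip negatives and renormalize so the estimate is a valid proportion vector.
estimated = np.clip(estimated, 0, None)
estimated = estimated / estimated.sum()

print(np.round(estimated, 3))
```

In this noiseless toy case the estimate matches the simulated fractions; real spots are noisy, which is one reason the published tools use more elaborate models.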

In this paper, we suggest a method, CellDART, that implements adversarial discriminative domain adaptation (ADDA) (I wasn't sure how to translate this term; the text below helps) to infer the cell fraction in spatial transcriptomic data. Cells randomly selected from the scRNA-seq data make up a pseudospot in which the cell proportions are known. A neural network model that extracts the cell composition from the gene expression of these pseudospots is then adapted to a different domain, the spatial transcriptomic data. (Neural network models require some machine learning background.) Consequently, the joint analysis of spatial and single-cell transcriptomic data elucidates the spatial cell composition and unveils the spatial heterogeneity of the cells. The authors then apply the method in practice.

Results: first, how well the tool performs

1. Decomposition of spatial cell distribution with CellDART in human and mouse brain data

Two example datasets are used: human dorsolateral prefrontal cortex and mouse brain. First, the annotated data:

[Figure]

[Figure]

Then the marker genes (they don't look very specific to me):
[Figure]

The cell clusters showed distinct gene expression patterns represented by cell type-specific marker genes.

Step 1: build pseudospots. A specific number of cells (k = 8) were randomly sampled from the single-cell data with random weights to generate pseudospots (number of pseudospots = 20000), so each pseudospot contains 8 cells.
Step 2: composite gene expression values were computed based on marker genes.
[Figure]
Step 3: A neural network was trained to accurately decompose the pseudospots, and another network, the domain classifier, was trained to discriminate spots of real spatially resolved transcriptomes from pseudospots (so two networks are trained).
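The pseudospot step can be sketched as follows. This is a minimal illustration under my own assumptions (cells drawn without replacement, weights drawn uniformly and normalized to sum to 1); the function name `make_pseudospots` is hypothetical and is not the `random_mix` helper from the CellDART code.

```python
import numpy as np

def make_pseudospots(expr, labels, n_types, k=8, n_spots=1000, seed=0):
    """Mix k random cells per pseudospot.

    expr:   cells x genes expression array
    labels: integer cell-type label per cell (NumPy array)
    Returns (pseudospot expression, known cell-type fractions).
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    spots = np.zeros((n_spots, n_genes))
    fractions = np.zeros((n_spots, n_types))
    for i in range(n_spots):
        idx = rng.choice(n_cells, size=k, replace=False)  # pick k cells
        w = rng.random(k)
        w = w / w.sum()                                   # random weights summing to 1
        spots[i] = w @ expr[idx]                          # weighted expression mixture
        np.add.at(fractions[i], labels[idx], w)           # accumulate weight per cell type
    return spots, fractions

# Toy data: 50 cells, 6 genes, 3 cell types.
demo_rng = np.random.default_rng(1)
expr = demo_rng.poisson(5.0, size=(50, 6)).astype(float)
labels = demo_rng.integers(0, 3, size=50)
spots, fractions = make_pseudospots(expr, labels, n_types=3, k=8, n_spots=100)
```

Because the weights sum to 1, each row of `fractions` is a valid composition vector, which is what gives the network a known training target.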

During the training process, the weights of neural networks were updated to predict cell fractions and fool the domain classifier to avoid discriminating spots and pseudospots (this part is a little hard to grasp; take a moment with it). As a result, the neural network, source classifier, was trained to estimate cell fractions in both the pseudospots and the real spatial spots as an adversarial domain adaptation process (only here does the purpose of the term become clear).

To summarize so far: pseudospots are first built from the single-cell data (8 single cells each), so their cell composition is known. The pseudospots and the real spots are then used together to train the neural network, a classifier for cell composition is trained at the same time, and after optimization the cell proportions of the real spatial data can be estimated.
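The "fooling" idea may be easier to see with numbers. The sketch below computes a domain loss twice: once with the true domain labels (what the domain classifier minimizes) and once with flipped labels (what the feature network minimizes when it tries to fool the classifier). The use of binary cross-entropy for the domain loss and KL divergence for the cell-fraction loss is my assumption for illustration; the post does not state which losses CellDART uses, and all numbers are made up.

```python
import numpy as np

def binary_cross_entropy(labels, probs, eps=1e-8):
    # Mean BCE between 0/1 labels and predicted probabilities.
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def kl_divergence(true_frac, pred_frac, eps=1e-8):
    # Mean KL(true || pred) over rows of proportion vectors.
    t = np.clip(true_frac, eps, 1.0)
    p = np.clip(pred_frac, eps, 1.0)
    return float(np.mean(np.sum(t * np.log(t / p), axis=1)))

# Domain labels: 1 = real spatial spot, 0 = pseudospot.
domain_labels = np.array([1, 1, 0, 0])
# A domain classifier that currently separates the two domains well.
domain_probs = np.array([0.9, 0.8, 0.2, 0.1])

# Loss the domain classifier is trained on (true labels): low here.
loss_discriminate = binary_cross_entropy(domain_labels, domain_probs)
# Loss used to "fool" the classifier (flipped labels): high here, so
# minimizing it pushes the embeddings of both domains to look alike.
loss_fool = binary_cross_entropy(1 - domain_labels, domain_probs)

# Cell-fraction loss on a pseudospot whose composition is known.
true_frac = np.array([[0.5, 0.5, 0.0]])
pred_frac = np.array([[0.4, 0.5, 0.1]])
loss_source = kl_divergence(true_frac, pred_frac)
```

When the classifier can no longer tell the domains apart, the two domain losses converge, and a fraction predictor trained on pseudospots becomes usable on real spots.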

Let's look at the deconvolution results:

[Figure]

This result is very similar to the one shown on the official website.
Below is the human data:
[Figure]

On the example data, it looks decent.

Result 2: Comparison of CellDART with other integration tools in human brain tissue

  • The other three tools are Scanorama, Cell2location, and RCTD.
First, Scanorama:

Scanorama showed a few excitatory neurons of cortical layer-specific distribution patterns, whereas Ex_2_L5, Ex_4_L6, Ex_9_L5_6, and Ex_10_L2_4 excitatory neurons were distributed differently from the known cortical distribution (so Scanorama does not seem to do well here).

[Figure]

Next, Cell2location:

In the case of Cell2location, neither excitatory neurons nor non-neuronal cells showed layer-specific localization patterns except for a few cell types (Ex_4_L6, Oligos_1, and Micro_Macro) (Cell2location does not do well either).

[Figure]

Third, RCTD:

Finally, for RCTD, a few excitatory neurons (Ex_2_L5 and Ex_10_L2_4) exhibited a high cell fraction in the corresponding cortical layer of a known layer specificity; however, other excitatory neurons presented heterogeneous patterns of distribution (even worse).

[Figure]

The authors were crafty here: in my view, none of the three tools they compared against is in common use.

Receiver operating characteristic (ROC) curve analysis was implemented to compare the performance of the four different tools in predicting the layer-specific distribution of excitatory neurons.

[Figure]

Of course, you hardly need to read the paper to guess that the authors' own tool performs best.
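For readers who want to run this kind of ROC comparison on their own results, here is a minimal sketch of an AUC calculation: given a binary label per spot (inside the expected cortical layer or not) and the predicted fraction of a cell type, it computes the rank-based AUC. The labels and scores are invented, and the paper's exact evaluation procedure may differ.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # average the ranks of tied scores
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Hypothetical spots: 1 = spot lies in the layer where the cell type is expected.
in_layer = [1, 1, 1, 0, 0, 0, 0, 0]
pred_fraction = [0.6, 0.4, 0.35, 0.3, 0.1, 0.05, 0.5, 0.0]

auc = roc_auc(in_layer, pred_fraction)
print(round(auc, 3))
```

An AUC near 1 means the predicted fraction ranks in-layer spots above out-of-layer spots; 0.5 is what random scores would give.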

Result 3: Discovery of spatial heterogeneity of human lung tissue with CellDART

CellDART was further applied to normal lung spatial transcriptomic data (an application to normal tissue samples).
The two normal lung tissues were dissected far from the tumor and pathologically confirmed to have no tumor cells.

[Figure]

The joint analysis results:
[Figure]

In both the lung 1 and lung 2 datasets, each cell type showed different distribution patterns across the segmented tissue domains.
[Figure]

[Figure]

In summary, CellDART could precisely localize the spatial distribution of heterogeneous cell types in normal lung tissue.

The paper's conclusion:

In conclusion, CellDART is capable of estimating the spatial cell compositions in complex tissues with high levels of heterogeneity by aligning the domain of single-cell and spatial transcriptomics data. The suggested method may help elucidate the spatial interaction of various cells in close proximity and track the cell-level transcriptomic changes while preserving the spatial context (in short: it's good).

Method: a look at the algorithm

CellDART: Cell type inference with domain adaptation

  • First, a feature embedder that computes 64-dimensional embedding features from the gene expression data of either spatial spots or pseudospots was defined. The feature embedder was comprised of two fully connected layers, each of which underwent batch normalization and activation by the ELU function (batch normalization here is the neural-network operation that normalizes layer activations; it is not batch-effect correction). The outputs of the first layer and second layer have 1024 and 64 dimensions, respectively (so the two layers have different widths).
  • Source and domain classifiers were defined such that they could predict the cell fraction in each spot and discriminate pseudospots from spots, respectively (these are the initial classifiers). The domain classifier consisted of two fully connected layers. The first layer with 32-dimensional output was connected to the embedded features. The source classifier is directly connected to the embedded features of the feature extractor as a one-layer model connected to the feature embedder. Therefore, the feature extractor attached to either of the classifiers was named a source or domain classification model. The source and domain classification model shared the feature extractor (fairly simple to understand).
[Figure]
[Figure]
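The layer sizes described above can be traced with a small forward pass. This is a shape-only sketch with random, untrained weights written in plain NumPy; the output activations (softmax for cell fractions, sigmoid for the domain probability), the activation inside the domain classifier, and the simplified batch normalization without learned scale and shift are my assumptions, not details confirmed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def elu(x):
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch (no learned parameters here).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dense(x, n_out):
    # Fully connected layer with random, untrained weights.
    w = rng.normal(scale=0.05, size=(x.shape[1], n_out))
    return x @ w

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_spots, n_genes, n_celltypes = 16, 200, 5   # toy sizes
x = rng.random((n_spots, n_genes))           # spots or pseudospots

# Shared feature embedder: genes -> 1024 -> 64
h = elu(batch_norm(dense(x, 1024)))
emb = elu(batch_norm(dense(h, 64)))

# Source classifier: a single layer on the embedding -> cell fractions
cell_fractions = softmax(dense(emb, n_celltypes))

# Domain classifier: 64 -> 32 -> 1, probability that a spot is real
p_real = sigmoid(dense(elu(dense(emb, 32)), 1))

print(emb.shape, cell_fractions.shape, p_real.shape)
```

Both heads read the same 64-dimensional embedding, which is what lets the adversarial signal from the domain classifier reshape the features used for cell-fraction prediction.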

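Finally, the example code

CellDART Example Code: mouse brain

Load modules
import scanpy as sc
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# da_cellfraction and utils are local modules from the CellDART code;
# they need to be on the Python path
import da_cellfraction
from utils import random_mix
from sklearn.manifold import TSNE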

1. Data load

Load the 10x Visium datasets through scanpy

sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3

adata_spatial_anterior = sc.datasets.visium_sge(
    sample_id="V1_Mouse_Brain_Sagittal_Anterior"
)
adata_spatial_posterior = sc.datasets.visium_sge(
    sample_id="V1_Mouse_Brain_Sagittal_Posterior"
)

# Normalize total counts per spot
for adata in [
    adata_spatial_anterior,
    adata_spatial_posterior,
]:
    sc.pp.normalize_total(adata, inplace=True)

Single cell Data: GSE115746

  • Download from GEO and use two files "GSE115746_cells_exon_counts.csv" and "GSE115746_complete_metadata_28706-cells.csv"
# Transpose so that rows are cells and columns are genes
adata_cortex = sc.read_csv('../data/GSE115746_cells_exon_counts.csv').T
adata_cortex_meta = pd.read_csv('../data/GSE115746_complete_metadata_28706-cells.csv', index_col=0)
# Align the metadata rows with the cells in the count matrix
adata_cortex_meta_ = adata_cortex_meta.loc[adata_cortex.obs.index, :]
adata_cortex.obs = adata_cortex_meta_
adata_cortex.var_names_make_unique()

# Preprocessing
# Annotate mitochondrial genes as 'mt' (case-insensitive, since mouse symbols are usually written 'mt-')
adata_cortex.var['mt'] = adata_cortex.var_names.str.upper().str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata_cortex, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
sc.pp.normalize_total(adata_cortex)

# PCA and clustering: known markers with 'cell_subclass'
sc.tl.pca(adata_cortex, svd_solver='arpack')
sc.pp.neighbors(adata_cortex, n_neighbors=10, n_pcs=40)
sc.tl.umap(adata_cortex)
sc.tl.leiden(adata_cortex, resolution=0.5)
sc.pl.umap(adata_cortex, color=['leiden', 'cell_subclass'])
[Figure]
# Rank marker genes for each annotated cell subclass
sc.tl.rank_genes_groups(adata_cortex, 'cell_subclass', method='wilcoxon')
sc.pl.rank_genes_groups(adata_cortex, n_genes=20, sharey=False)
[Figure]

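Select same gene features

# The post never defines res_genes_. Assumption: it is the union of the top
# num_markers marker genes per cell_subclass from rank_genes_groups above.
num_markers = 20
df_genelists = pd.DataFrame(adata_cortex.uns['rank_genes_groups']['names'])
res_genes_ = list(set(df_genelists.head(num_markers).values.flatten()))

adata_spatial_anterior.var_names_make_unique()
# Keep only the marker genes that are also present in the spatial data
inter_genes = [val for val in res_genes_ if val in adata_spatial_anterior.var.index]
print('Selected Feature Gene number', len(inter_genes))
adata_cortex = adata_cortex[:, inter_genes]

adata_spatial_anterior = adata_spatial_anterior[:, inter_genes]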
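The gene-matching step boils down to an order-preserving intersection of two gene lists. A minimal sketch with made-up gene lists (the names below are ordinary mouse gene symbols chosen for illustration, not the actual marker set):

```python
# Hypothetical marker genes from the single-cell data (note the duplicate).
marker_genes = ["Snap25", "Gad1", "Slc17a7", "Olig2", "Gad1"]
# Hypothetical genes measured in the spatial data.
spatial_genes = ["Gad1", "Actb", "Snap25", "Olig2"]

# A set makes each membership test O(1) instead of scanning the whole list.
spatial_set = set(spatial_genes)

# dict.fromkeys drops duplicates while keeping first-seen order.
shared = list(dict.fromkeys(g for g in marker_genes if g in spatial_set))

print(shared)  # ['Snap25', 'Gad1', 'Olig2']
```

Using the same ordered list to subset both datasets keeps the gene columns aligned, which the network needs because it sees genes purely by position.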

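Array of single cell & spatial data

  • Single cell data with labels
  • Spatial data without labels
mat_sc = adata_cortex.X
# The Visium matrix is sparse; convert it to a dense NumPy array
mat_sp = adata_spatial_anterior.X.toarray()

df_sc = adata_cortex.obs

# Map each cell_subclass name to an integer label.
# Note: set() ordering is not guaranteed to be the same across Python sessions.
lab_sc_sub = df_sc.cell_subclass
sc_sub_dict = dict(zip(range(len(set(lab_sc_sub))), set(lab_sc_sub)))  # index -> name
sc_sub_dict2 = dict((y, x) for x, y in sc_sub_dict.items())            # name -> index
lab_sc_num = [sc_sub_dict2[ii] for ii in lab_sc_sub]
lab_sc_num = np.asarray(lab_sc_num, dtype='int')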
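Because the mapping above is built from `set()`, the integer assigned to each cell type can change between Python sessions. If a reproducible mapping is wanted, one option (my suggestion, not part of the original notebook) is `np.unique` with `return_inverse=True`, which sorts the class names and returns the integer codes in one call. The labels below are made up, and this sketch assumes there are no missing values among the labels.

```python
import numpy as np

# Hypothetical cell-type labels for five cells.
labels = np.array(["L4", "Astro", "Pvalb", "Astro", "L4"])

# classes: sorted unique names; codes: index of each cell's name in classes.
classes, codes = np.unique(labels, return_inverse=True)

# index -> name lookup, playing the role of sc_sub_dict.
index_to_name = dict(enumerate(classes.tolist()))

print(index_to_name)   # {0: 'Astro', 1: 'L4', 2: 'Pvalb'}
print(codes.tolist())  # [1, 0, 2, 0, 1]
```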

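2. Generate mixture from single cell data and preprocessing

# Build 5000 pseudospots, each mixing nmix=5 randomly chosen cells
# (the paper describes k = 8 cells and 20000 pseudospots; this example uses smaller values)
sc_mix, lab_mix = random_mix(mat_sc, lab_sc_num, nmix=5, n_samples=5000)

def log_minmaxscale(arr):
    # log1p-transform, then rescale each row (spot or cell) to the range [0, 1]
    arr = np.log1p(np.asarray(arr))
    row_min = np.min(arr, axis=1, keepdims=True)
    row_max = np.max(arr, axis=1, keepdims=True)
    return (arr - row_min) / (row_max - row_min)

sc_mix_s = log_minmaxscale(sc_mix)
mat_sp_s = log_minmaxscale(mat_sp)
mat_sc_s = log_minmaxscale(mat_sc)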
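A tiny worked example shows what this scaling does to each row. The helper below restates the same log1p plus per-row min-max logic under a different name (`log_minmax_rows`, hypothetical) so that it runs on its own; the count values are made up.

```python
import numpy as np

def log_minmax_rows(arr):
    # log1p-transform, then rescale every row to [0, 1].
    arr = np.log1p(np.asarray(arr, dtype=float))
    lo = arr.min(axis=1, keepdims=True)
    hi = arr.max(axis=1, keepdims=True)
    return (arr - lo) / (hi - lo)

counts = np.array([[0, 1, 3],
                   [10, 100, 1000]])
scaled = log_minmax_rows(counts)

# Row 0: log1p gives [0, ln 2, ln 4], so the middle value maps to 0.5.
print(np.round(scaled, 3))
```

Each row ends up spanning exactly 0 to 1 regardless of sequencing depth, which puts pseudospots and real spots on a comparable scale. A row whose values are all identical would divide by zero; this sketch does not handle that case.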

3. Training: Adversarial domain adaptation for cell fraction estimation

Parameters

  • alpha: loss weight for adversarial learning, i.e. for fooling the domain classifier
  • alpha_lr: learning rate for training the domain classifier (alpha_lr * 0.001)
  • emb_dim: embedding dimension (feature dimension)
  • batch_size: batch size for the training
  • n_iterations: iteration number of adversarial training
  • initial_train: if true, the classifier model is first trained before adversarial domain adaptation
  • initial_train_epochs: number of epochs for initial training
embs, clssmodel = da_cellfraction.train(sc_mix_s, lab_mix, mat_sp_s,
                                        alpha=1, alpha_lr=5, emb_dim=64, batch_size=512,
                                        n_iterations=2000,
                                        initial_train=True,
                                        initial_train_epochs=10)
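To make the two `alpha` parameters less abstract, here is a small sketch of what they would mean numerically. The learning-rate formula follows the parameter description above (alpha_lr * 0.001); treating `alpha` as the weight in a simple weighted sum of the two losses is my assumption about how a loss weight is typically applied, not something confirmed from the CellDART code. Both function names are hypothetical.

```python
def domain_learning_rate(alpha_lr, base_lr=0.001):
    # Learning rate of the domain classifier, per the parameter description.
    return alpha_lr * base_lr

def combined_loss(source_loss, domain_loss, alpha):
    # Assumed form: cell-fraction loss plus alpha-weighted adversarial loss.
    return source_loss + alpha * domain_loss

# Values from the call above: alpha=1, alpha_lr=5.
lr_domain = domain_learning_rate(alpha_lr=5)

# Made-up loss values, only to show the effect of alpha.
total_equal = combined_loss(source_loss=0.2, domain_loss=0.7, alpha=1)
total_weak = combined_loss(source_loss=0.2, domain_loss=0.7, alpha=0.1)

print(lr_domain, total_equal, total_weak)
```

Under this reading, a larger `alpha` makes the training care more about confusing the domain classifier relative to fitting the pseudospot fractions.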

4. Predict cell fraction of spots and visualization

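# Predicted cell fractions: one row per spot, one column per cell type
pred_sp = clssmodel.predict(mat_sp_s)

def plot_cellfraction(visnum):
    # Store the predicted fraction of cell type `visnum` and draw it on the tissue image
    adata_spatial_anterior.obs['Pred_label'] = pred_sp[:, visnum]
    sc.pl.spatial(
        adata_spatial_anterior,
        img_key="hires",
        color='Pred_label',
        palette='Set1',
        size=1.5,
        legend_loc=None,
        title=sc_sub_dict[visnum])

# Indices of the cell types to plot (keys of sc_sub_dict)
numlist = [2, 3, 7, 8, 12, 13, 18]

for num in numlist:
    plot_cellfraction(num)
[Figure]

[Figure]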
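Beyond plotting one cell type at a time, the prediction matrix can be summarized directly. The sketch below uses a made-up 3-spot, 3-cell-type matrix and a hypothetical index-to-name dictionary (standing in for `pred_sp` and `sc_sub_dict`) to find the dominant cell type per spot and the mean fraction per cell type.

```python
import numpy as np

# Hypothetical predicted fractions: rows are spots, columns are cell types.
pred = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.50, 0.25],
])
# Hypothetical index -> name mapping.
index_to_name = {0: "Astro", 1: "L4", 2: "Pvalb"}

# Cell type with the largest predicted fraction in each spot.
dominant = [index_to_name[int(i)] for i in pred.argmax(axis=1)]

# Average predicted fraction of each cell type across all spots.
mean_fraction = {
    index_to_name[j]: float(pred[:, j].mean()) for j in range(pred.shape[1])
}

print(dominant)  # ['Astro', 'Pvalb', 'L4']
```

A dominant-type label per spot is a coarse summary; the full fraction matrix is the more informative output, since most spots contain a mixture.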

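The approach follows the same logic as deconvolution but brings in a new idea, domain adaptation, and is well worth a try.

Life is good, and even better with you.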
