Hello everyone! Today I'm bringing you another new method for joint 10X single-cell and spatial analysis: CellDART. I've already shared quite a few posts on joint 10X single-cell/spatial analysis, listed here for anyone who wants to study them:
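A method for joint analysis of 10X single-cell and spatial data ---- cell2location
A round-up of methods for jointly analyzing 10X spatial transcriptomics and 10X single-cell data
Joint 10X single-cell/spatial analysis, part 3 ---- Spotlight
Joint 10X single-cell/spatial analysis, part 4 ---- DSTG
Joint 10X single-cell/spatial analysis, part 5 ---- spatialDWLS
Joint 10X single-cell/spatial analysis, part 6 (joint analysis based on the number of cells in each spot) ---- Tangram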
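There are plenty of methods, but you need your own judgment to decide which one actually suits your data. Today's method comes from the paper "CellDART: Cell type inference by domain adaptation of single-cell and spatial transcriptomic data". Let's see what is special about it and what kind of data it is suited for. We'll go through the paper first and look at the example code at the end.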
Abstract
Deciphering (i.e., working out) the cellular composition in genome-wide spatially resolved transcriptomic data is a critical task to clarify the spatial context of cells in a tissue. (In plain terms: determining which cells make up each location in genome-wide spatial transcriptomic data is key to understanding where cells sit within a tissue, and this really does matter.) Here the authors develop a new method, CellDART, which estimates the spatial distribution of cells defined by single-cell level data using domain adaptation of neural networks (what exactly that means, we'll have to read on to find out), and apply it to the spatial mapping of human lung tissue. The neural network that predicts the cell proportion in a pseudospot, a virtual mixture of cells from single-cell data, is translated to decompose the cell types in each spatial barcoded region (this is the usual deconvolution strategy). The software was also used to analyze two other datasets, mouse brain and human dorsolateral prefrontal cortex tissue; of course, the results look good, the usual routine. CellDART is expected to help elucidate the spatial heterogeneity of cells and their close interactions in various tissues.
Main (Introduction): here's a summary
Breakthrough technologies have enabled capturing genome-wide spatial gene expression at a resolution of several cells (10X spatial transcriptomics works at roughly this resolution) to the single-cell level (single-cell-resolution spatial transcriptomics is still a major challenge) and even the subcellular level (BGI seems to have achieved this).
Furthermore, emerging computational approaches have facilitated the spatiotemporal tracking of specific cells and elucidated cell-to-cell interactions by preserving the spatial context (this kind of analysis is quite difficult).
- The key limitation of spatial transcriptomics today is that a single spot contains multiple cells. In particular, a tissue with a high level of heterogeneity, such as cancer, consists of a variety of cells in each small domain of the tissue (this limitation has a big impact). Thus, the identification of different cell types in each spot is a crucial task to understand the spatial context of pathophysiology using a spatially resolved transcriptome.
Current methods for jointly analyzing 10X spatial and 10X single-cell data fall into two camps. One finds anchors and maps cells onto spots, e.g., Seurat and scanpy; the other is deconvolution, e.g., SPOTlight and cell2location, and deconvolution methods are the majority. In the deconvolution framing, calculating the proportion of cell types defined by scRNA-seq data from spots of spatially resolved transcriptomic data can be considered a domain adaptation task (the term sounds a bit clunky, but the idea is still deconvolution). A model that predicts cell fractions from the gene expression profile of a group of cells can be transferred to predict the spatial cell-type distribution (which is exactly why joint single-cell/spatial analysis matters).
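To make the deconvolution idea concrete: if each spot's expression is modeled as a weighted sum of cell-type reference profiles, the weights are the cell fractions. The sketch below recovers them with plain non-negative least squares (scipy's `nnls`) on toy data. This is only the simplest deconvolution baseline, not what CellDART, SPOTlight, or cell2location actually do, and `deconvolve_nnls` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Toy reference: mean expression of 50 genes in 4 hypothetical cell types
n_genes, n_types = 50, 4
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))

# A toy "spot": a known mixture of the cell types plus a little noise
true_frac = np.array([0.5, 0.3, 0.2, 0.0])
spot = signatures @ true_frac + rng.normal(0, 0.01, n_genes)

def deconvolve_nnls(signatures, spot):
    """Estimate cell-type fractions of one spot with non-negative least squares."""
    weights, _ = nnls(signatures, spot)
    total = weights.sum()
    # Normalize so the estimated fractions sum to 1
    return weights / total if total > 0 else weights

est_frac = deconvolve_nnls(signatures, spot)
print(np.round(est_frac, 2))
```

Real spots are much noisier and the reference profiles come from scRNA-seq clusters, which is where the more elaborate methods earn their keep.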
In this paper, we suggest a method, CellDART, that implements adversarial discriminative domain adaptation (ADDA) (hard to summarize in a phrase; we need to read on to understand what it means) to infer the cell fraction in spatial transcriptomic data. Cells randomly selected from the scRNA-seq data are combined into a pseudospot whose cell proportions are known. The neural network model that extracts the cell composition from a pseudospot's gene expression is then adapted to a different domain, the one where the spatial transcriptomic data live. (How familiar are you with neural networks? This part involves some machine learning.) Consequently, the joint analysis of spatial and single-cell transcriptomic data elucidates the spatial cell composition and unveils the spatial heterogeneity of the cells. The authors then put the method to work on real data.
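The pseudospot idea can be sketched in a few lines of NumPy. This is a hypothetical stand-in (`make_pseudospots`), not the `random_mix` helper from the CellDART repository: it assumes the "random weights" are uniform draws normalized to sum to 1 and that a pseudospot is the weighted sum of its cells' expression, so the real implementation may differ. The default `k=8` follows the paper's setting; in practice you would first restrict the matrix to marker genes.

```python
import numpy as np

def make_pseudospots(expr, labels, n_types, k=8, n_spots=1000, seed=0):
    """Mix k randomly chosen cells with random weights into each pseudospot.

    expr:   (n_cells, n_genes) single-cell expression matrix
    labels: (n_cells,) integer cell-type labels in [0, n_types)
    Returns pseudospot expression (n_spots, n_genes) and the known
    cell-type fractions (n_spots, n_types) used as training targets.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    spots = np.zeros((n_spots, n_genes))
    fractions = np.zeros((n_spots, n_types))
    for i in range(n_spots):
        idx = rng.choice(n_cells, size=k, replace=False)
        w = rng.random(k)
        w /= w.sum()  # assumed: uniform random weights that sum to 1
        spots[i] = w @ expr[idx]
        # Add each sampled cell's weight onto its cell type (handles repeats)
        np.add.at(fractions[i], labels[idx], w)
    return spots, fractions

# Toy single-cell data: 200 cells, 30 genes, 5 cell types
rng = np.random.default_rng(1)
expr = rng.poisson(2.0, size=(200, 30)).astype(float)
labels = rng.integers(0, 5, size=200)
spots, fractions = make_pseudospots(expr, labels, n_types=5, k=8, n_spots=100)
print(spots.shape, fractions.shape)  # (100, 30) (100, 5)
```

Because every pseudospot's fractions are known by construction, they can serve as supervised training labels, which real spots never provide.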
Results: first, let's see how well the software performs
1. Decomposition of spatial cell distribution with CellDART in human and mouse brain data
Two example datasets, human dorsolateral prefrontal cortex and mouse brain; first, a look at the annotation data
Next come the marker genes (to me they don't look very specific)
The cell clusters showed distinct gene expression patterns represented by cell type-specific marker genes.
Step 1: build pseudospots. A specific number of cells (k = 8) were randomly sampled from the single-cell data with random weights to generate pseudospots (number of pseudospots = 20000), i.e., 8 cells per pseudospot.
Step 2: composite gene expression values were computed based on marker genes.
Step 3: A neural network was trained to accurately decompose the pseudospots, and another network, the domain classifier, was trained to discriminate spots of real spatially resolved transcriptomes from pseudospots (so two networks are trained).
During the training process, the weights of neural networks were updated to predict cell fractions and to fool the domain classifier so that it could not discriminate spots from pseudospots (this part is a bit hard to grasp; take a moment with it). As a result, the neural network, the source classifier, was trained to estimate cell fractions in both the pseudospots and the real spatial spots as an adversarial domain adaptation process (only at this point does the name of the method really make sense).
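To sum up so far: first, pseudospots are built from the single-cell data (8 single cells each here), so the cell composition of every pseudospot is known. The pseudospots and the real spots are then fed to the neural networks: the source classifier learns to predict cell composition, while the domain classifier tries to tell pseudospots from real spots. Through this adversarial optimization, the model ends up estimating cell proportions in the real spatial data.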
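For readers who want to see what "fooling the domain classifier" means numerically, here is a NumPy sketch of the two objectives. It is a generic inverted-label adversarial setup, not code taken from CellDART: using KL divergence as the cell-fraction loss, binary cross-entropy for the domain classifier, and `alpha` as the weight on the adversarial term (in the spirit of the `alpha` parameter in the example code) are all assumptions, and `adversarial_losses` is a hypothetical helper.

```python
import numpy as np

EPS = 1e-8

def kl_divergence(true_frac, pred_frac):
    """Mean KL(true || pred) over spots; assumed here as the source (fraction) loss."""
    t = np.clip(true_frac, EPS, 1.0)
    p = np.clip(pred_frac, EPS, 1.0)
    return float(np.mean(np.sum(t * np.log(t / p), axis=1)))

def binary_cross_entropy(labels, prob):
    """Mean BCE; prob is the domain classifier's P(spot is real)."""
    p = np.clip(prob, EPS, 1 - EPS)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def adversarial_losses(true_frac, pred_frac, p_real_pseudo, p_real_spatial, alpha=1.0):
    """Return (domain-classifier loss, feature-extractor loss).

    Domain labels: pseudospots = 0, real spatial spots = 1.
    The domain classifier minimizes BCE with the true labels. The feature
    extractor and source classifier minimize the fraction loss plus
    alpha * BCE with the labels flipped, i.e. they are rewarded when the
    domain classifier is fooled.
    """
    probs = np.concatenate([p_real_pseudo, p_real_spatial])
    domain = np.concatenate([np.zeros(len(p_real_pseudo)), np.ones(len(p_real_spatial))])
    d_loss = binary_cross_entropy(domain, probs)
    fool_loss = binary_cross_entropy(1 - domain, probs)
    g_loss = kl_divergence(true_frac, pred_frac) + alpha * fool_loss
    return d_loss, g_loss

rng = np.random.default_rng(0)
true_frac = rng.dirichlet(np.ones(4), size=6)
# Perfect fraction predictions and a completely confused domain classifier (0.5 everywhere)
d_loss, g_loss = adversarial_losses(true_frac, true_frac, np.full(6, 0.5), np.full(6, 0.5))
print(round(d_loss, 4), round(g_loss, 4))  # → 0.6931 0.6931
```

In training the two losses are minimized in alternation, so the domain classifier keeps getting better at spotting pseudospots while the shared feature extractor keeps learning features that hide the difference.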
Let's look at the deconvolution results
This result closely matches the one on the official website
Next, the human data
On the example data, it looks quite reasonable.
Result 2: Comparison of CellDART with other integration tools in human brain tissue
- The other three tools are Scanorama, Cell2location, and RCTD.
First, Scanorama
Scanorama showed cortical layer-specific distribution patterns for a few excitatory neurons, whereas Ex_2_L5, Ex_4_L6, Ex_9_L5_6, and Ex_10_L2_4 excitatory neurons were distributed differently from the known cortical distribution (so Scanorama doesn't seem to work well here).
Next, Cell2location
In the case of Cell2location, neither excitatory neurons nor non-neuronal cells showed layer-specific localization patterns except for a few cell types (Ex_4_L6, Oligos_1, and Micro_Macro) (Cell2location doesn't do well either).
Third, RCTD
Finally, for RCTD, a few excitatory neurons (Ex_2_L5 and Ex_10_L2_4) exhibited a high cell fraction in the corresponding cortical layer of a known layer specificity; however, other excitatory neurons presented heterogeneous patterns of distribution (this looks even worse).
The authors are a bit sly here: none of the three tools they compared against is one of the commonly used ones.
Receiver operating characteristic (ROC) curve analysis was implemented to compare the performance of the four different tools in predicting the layer-specific distribution of excitatory neurons.
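Computing one such AUC is straightforward. The sketch below assumes that, for a given excitatory neuron type, each spot gets a binary label (1 if it lies in the cortical layer where that type is expected, 0 otherwise) and the predicted cell fraction is used as the score; the paper's exact labeling may differ. `roc_auc` is a hypothetical helper using the rank-sum (Mann-Whitney U) formulation, which gives the same value as scikit-learn's `roc_auc_score`.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (tied scores get average ranks).

    scores: predicted cell fraction of one cell type in each spot
    labels: 1 if the spot lies in the layer where that cell type is expected, else 0
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Average the ranks of tied scores
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos, n_neg = labels.sum(), (~labels).sum()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy example: 4 spots, the two in-layer spots are scored 0.35 and 0.8
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An AUC of 1 means every in-layer spot received a higher predicted fraction than every out-of-layer spot; 0.5 is no better than chance.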
Of course, you hardly need to read the paper to guess the outcome: the authors' own tool performs best.
Result 3: Discovery of spatial heterogeneity of human lung tissue with CellDART
CellDART was further applied to normal lung spatial transcriptomic data (an application to normal tissue samples).
The two normal lung tissues were dissected far from the tumor and pathologically confirmed to have no tumor cells.
Let's look at the joint analysis results
In both the lung 1 and lung 2 datasets, each cell type showed different distribution patterns across the segmented tissue domains.
In summary, CellDART could precisely localize the spatial distribution of heterogeneous cell types in normal lung tissue.
The paper's conclusion
In conclusion, CellDART is capable of estimating the spatial cell compositions in complex tissues with high levels of heterogeneity by aligning the domain of single-cell and spatial transcriptomics data. The suggested method may help elucidate the spatial interaction of various cells in close proximity and track the cell-level transcriptomic changes while preserving the spatial context (in short: it works well).
Method: a look at the algorithm
CellDART: Cell type inference with domain adaptation
- First, a feature embedder that computes 64-dimensional embedding features from the gene expression data of either spatial spots or pseudospots was defined. The feature embedder consisted of two fully connected layers, each of which underwent batch normalization and activation by the ELU function (batch normalization here is the standard neural-network layer that normalizes activations within each training batch; it is not batch-effect correction). The outputs of the first layer and second layer have 1024 and 64 dimensions, respectively (so the gene expression input is first expanded to 1024 dimensions and then compressed to the 64-dimensional embedding).
- Source and domain classifiers were defined such that they could predict the cell fraction in each spot and discriminate pseudospots from spots, respectively (these are the two classifiers). The domain classifier consisted of two fully connected layers. The first layer, with 32-dimensional output, was connected to the embedded features. The source classifier is directly connected to the embedded features of the feature extractor as a one-layer model. Therefore, the feature extractor attached to either of the classifiers was named a source or domain classification model. The source and domain classification models shared the feature extractor (conceptually this part is fairly simple).
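To see how the shapes fit together, here is a forward pass of that architecture in plain NumPy with random, untrained weights. It is only a shape-level sketch under assumptions not stated above: batch normalization is applied in training mode without learned scale/shift, the source classifier ends in a softmax so the predicted fractions sum to 1, the domain classifier's 32-unit layer uses ELU and ends in a single sigmoid unit giving P(spot is real), and the 500 genes and 10 cell types are arbitrary.

```python
import numpy as np

def elu(x, alpha=1.0):
    # np.minimum avoids overflow in exp for large positive inputs
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

def batch_norm(x, eps=1e-3):
    """Training-mode batch normalization without learned scale/shift."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
n_genes, n_types, batch = 500, 10, 32

def dense(n_in, n_out):
    # Random He-style weights for illustration only; biases start at zero
    return rng.normal(0, np.sqrt(2 / n_in), (n_in, n_out)), np.zeros(n_out)

W1, b1 = dense(n_genes, 1024)   # embedder layer 1: genes -> 1024
W2, b2 = dense(1024, 64)        # embedder layer 2: 1024 -> 64
Ws, bs = dense(64, n_types)     # source classifier: one layer on the embedding
Wd1, bd1 = dense(64, 32)        # domain classifier layer 1: 64 -> 32
Wd2, bd2 = dense(32, 1)         # domain classifier layer 2: 32 -> 1 (assumed)

def feature_embedder(x):
    h = elu(batch_norm(x @ W1 + b1))
    return elu(batch_norm(h @ W2 + b2))

def source_classifier(emb):
    return softmax(emb @ Ws + bs)

def domain_classifier(emb):
    h = elu(emb @ Wd1 + bd1)
    return sigmoid(h @ Wd2 + bd2).ravel()

x = rng.random((batch, n_genes))
emb = feature_embedder(x)          # shared by both classifiers
fractions = source_classifier(emb)
p_real = domain_classifier(emb)
print(emb.shape, fractions.shape, p_real.shape)  # (32, 64) (32, 10) (32,)
```

Because both heads read the same `emb`, any change the adversarial loss makes to the embedder weights also changes what the source classifier sees, which is the whole point of sharing the feature extractor.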
Finally, let's look at the example code
CellDART Example Code: mouse brain
Load modules
import scanpy as sc
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
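# da_cellfraction and utils are modules from the CellDART repository;
# run this from the repository folder (or add it to sys.path) so they can be imported
import da_cellfraction
from utils import random_mix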
from sklearn.manifold import TSNE
1. Data load
load scanpy data - 10x datasets
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
adata_spatial_anterior = sc.datasets.visium_sge(
sample_id="V1_Mouse_Brain_Sagittal_Anterior"
)
adata_spatial_posterior = sc.datasets.visium_sge(
sample_id="V1_Mouse_Brain_Sagittal_Posterior"
)
#Normalize
for adata in [
    adata_spatial_anterior,
    adata_spatial_posterior,
]:
    sc.pp.normalize_total(adata, inplace=True)
Single cell Data: GSE115746
- Download from GEO and use two files "GSE115746_cells_exon_counts.csv" and "GSE115746_complete_metadata_28706-cells.csv"
adata_cortex = sc.read_csv('../data/GSE115746_cells_exon_counts.csv').T
adata_cortex_meta = pd.read_csv('../data/GSE115746_complete_metadata_28706-cells.csv', index_col=0)
adata_cortex_meta_ = adata_cortex_meta.loc[adata_cortex.obs.index,]
adata_cortex.obs = adata_cortex_meta_
adata_cortex.var_names_make_unique()
#Preprocessing
adata_cortex.var['mt'] = adata_cortex.var_names.str.startswith('mt-') # annotate mitochondrial genes as 'mt'; mouse symbols use the 'mt-' prefix (e.g. mt-Co1)
sc.pp.calculate_qc_metrics(adata_cortex, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
sc.pp.normalize_total(adata_cortex)
#PCA and clustering : Known markers with 'cell_subclass'
sc.tl.pca(adata_cortex, svd_solver='arpack')
sc.pp.neighbors(adata_cortex, n_neighbors=10, n_pcs=40)
sc.tl.umap(adata_cortex)
sc.tl.leiden(adata_cortex, resolution = 0.5)
sc.pl.umap(adata_cortex, color=['leiden','cell_subclass'])
sc.tl.rank_genes_groups(adata_cortex, 'cell_subclass', method='wilcoxon')
sc.pl.rank_genes_groups(adata_cortex, n_genes=20, sharey=False)
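Select same gene features
# Collect the top-20 marker genes of every cell subclass (union, deduplicated)
df_genelists = pd.DataFrame.from_records(adata_cortex.uns['rank_genes_groups']['names'])
num_markers = 20
res_genes_ = sorted(set(df_genelists.head(num_markers).values.ravel()))
adata_spatial_anterior.var_names_make_unique()
# Keep only the marker genes that are also measured in the spatial data
inter_genes = [val for val in res_genes_ if val in adata_spatial_anterior.var.index]
print('Selected Feature Gene number', len(inter_genes))
adata_cortex = adata_cortex[:, inter_genes].copy()
adata_spatial_anterior = adata_spatial_anterior[:, inter_genes].copy()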
Array of single cell & spatial data
- Single cell data with labels
- Spatial data without labels
mat_sc = adata_cortex.X
mat_sp = adata_spatial_anterior.X.toarray()  # sparse matrix -> dense ndarray
df_sc = adata_cortex.obs
lab_sc_sub = df_sc.cell_subclass
# Map each cell subclass to an integer; sorting keeps the mapping stable across runs
sc_sub_dict = dict(enumerate(sorted(set(lab_sc_sub), key=str)))
sc_sub_dict2 = dict((y,x) for x,y in sc_sub_dict.items())
lab_sc_num = [sc_sub_dict2[ii] for ii in lab_sc_sub]
lab_sc_num = np.asarray(lab_sc_num, dtype='int')
2. Generate mixture from single cell data and preprocessing
# Pseudospots: nmix cells per spot, n_samples spots (the paper itself uses k = 8 and 20000)
sc_mix, lab_mix = random_mix(mat_sc, lab_sc_num, nmix=5, n_samples=5000)
def log_minmaxscale(arr):
    # log1p-transform, then min-max scale each row (spot or cell) to [0, 1]
    arrd = len(arr)
    arr = np.log1p(arr)
    return (arr-np.reshape(np.min(arr,axis=1), (arrd,1)))/np.reshape((np.max(arr, axis=1)-np.min(arr,axis=1)),(arrd,1))
sc_mix_s = log_minmaxscale(sc_mix)
mat_sp_s = log_minmaxscale(mat_sp)
mat_sc_s = log_minmaxscale(mat_sc)
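One caveat with `log_minmaxscale`: if every value in a row is identical (for example a spot with no counts for any selected gene), the denominator is zero and the row becomes NaN. A guarded variant might look like the sketch below; `log_minmaxscale_safe` is a hypothetical name, and mapping constant rows to 0 is an assumption, not something the CellDART example does. It also accepts an `np.matrix` (such as the output of `.todense()`), since it converts its input with `np.asarray`.

```python
import numpy as np

def log_minmaxscale_safe(arr):
    """Row-wise log1p + min-max scaling to [0, 1].

    Rows whose values are all equal (e.g. an all-zero spot) are mapped to 0
    instead of producing NaN.
    """
    arr = np.log1p(np.asarray(arr, dtype=float))
    row_min = arr.min(axis=1, keepdims=True)
    row_range = arr.max(axis=1, keepdims=True) - row_min
    # Replace zero ranges by 1 so constant rows become (x - min) / 1 = 0
    safe_range = np.where(row_range > 0, row_range, 1.0)
    return (arr - row_min) / safe_range

toy = np.array([[0.0, 1.0, 3.0],
                [2.0, 2.0, 2.0]])
print(log_minmaxscale_safe(toy))
```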
3. Training: Adversarial domain adaptation for cell fraction estimation
Parameters
- alpha: loss weights for adversarial learning for pooling domain classifier
- alpha_lr: learning rate for training domain classifier (alpha_lr *0.001)
- emb_dim: embedding dimension (feature dimension)
- batch_size : batch size for the training
- n_iterations: iteration number of adversarial training
- initial_train: if True, the classifier model is first trained on its own before adversarial domain adaptation
- initial_train_epochs: number of epochs for initial training
embs, clssmodel = da_cellfraction.train(sc_mix_s, lab_mix, mat_sp_s,
                                        alpha=1, alpha_lr=5, emb_dim=64, batch_size=512,
                                        n_iterations=2000,
                                        initial_train=True,
                                        initial_train_epochs=10)
4. Predict cell fraction of spots and visualization
pred_sp = clssmodel.predict(mat_sp_s)
def plot_cellfraction(visnum):
    # Store the predicted fraction of cell type `visnum` and plot it on the tissue image
    adata_spatial_anterior.obs['Pred_label'] = pred_sp[:,visnum]
    sc.pl.spatial(
        adata_spatial_anterior,
        img_key="hires",
        color='Pred_label',
        palette='Set1',
        size=1.5,
        legend_loc=None,
        title = sc_sub_dict[visnum])
# Indices into sc_sub_dict: pick the cell types you want to display
numlist = [2,3,7,8,12,13,18]
for num in numlist:
    plot_cellfraction(num)
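Besides plotting one cell type at a time, it can help to turn the prediction matrix into a labelled table. The helper below, `fractions_table`, is hypothetical: it assumes predictions arrive as an `(n_spots, n_types)` array and a dict mapping column index to cell-type name, like `sc_sub_dict` above, and it adds the dominant cell type of each spot. The cell-type names in the toy example are placeholders.

```python
import numpy as np
import pandas as pd

def fractions_table(pred, idx_to_type, spot_ids):
    """Wrap a (n_spots, n_types) prediction array in a labelled DataFrame
    and add the dominant (highest-fraction) cell type of each spot."""
    cols = [idx_to_type[i] for i in range(pred.shape[1])]
    df = pd.DataFrame(pred, index=spot_ids, columns=cols)
    df["top_type"] = df[cols].idxmax(axis=1)
    return df

# Toy predictions for 3 spots and 3 placeholder cell types
toy_pred = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.3, 0.6],
                     [0.2, 0.5, 0.3]])
toy_types = {0: "TypeA", 1: "TypeB", 2: "TypeC"}
table = fractions_table(toy_pred, toy_types, ["spot1", "spot2", "spot3"])
print(table)
```

With the tutorial's objects this would be something like `fractions_table(pred_sp, sc_sub_dict, adata_spatial_anterior.obs_names)`, after which the `top_type` column could be copied into `obs` and plotted as a categorical map.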
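Methodologically it follows the same line of thinking as deconvolution, but it brings in a new idea (adversarial domain adaptation), so it's well worth a try.
Life is good, and better with you.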