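Hello everyone. Today I'd like to share a new analysis topic: a problem that comes up in joint single-cell and spatial transcriptomics analysis. Most current tools for this joint analysis are harder to apply in settings where there is no clear way to stratify cells into discrete types or subtypes. This is especially important when cells that belong to the same overall type (e.g., T helper cells) may carry different functions and span a continuum of states. As one way to address this basic difficulty of single-cell data analysis, current algorithms let you choose the resolution at which the data are analyzed (i.e., the number of clusters per broad cell type). There is an inherent trade-off, however: deeper clustering of the scRNA-seq data gives finer transcriptomic resolution, but it makes the deconvolution problem harder and the results potentially less accurate. So today we explore two questions. Does a finer partition of the single-cell data always give a better joint single-cell/spatial analysis? And can the continuous variation in state within one cell type be characterized in space?
The paper we are reading today is "Multi-resolution deconvolution of spatial transcriptomics data reveals continuous patterns of inflammation". Its main question is how to balance the granularity of the single-cell classification against the accuracy of the single-cell/spatial integration. What we would really like is to characterize the different states of a cell type accurately in space. The authors also provide an algorithm we can borrow from, which we will work through step by step.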
Abstract
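We all know that joint single-cell and spatial analysis is a very useful approach that can answer many biological questions. But even within a single cell type there is a continuum of cell states that cannot be cleanly partitioned, yet reflects important differences in cell function and in how cells interact with their surroundings (in other words, differences among subpopulations and their microenvironment). Analyzing cell states at multiple resolutions is therefore especially important.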
Introduction
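Current methods for joint single-cell/spatial analysis include NMFReg, RCTD, SPOTLight, Stereoscope, DSTG, and cell2location (I have covered all of these in earlier posts, so feel free to look back at them; each has its own strengths and weaknesses). These methods work in two steps: first, infer cell-type transcriptional signatures from the scRNA-seq data; then, use a linear model to estimate the proportion of each cell type in each spot. This approach gives good results in some settings, particularly for brain tissue sections, where the diversity of cellular composition is well captured by a discrete view of cell types. The drawback is just as clear: when cell types are split too finely (to separate out cell states), the computational cost rises and accuracy drops (deeper clustering of the scRNA-seq data provides more granular transcriptomic resolution but makes the deconvolution problem more difficult, and the results potentially less accurate).
Here the authors take a new angle. They use a conditional deep generative model to learn cell-type-specific profiles together with continuous sub-cell-type variation (that is, both the specificity and the continuity of each cell type are modeled), and they recover the cell-type frequencies as well as a cell-type-specific snapshot of the average transcriptional state at each spot. The authors also apply the method to concrete case studies. Let's go through the algorithm and the results.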
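The two-step recipe described above can be sketched in a few lines. This is only a minimal illustration, not the implementation of any of the listed tools: the helper names `celltype_signatures` and `deconvolve_spot`, the toy data, and the choice of non-negative least squares (SciPy's `nnls`) as the linear model are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import nnls


def celltype_signatures(expr, labels):
    """Step 1: mean expression profile per annotated cell type (types x genes)."""
    labels = np.asarray(labels)
    types = sorted(set(labels.tolist()))
    sig = np.vstack([expr[labels == t].mean(axis=0) for t in types])
    return types, sig


def deconvolve_spot(spot, sig):
    """Step 2: non-negative least squares, then rescale the weights to proportions."""
    weights, _ = nnls(sig.T, spot)  # design matrix is genes x types
    total = weights.sum()
    return weights / total if total > 0 else weights


# Toy reference: two hypothetical cell types with distinct profiles over 4 genes.
rng = np.random.default_rng(0)
expr = np.vstack([
    rng.poisson([20, 1, 1, 5], size=(50, 4)),
    rng.poisson([1, 20, 5, 1], size=(50, 4)),
]).astype(float)
labels = ["B"] * 50 + ["T"] * 50

types, sig = celltype_signatures(expr, labels)
spot = 0.7 * sig[0] + 0.3 * sig[1]  # a noise-free spot mixing the two signatures
props = deconvolve_spot(spot, sig)
print(dict(zip(types, props.round(3))))
```

Each cell type is reduced to a single signature row here, which is exactly why variation within a cell type is invisible to this kind of model unless the type is split into more clusters.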
Result
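How the algorithm works
DestVI (the authors' software) uses two different latent variable models (LVMs) to delineate cell-type proportions as well as cell-type-specific continuous sub-states. The input to DestVI is a pair of transcriptomic datasets: the query spatial transcriptomics data and a reference scRNA-seq dataset from the same tissue.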
- Note: (A) A spatial transcriptomics analysis workflow relies on two data modalities, producing unpaired transcriptomic measurements, in the form of count matrices. The spatial transcriptomics (ST) data measures the gene expression x_s in a given spot s, and its location λ_s. However, each spot may contain multiple cells. The single-cell RNA-sequencing data measures the gene expression x_n in a cell n, but the spatial information is lost because of tissue dissociation. After annotation, we may associate each cell with a cell type c_n. These matrices are the input to DestVI, composed of two latent variable models: the single-cell latent variable model (scLVM) and the spatial transcriptomics latent variable model (stLVM). DestVI outputs a joint representation of the single-cell data and the spatial data by estimating the proportion of every cell type in every spot, and projecting the expression of each spot onto cell-type-specific latent spaces. These inferred values may be used for performing downstream analysis such as cell-type-specific differential expression and comparative analyses of conditions. (B) Schematic of the scLVM. RNA counts and cell type information from the single-cell RNA-sequencing data are jointly transformed by an encoder neural network into the parameters of the approximate posterior of γ_n, a low-dimensional representation of cell-type-specific cell state. Next, a decoder neural network maps samples from the approximate posterior of γ_n along with the cell type information c_n to the parameters of a negative binomial distribution for every gene. Note that we use the superscript notation f^g to denote the g-th output of the network, that is the g-th entry ρ_n^g of the vector ρ_n. (C) Schematic of the stLVM. RNA counts from the spatial transcriptomics data are transformed by an encoder neural network into the parameters of the cell-type-specific embeddings γ_s^c. Free parameters β_sc encode the abundance of cell type c in spot s, and may be normalized into cell-type proportions π_sc. Next, the decoder from the scLVM model maps cell-type-specific embeddings γ_s^c to estimates of cell-type-specific gene expression. These parameters are averaged across all cell types, weighted by the abundance parameters β_sc, to approximate the gene expression of the spot with a negative binomial distribution. After training, the decoder may be used to perform cell-type-specific imputation of gene expression across all spots.
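To make panel (C) more concrete, here is a small numerical sketch of the mixing step: abundances are normalized into proportions, each cell-type-specific embedding is decoded into an expression profile, and the profiles are combined with the abundances as weights. Everything here is a toy under stated assumptions: `toy_decoder` is a hypothetical linear-plus-softmax stand-in for the trained neural decoder, the shapes and random parameters are invented, and spot-level scaling factors are left out.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def toy_decoder(gamma, W, b):
    """Hypothetical stand-in for the shared scLVM decoder.

    gamma: (spots, types, latent) cell-type-specific embeddings
    W:     (types, latent, genes), b: (types, genes)
    Returns normalized expression per spot and cell type (sums to 1 over genes).
    """
    logits = np.einsum("scl,clg->scg", gamma, W) + b
    return softmax(logits)


def proportions(beta):
    """Normalize non-negative abundances beta_sc into proportions pi_sc."""
    return beta / beta.sum(axis=1, keepdims=True)


def mix_expression(weights, gamma, W, b):
    """Weighted combination over cell types of the decoded expression profiles."""
    rho = toy_decoder(gamma, W, b)                # (spots, types, genes)
    return np.einsum("sc,scg->sg", weights, rho)  # (spots, genes)


rng = np.random.default_rng(0)
S, C, L, G = 4, 3, 2, 6                           # spots, cell types, latent dims, genes
beta = rng.gamma(2.0, 1.0, size=(S, C))           # toy abundances
gamma = rng.normal(size=(S, C, L))
W = rng.normal(size=(C, L, G))
b = rng.normal(size=(C, G))

pi = proportions(beta)
mix = mix_expression(pi, gamma, W, b)             # would parameterize the NB mean of each spot
print(pi.round(2))
print(mix.sum(axis=1))                            # each row sums to 1 when weighting by proportions
```

The point of the sketch is that the same decoder is evaluated once per cell type with a spot-specific embedding, so two spots with identical proportions can still differ in their cell-type-specific expression.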
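DestVI assumes that every cell in the reference dataset is annotated with a discrete cell-type label. The spot-level outputs (cell-type proportions and cell-type-specific embeddings) may then be used for downstream analysis and formulation of biological hypotheses.
Writing this algorithm up really is a bit of a hassle.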
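Comparison across tools (naturally, in the authors' own paper the authors' tool performs best)
First, single cells are randomly combined to synthesize simulated "spot" data, so the cell-type proportions of every spot are known in advance. Linear cell states are simulated at the same time: To model the continuum of cell states, we construct a linear model for every cell type, with a negative binomial likelihood.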
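As a rough illustration of this kind of simulation (not the paper's actual protocol), the sketch below draws cells from a per-cell-type linear model with a negative binomial likelihood and sums a handful of them into a pseudo-spot whose true composition is known. The parameter values, the log link, the shared dispersion, and the names `sample_cell` and `simulate_spot` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_types, n_genes, n_latent = 3, 50, 2

# Hypothetical per-type linear model: log(mean) = intercept + state @ loadings
intercept = rng.normal(0.0, 1.0, size=(n_types, n_genes))
loadings = rng.normal(0.0, 0.5, size=(n_types, n_latent, n_genes))
dispersion = 5.0  # assumed inverse-dispersion, shared across genes


def sample_cell(cell_type, state):
    """Draw counts for one cell from the negative binomial model of its type."""
    mean = np.exp(intercept[cell_type] + state @ loadings[cell_type])
    # NumPy parameterization: n = r, p = r / (r + mean) gives expectation `mean`
    return rng.negative_binomial(dispersion, dispersion / (dispersion + mean))


def simulate_spot(n_cells=10):
    """Sum a few simulated cells into one pseudo-spot; return counts and true proportions."""
    props = rng.dirichlet(np.ones(n_types))
    types = rng.choice(n_types, size=n_cells, p=props)
    counts = np.zeros(n_genes, dtype=np.int64)
    for t in types:
        counts += sample_cell(t, rng.normal(size=n_latent))  # continuous cell state
    realized = np.bincount(types, minlength=n_types) / n_cells
    return counts, realized


counts, realized = simulate_spot(n_cells=10)
print(counts[:10], realized)
```

Because `realized` records the composition that was actually sampled, any deconvolution method can be scored against it directly.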
The results of the method comparison are as follows.
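Taken together, these results suggest that DestVI offers a good alternative to discrete deconvolution algorithms, especially when there are rich patterns of continuous transcriptional variation within cell types, as is the case in most biological models. Specifically, DestVI shows robust performance in gene-expression imputation while still estimating cell-type proportions adequately. Note that the analysis was restricted to spots in which the cell type in question is sufficiently abundant. As expected, DestVI's ability to predict cell-type-specific gene expression decreases at low frequency; the effect on the accuracy of the cell-type proportion estimates is much smaller. DestVI can therefore provide an internal control for which spots can be taken into account when conducting a cell-type-specific analysis of gene expression or cell state.
Let's look at the example code.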
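The idea of restricting a cell-type-specific analysis to spots where that cell type is abundant enough can be sketched with a simple fixed cutoff. This is a simplification: the paper describes an automatic procedure, whereas the helper `abundant_spots` and the toy proportion matrix below are hypothetical, and the 0.03 cutoff simply mirrors the value used in the tutorial code later in this post.

```python
import numpy as np


def abundant_spots(proportions, ct_index, threshold=0.03):
    """Indices of spots where the given cell type exceeds a fixed proportion cutoff."""
    return np.where(proportions[:, ct_index] > threshold)[0]


# Toy proportions for 3 spots (rows) and 3 cell types (columns).
props = np.array([
    [0.90, 0.10, 0.00],
    [0.50, 0.48, 0.02],
    [0.20, 0.30, 0.50],
])
keep = abundant_spots(props, ct_index=2)
print(keep)  # only these spots would enter the analysis for cell type 2
```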
Multi-resolution deconvolution of spatial transcriptomics
import sys
# If True, install the stable release from PyPI; otherwise install from source.
stable = False
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB and stable:
    !pip install --quiet scvi-tools[tutorials]
elif IN_COLAB and not stable:
    !pip install --quiet --upgrade jsonschema
    !pip install --quiet git+https://github.com/yoseflab/scvi-tools@master#egg=scvi-tools[tutorials]
#!wget --quiet https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/ST-LN-compressed.h5ad?raw=true -O ST-LN-compressed.h5ad
#!wget --quiet https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/scRNA-LN-compressed.h5ad?raw=true -O scRNA-LN-compressed.h5ad
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.lines import Line2D
import umap
import scvi
from scvi.model import CondSCVI, DestVI
import torch
%matplotlib inline
Data preprocessing
sc_adata = sc.read_h5ad("scRNA-LN-compressed.h5ad")
sc.pl.umap(sc_adata, color="broad_cell_types")
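# let us filter some genes
G = 2000
sc.pp.filter_genes(sc_adata, min_counts=10)
# keep the raw counts in a layer before X is normalized
sc_adata.layers["counts"] = sc_adata.X.copy()
sc.pp.highly_variable_genes(
    sc_adata,
    n_top_genes=G,
    subset=True,
    layer="counts",
    flavor="seurat_v3",
)
# normalize and log-transform X; the models below read the "counts" layer instead
# (note: 10e4 == 1e5, i.e. 100,000 counts per cell)
sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata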
Now, let’s load the spatial data and choose a common gene subset
st_adata = sc.read_h5ad("ST-LN-compressed.h5ad")
st_adata.layers["counts"] = st_adata.X.copy()
sc.pp.normalize_total(st_adata, target_sum=10e4)
sc.pp.log1p(st_adata)
st_adata.raw = st_adata
# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
st_adata = st_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
G = len(intersect)
sc.pl.embedding(st_adata, basis="location", color="lymph_node")
Fit the scLVM
CondSCVI.setup_anndata(sc_adata, layer="counts", labels_key="broad_cell_types")
sc_model = CondSCVI(sc_adata, weight_obs=True)
sc_model.train(max_epochs=250)
sc_model.history["elbo_train"].plot()
Deconvolution with stLVM
DestVI.setup_anndata(st_adata, layer="counts")
st_model = DestVI.from_rna_model(st_adata, sc_model)
st_model.train(max_epochs=2500)
st_model.history["elbo_train"].plot()
Output
Cell type proportions
st_adata.obsm["proportions"] = st_model.get_proportions()
st_adata.obsm["proportions"]
ct_list = ["B cells", "CD8 T cells", "Monocytes"]
for ct in ct_list:
    data = st_adata.obsm["proportions"][ct].values
    # clip at the 99th percentile so a few extreme spots do not dominate the color scale
    st_adata.obs[ct] = np.clip(data, 0, np.quantile(data, 0.99))
sc.pl.embedding(st_adata, basis="location", color=ct_list)
As expected, we observe strong compartmentalization of the cell types (B cells / T cells) in the lymph node. We also observe differential localization of the monocytes.
Intra cell type information (the key part)
# more globally, the values of the gamma are all summarized in this dictionary of data frames
for ct, g in st_model.get_gamma().items():
    st_adata.obsm["{}_gamma".format(ct)] = g
st_adata.obsm["B cells_gamma"].head(5)
Because those values may be hard to examine for end-users, we presented several methods for prioritizing the study of different cell types (based on PCA and Hotspot). If you’d like to use those methods, please refer to our DestVI reproducibility repository. If you have suggestions to improve those, and would like to see them in the main codebase, reach out to us.
In this tutorial, we assume that the user has identified key gene modules that vary within one cell type in the single-cell RNA sequencing data (e.g., using Hotspot). We provide here a code snippet for imputing the spatial pattern of the cell type specific gene expression, using the example of the IFN-I inflammation signal.
plt.figure(figsize=(8, 8))
ct_name = "Monocytes"
gene_name = ["Cxcl9", "Cxcl10", "Fcgr1"]
# we must filter spots with low abundance (consult the paper for an automatic procedure)
indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]
# impute genes and combine them
specific_expression = np.sum(st_model.get_scale_for_ct(ct_name, indices=indices)[gene_name], 1)
specific_expression = np.log(1 + 1e4 * specific_expression)
# plot (i) all spots as a faint background, (ii) imputed expression in the selected spots
plt.scatter(st_adata.obsm["location"][:, 0], st_adata.obsm["location"][:, 1], alpha=0.05)
plt.scatter(
    st_adata.obsm["location"][indices][:, 0],
    st_adata.obsm["location"][indices][:, 1],
    c=specific_expression,
    s=10,
    cmap="Reds",
)
plt.colorbar()
plt.title(f"Imputation of {gene_name} in {ct_name}")
plt.show()
The method is worth a look, but Seurat is still the mainstream choice.
Life is good, and it's even better with you.