10X Single-Cell & 10X Spatial Transcriptomics Joint Analysis: Continuous Cell States (scvi-tools)

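Hello everyone! Today I want to share a new analysis topic: a problem that comes up in joint single-cell and spatial analysis. Most current tools for integrating single-cell and spatial data are harder to apply when there is no clear way to stratify cells into discrete types or subtypes. This is especially important when cells that belong to the same overall type (e.g., T helper cells) may carry different functions and span a continuum of states. As one way around this fundamental difficulty of single-cell analysis, current algorithms let the user choose the resolution at which the data are analyzed (i.e., the number of clusters per broad cell type). However, there is an inherent trade-off: deeper clustering of the scRNA-seq data gives finer transcriptomic resolution but makes the deconvolution problem harder, and the results may be less accurate. So today we ask: does a finer partition of the single-cell data always give a better joint single-cell/spatial analysis? And can spatial data capture the continuously varying states of a single cell type?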

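The paper we follow today is "Multi-resolution deconvolution of spatial transcriptomics data reveals continuous patterns of inflammation". Its central question is how to balance the resolution of single-cell classification against the accuracy of single-cell/spatial integration. What we really want is to represent the different states of a cell type accurately in space. The authors also provide an algorithm for this, which we will walk through step by step.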


Abstract

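Joint single-cell/spatial analysis is a powerful approach that can answer many biological questions. However, even within a single cell type there is a continuum of cell states that cannot be cleanly partitioned, yet reflects important differences in cell function and in how cells interact with their surroundings (that is, differences between subpopulations and their local environments). This makes a multi-resolution analysis of cell states especially important.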

Introduction

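Current methods for joint single-cell/spatial analysis include NMFReg, RCTD, SPOTLight, Stereoscope, DSTG and cell2location (I have covered all of them in earlier posts; each has its own strengths and weaknesses). They work in two steps: first, infer cell-type transcriptional signatures from the scRNA-seq data; then, use a linear model to estimate the proportion of each cell type in every spot. This approach has given good results in some settings, especially for brain tissue sections, where the diversity of cell composition is well captured by a discrete view of cell types. The drawback is equally clear: when cell types are split too finely (so that cell states are separated out), the computational cost becomes large and the accuracy also drops ("deeper clustering of the scRNA-seq data provides more granular transcriptomic resolution but makes the deconvolution problem more difficult, and the results potentially less accurate.").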
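To make the two-step scheme concrete, here is a minimal sketch under simplifying assumptions: the cell-type signatures (step 1) are taken as given, and a non-negative least-squares fit stands in for the linear model of step 2. This is a generic illustration, not the model of any tool listed above; RCTD, Stereoscope and cell2location, for example, use count likelihoods rather than squared error. The function name `deconvolve_spot` and the toy numbers are hypothetical.

```python
import numpy as np


def deconvolve_spot(signatures, spot, n_iter=5000):
    """Toy linear deconvolution of one spot (hypothetical helper).

    signatures: (genes x cell types) mean profile of each cell type, assumed
                to have been estimated from scRNA-seq already (step 1).
    spot:       (genes,) observed expression of the spot.
    Solves min_w 0.5 * ||signatures @ w - spot||^2 subject to w >= 0 with
    projected gradient descent, then normalizes w into proportions (step 2).
    """
    n_types = signatures.shape[1]
    w = np.full(n_types, 1.0 / n_types)
    # Step size 1/L, where L (largest singular value squared) is the
    # Lipschitz constant of the gradient of the least-squares objective.
    step = 1.0 / np.linalg.norm(signatures, 2) ** 2
    for _ in range(n_iter):
        grad = signatures.T @ (signatures @ w - spot)
        w = np.maximum(w - step * grad, 0.0)  # project back onto w >= 0
    return w / w.sum()


# Two made-up cell types measured on three genes.
signatures = np.array([[10.0, 0.0],
                       [0.0, 8.0],
                       [5.0, 5.0]])
# A spot containing 0.6 "units" of type A and 1.4 of type B.
spot = signatures @ np.array([0.6, 1.4])
print(np.round(deconvolve_spot(signatures, spot), 3))  # -> [0.3 0.7]
```

Because each spot is modeled as a mixture of a handful of fixed signatures, splitting a type into many similar sub-signatures makes the columns of `signatures` nearly collinear, which is one way to see why finer clustering makes this step harder.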

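Here the authors take a new angle: they use a conditional deep generative model to learn both cell-type-specific profiles and continuous within-cell-type variation (i.e., both the specificity and the continuity of each cell type are modeled). The model recovers cell-type frequencies together with a cell-type-specific snapshot of the average transcriptional state in each spot. The authors also apply the method to concrete case studies; let's go through the algorithm and its performance.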

Result

How the algorithm works

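DestVI (the software the authors built) uses two different latent variable models (LVMs) to describe cell-type proportions and cell-type-specific continuous sub-states. The input to DestVI is a pair of transcriptomic datasets: the query spatial transcriptomics data and a reference scRNA-seq dataset from the same tissue.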

[Figure: DestVI workflow (A), scLVM (B) and stLVM (C)]

Figure legend: (A) A spatial transcriptomics analysis workflow relies on two data modalities, producing unpaired transcriptomic measurements in the form of count matrices. The spatial transcriptomics (ST) data measures the gene expression $x_s$ in a given spot $s$, and its location $\lambda_s$. However, each spot may contain multiple cells. The single-cell RNA-sequencing data measures the gene expression $x_n$ in a cell $n$, but the spatial information is lost because of tissue dissociation. After annotation, we may associate each cell with a cell type $c_n$. These matrices are the input to DestVI, composed of two latent variable models: the single-cell latent variable model (scLVM) and the spatial transcriptomics latent variable model (stLVM). DestVI outputs a joint representation of the single-cell data and the spatial data by estimating the proportion of every cell type in every spot, and projecting the expression of each spot onto cell-type-specific latent spaces. These inferred values may be used for downstream analyses such as cell-type-specific differential expression and comparative analyses of conditions.

(B) Schematic of the scLVM. RNA counts and cell type information from the single-cell RNA-sequencing data are jointly transformed by an encoder neural network into the parameters of the approximate posterior of $\gamma_n$, a low-dimensional representation of cell-type-specific cell state. Next, a decoder neural network maps samples from the approximate posterior of $\gamma_n$, along with the cell type information $c_n$, to the parameters of a negative binomial distribution for every gene. Note that we use the superscript notation $f^g$ to denote the $g$-th output of the network, that is, the $g$-th entry $\rho_{ng}$ of the vector $\rho_n$.

(C) Schematic of the stLVM. RNA counts from the spatial transcriptomics data are transformed by an encoder neural network into the parameters of the cell-type-specific embeddings $\gamma_s^c$. Free parameters $\beta_s^c$ encode the abundance of cell type $c$ in spot $s$, and may be normalized into cell-type proportions $\pi_s^c$. Next, the decoder from the scLVM maps the cell-type-specific embeddings $\gamma_s^c$ to estimates of cell-type-specific gene expression. These parameters are averaged across all cell types, weighted by the abundance parameters $\beta_s^c$, to approximate the gene expression of the spot with a negative binomial distribution. After training, the decoder may be used to perform cell-type-specific imputation of gene expression across all spots.
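The generative side of the stLVM in panel (C) can be sketched in a few lines of numpy. This is a toy forward pass under loud assumptions: the "decoder" below is a random linear map followed by a softmax, not the trained scLVM network; the sizes and the inverse dispersion `theta` are arbitrary; and the real DestVI model in scvi-tools may include additional terms that the legend does not describe. It only shows how the abundances β_s^c, the embeddings γ_s^c and the per-cell-type profiles combine into a negative binomial model of one spot.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_types, latent_dim = 5, 3, 2

# Hypothetical stand-in for the trained scLVM decoder: one random linear map per
# cell type, followed by a softmax so that rho is a vector of gene frequencies.
decoder_weights = rng.normal(size=(n_types, latent_dim, n_genes))
decoder_bias = rng.normal(size=(n_types, n_genes))


def decode(gamma, c):
    """Map the embedding gamma (latent_dim,) of cell type c to gene frequencies rho."""
    logits = gamma @ decoder_weights[c] + decoder_bias[c]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


def spot_mean(beta, gammas):
    """Expected counts of a spot: abundance-weighted sum of cell-type-specific profiles."""
    return sum(beta[c] * decode(gammas[c], c) for c in range(n_types))


# Free parameters for a single spot s: abundances beta_s^c and embeddings gamma_s^c.
beta = np.array([200.0, 50.0, 0.0])  # the third cell type is absent from this spot
gammas = rng.normal(size=(n_types, latent_dim))

mu = spot_mean(beta, gammas)
pi = beta / beta.sum()  # cell-type proportions pi_s^c

# Negative binomial with mean mu and inverse dispersion theta, expressed in numpy's
# (n, p) parametrization: n = theta, p = theta / (theta + mu) gives mean mu.
theta = 10.0
counts = rng.negative_binomial(theta, theta / (theta + mu))
print(pi, counts)
```

In the actual model, training runs this in reverse: the encoder proposes γ_s^c, and β_s^c and the encoder are fitted so that the likelihood of the observed spot counts is high, while the decoder is reused from the scLVM.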

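DestVI assumes that every cell in the reference dataset is annotated with a discrete cell-type label. For each spot, it then estimates the proportion of every cell type and a cell-type-specific latent state. This spot-level information may then be used for downstream analysis and formulation of biological hypotheses.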


Writing out this algorithm is admittedly a bit tedious.

Comparison with other tools (of course, in the authors' own paper, the authors' tool comes out on top)

First, "spots" are synthesized by randomly combining single cells, so the cell-type proportions of every spot are known in advance. A continuum of cell states is simulated at the same time: to model the continuum of cell states, we construct a linear model for every cell type, with a negative binomial likelihood.
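A minimal sketch of this kind of simulation, assuming a log-linear model per cell type driven by a single continuous state `t` drawn uniformly in [0, 1]. The gene count, dispersion, number of cells per spot and the uniform state distribution are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_types = 50, 3
theta = 5.0  # inverse dispersion of the negative binomial (assumed)

# Linear model per cell type on the log scale: log mean = base + t * direction,
# where t is a continuous cell state (e.g. a resting-to-activated axis).
base = rng.normal(loc=1.0, scale=0.5, size=(n_types, n_genes))
direction = rng.normal(scale=1.0, size=(n_types, n_genes))


def sample_cell(c):
    """Draw one cell of type c at a random state t; return (counts, t)."""
    t = rng.uniform()
    mu = np.exp(base[c] + t * direction[c])
    counts = rng.negative_binomial(theta, theta / (theta + mu))  # NB with mean mu
    return counts, t


def simulate_spot(n_cells=10):
    """Sum randomly drawn cells; the true proportions are known by construction."""
    types = rng.integers(0, n_types, size=n_cells)
    counts = sum(sample_cell(c)[0] for c in types)
    proportions = np.bincount(types, minlength=n_types) / n_cells
    return counts, proportions


spots = [simulate_spot() for _ in range(100)]
X = np.stack([s[0] for s in spots])  # spots x genes
P = np.stack([s[1] for s in spots])  # spots x cell types (ground truth)
print(X.shape, P.shape)  # (100, 50) (100, 3)
```

Since `P` is known, any deconvolution method run on `X` can be scored directly, and recording `t` per cell would likewise give a ground truth for the within-type state.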


The benchmark results are as follows:

[Figure: benchmark of DestVI against other deconvolution methods]

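Taken together, these results indicate that DestVI is a good alternative to discrete deconvolution algorithms, especially when cell types contain rich patterns of continuous transcriptional variation, as in most biological models. Specifically, DestVI shows robust performance for gene-expression imputation while still estimating cell-type proportions adequately. Note that the analysis is restricted to spots where the cell type in question is sufficiently abundant. As expected, DestVI's ability to predict cell-type-specific gene expression decreases at low frequencies; however, the effect on the accuracy of the cell-type proportion estimates is much smaller. DestVI can therefore provide an internal control for which spots can be taken into account when conducting a cell-type-specific analysis of gene expression or cell state.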
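The point about restricting analysis to abundant spots can be illustrated with purely synthetic numbers. The sketch below assumes, for illustration only, that imputation noise grows as a cell type's estimated proportion shrinks, and compares the correlation with the truth over all spots versus only the spots above a proportion threshold. The noise model, the threshold of 0.2 and the helper `masked_corr` are hypothetical; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_spots = 500

# Synthetic data (assumption): the estimated proportion of one cell type per spot,
# and an imputation error whose spread grows as that proportion shrinks.
abundance = rng.uniform(0.0, 1.0, size=n_spots)
true_expr = rng.normal(size=n_spots)
noise_sd = 0.1 / (abundance + 0.01)
imputed_expr = true_expr + rng.normal(size=n_spots) * noise_sd


def masked_corr(true_vals, est_vals, abundance, threshold):
    """Pearson correlation restricted to spots whose abundance exceeds threshold."""
    keep = abundance > threshold
    return np.corrcoef(true_vals[keep], est_vals[keep])[0, 1], int(keep.sum())


r_all, n_all = masked_corr(true_expr, imputed_expr, abundance, threshold=0.0)
r_kept, n_kept = masked_corr(true_expr, imputed_expr, abundance, threshold=0.2)
print(f"all {n_all} spots: r = {r_all:.2f}; {n_kept} abundant spots: r = {r_kept:.2f}")
```

The estimated proportions thus act as the "internal control": they are used only to decide which spots enter a cell-type-specific analysis, which is what the `> 0.03` filter in the imputation code later in this post does.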

Let's look at the example code.

Multi-resolution deconvolution of spatial transcriptomics

import sys

#if True, will install via pypi, else will install from source
stable = False
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB and stable:
    !pip install --quiet scvi-tools[tutorials]
elif IN_COLAB and not stable:
    !pip install --quiet --upgrade jsonschema
    !pip install --quiet git+https://github.com/yoseflab/scvi-tools@master#egg=scvi-tools[tutorials]
# download the example data (uncomment if the .h5ad files are not already present)
#!wget --quiet https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/ST-LN-compressed.h5ad?raw=true -O ST-LN-compressed.h5ad
#!wget --quiet https://github.com/romain-lopez/DestVI-reproducibility/blob/master/lymph_node/deconvolution/scRNA-LN-compressed.h5ad?raw=true -O scRNA-LN-compressed.h5ad
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.lines import Line2D
import umap

import scvi
from scvi.model import CondSCVI, DestVI

import torch

%matplotlib inline

Data preprocessing

sc_adata = sc.read_h5ad("scRNA-LN-compressed.h5ad")
sc.pl.umap(sc_adata, color="broad_cell_types")

[Figure: UMAP of the scRNA-seq reference colored by broad_cell_types]

# let us filter some genes
G = 2000
sc.pp.filter_genes(sc_adata, min_counts=10)

sc_adata.layers["counts"] = sc_adata.X.copy()

sc.pp.highly_variable_genes(
    sc_adata,
    n_top_genes=G,
    subset=True,
    layer="counts",
    flavor="seurat_v3"
)

sc.pp.normalize_total(sc_adata, target_sum=10e4)
sc.pp.log1p(sc_adata)
sc_adata.raw = sc_adata
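For readers unfamiliar with the two scanpy calls above, here is a minimal numpy sketch of what `sc.pp.normalize_total(..., target_sum=10e4)` followed by `sc.pp.log1p` do to a small dense count matrix with default options (scanpy's `normalize_total` has further options, such as excluding highly expressed genes, that this ignores). Note that `10e4` equals 1e5, not 1e4. The raw counts are copied into `layers["counts"]` before this step because the scvi-tools models below are set up on that layer.

```python
import numpy as np

counts = np.array([[1.0, 3.0, 0.0],
                   [10.0, 0.0, 10.0]])  # toy cells x genes count matrix
target_sum = 10e4  # 10e4 == 100000.0

# Scale each cell (row) so that its counts sum to target_sum, then apply log(1 + x).
normalized = counts / counts.sum(axis=1, keepdims=True) * target_sum
logged = np.log1p(normalized)
print(normalized)  # each row now sums to 100000
```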
Now, let’s load the spatial data and choose a common gene subset

st_adata = sc.read_h5ad("ST-LN-compressed.h5ad")

st_adata.layers["counts"] = st_adata.X.copy()

sc.pp.normalize_total(st_adata, target_sum=10e4)
sc.pp.log1p(st_adata)
st_adata.raw = st_adata

# filter genes to be the same on the spatial data
intersect = np.intersect1d(sc_adata.var_names, st_adata.var_names)
st_adata = st_adata[:, intersect].copy()
sc_adata = sc_adata[:, intersect].copy()
G = len(intersect)

sc.pl.embedding(st_adata, basis="location", color="lymph_node")

[Figure: spatial spots plotted by location, colored by lymph_node]

Fit the scLVM

CondSCVI.setup_anndata(sc_adata, layer="counts", labels_key="broad_cell_types")
sc_model = CondSCVI(sc_adata, weight_obs=True)
sc_model.train(max_epochs=250)
sc_model.history["elbo_train"].plot()

[Figure: scLVM training ELBO curve]

Deconvolution with stLVM

DestVI.setup_anndata(st_adata, layer="counts")
st_model = DestVI.from_rna_model(st_adata, sc_model)
st_model.train(max_epochs=2500)
st_model.history["elbo_train"].plot()

[Figure: stLVM training ELBO curve]

Outputs

Cell type proportions

st_adata.obsm["proportions"] = st_model.get_proportions()
st_adata.obsm["proportions"]

[Table: estimated cell-type proportions per spot]

ct_list = ["B cells", "CD8 T cells", "Monocytes"]
for ct in ct_list:
  data = st_adata.obsm["proportions"][ct].values
  st_adata.obs[ct] = np.clip(data, 0, np.quantile(data, 0.99))
sc.pl.embedding(st_adata, basis="location", color=ct_list)

[Figure: spatial maps of B cells, CD8 T cells and Monocytes proportions]

As expected, we observe strong compartmentalization of cell types (B cells / T cells) in the lymph node. Monocytes also show a distinct spatial localization.

Intra cell type information (the key part)

# more globally, the values of the gamma are all summarized in this dictionary of data frames
for ct, g in st_model.get_gamma().items():
  st_adata.obsm["{}_gamma".format(ct)] = g
st_adata.obsm["B cells_gamma"].head(5)

[Table: first five rows of the B cells gamma values]

Because those values may be hard to examine for end-users, we presented several methods for prioritizing the study of different cell types (based on PCA and Hotspot). If you’d like to use those methods, please refer to our DestVI reproducibility repository. If you have suggestions to improve those, and would like to see them in the main codebase, reach out to us.
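As a rough illustration of the PCA-based idea mentioned above, here is a minimal sketch that computes the first principal-component score of one cell type's gamma matrix, keeping only spots where that cell type is abundant. This is plain PCA via numpy's SVD, not the code from the DestVI reproducibility repository; the helper name `gamma_pc1` is hypothetical, and the threshold of 0.03 simply mirrors the filter used in the imputation code below.

```python
import numpy as np


def gamma_pc1(gamma, proportions, threshold=0.03):
    """First principal-component score of a cell type's gamma matrix (hypothetical helper).

    gamma:       (spots x latent_dim) cell-type-specific embeddings, e.g.
                 st_adata.obsm["Monocytes_gamma"].values
    proportions: (spots,) estimated proportion of that cell type per spot, e.g.
                 st_adata.obsm["proportions"]["Monocytes"].values
    Returns the indices of the retained spots and their PC1 scores.
    """
    keep = np.where(proportions > threshold)[0]
    g = gamma[keep] - gamma[keep].mean(axis=0)  # center before PCA
    # Rows of vt are the principal axes; project the spots onto the first one.
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    return keep, g @ vt[0]


# Usage on synthetic data with one dominant axis of variation.
rng = np.random.default_rng(3)
gamma = rng.normal(size=(200, 5))
gamma[:, 0] *= 10.0
proportions = rng.uniform(0.0, 0.2, size=200)
keep, pc1 = gamma_pc1(gamma, proportions)
print(len(keep), pc1[:3])
```

The PC1 score per spot can then be plotted at `st_adata.obsm["location"][keep]` in the same way as the imputed expression below. Its sign is arbitrary, so only the ordering of spots along the axis is meaningful.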

In this tutorial, we assume that the user has identified key gene modules that vary within one cell type in the single-cell RNA sequencing data (e.g., using Hotspot). We provide here a code snippet for imputing the spatial pattern of the cell-type-specific gene expression, using the example of the IFN-I inflammation signal.

plt.figure(figsize=(8, 8))

ct_name = "Monocytes"
gene_name = ["Cxcl9", "Cxcl10", "Fcgr1"]


# we must filter spots with low abundance (consult the paper for an automatic procedure)
indices = np.where(st_adata.obsm["proportions"][ct_name].values > 0.03)[0]

# impute genes and combine them
specific_expression = np.sum(st_model.get_scale_for_ct(ct_name, indices=indices)[gene_name], 1)
specific_expression = np.log(1 + 1e4 * specific_expression)

# plot (i) all spots as a faint background, (ii) imputed expression in the retained spots
plt.scatter(st_adata.obsm["location"][:, 0], st_adata.obsm["location"][:, 1], alpha=0.05)
plt.scatter(st_adata.obsm["location"][indices][:, 0], st_adata.obsm["location"][indices][:, 1],
            c=specific_expression, s=10, cmap="Reds")
plt.colorbar()
plt.title(f"Imputation of {gene_name} in {ct_name}")
plt.show()

[Figure: imputed expression of Cxcl9, Cxcl10 and Fcgr1 in monocytes]

This method is worth knowing about, but Seurat is still the mainstream choice.

Life is good, and even better with you.

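© Copyright belongs to the author. For reprints or content collaboration, please contact the author.

Reprinting is prohibited; to request permission, contact the author via direct message or comments.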
