這一篇接上一篇坝茎,我們來(lái)分享Cellrank的基礎(chǔ)分析代碼惧眠,我們直接開(kāi)始
使用 RNA 速度和轉(zhuǎn)錄組學(xué)相似性來(lái)估計(jì)細(xì)胞-細(xì)胞轉(zhuǎn)換概率详囤。即使沒(méi)有 RNA 速度信息桨武,也可以應(yīng)用 CellRank(但還是推薦大家RNA 速率和相似度方法聯(lián)合使用)肋拔。
加載
import sys
if "google.colab" in sys.modules:
!pip install -q git+https://github.com/theislab/cellrank@dev
!pip install python-igraph
import scvelo as scv
import scanpy as sc
import cellrank as cr
import numpy as np
scv.settings.verbosity = 3
scv.settings.set_figure_params("scvelo")
cr.settings.verbosity = 2
import warnings
warnings.simplefilter("ignore", category=UserWarning)
warnings.simplefilter("ignore", category=FutureWarning)
warnings.simplefilter("ignore", category=DeprecationWarning)
首先,需要獲取數(shù)據(jù)呀酸。 以下命令將下載 adata 對(duì)象并將其保存在 datasets/endocrinogenesis_day15.5.h5ad
下(示例數(shù)據(jù)凉蜂,大家可以下載下來(lái)自己研究)。 我們還將顯示拼接/未拼接讀數(shù)的比例性誉,需要用它來(lái)估計(jì) RNA 速度窿吩。
adata = cr.datasets.pancreas()
scv.pl.proportions(adata)
adata
AnnData object with n_obs × n_vars = 2531 × 27998
obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'palantir_pseudotime'
var: 'highly_variable_genes'
uns: 'clusters_colors', 'clusters_fine_colors', 'day_colors', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'connectivities', 'distances'
前處理數(shù)據(jù)
過(guò)濾掉沒(méi)有足夠剪接/未剪接計(jì)數(shù)的基因,對(duì)數(shù)據(jù)進(jìn)行歸一化和對(duì)數(shù)變換错览,并限制在高度可變的基因上纫雁。 此外,計(jì)算速度估計(jì)的主成分和矩蝗砾。
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)
Run scVelo
我們將使用來(lái)自 scVelo 的動(dòng)力學(xué)模型來(lái)估計(jì)速度先较。
scv.tl.recover_dynamics(adata, n_jobs=8)
一旦有了參數(shù)携冤,就可以使用這些參數(shù)來(lái)計(jì)算速度和速度圖悼粮。 速度圖是一個(gè)加權(quán)圖,它指定了兩個(gè)cell在給定速度向量和相對(duì)位置的情況下轉(zhuǎn)換為另一個(gè)cell的可能性曾棕。
scv.tl.velocity(adata, mode="dynamical")
scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding_stream(
adata, basis="umap", legend_fontsize=12, title="", smooth=0.8, min_mass=4
)
運(yùn)行Cellrank
CellRank 提供了多種將方向性注入單細(xì)胞數(shù)據(jù)的方法扣猫。 在這里,方向信息來(lái)自 RNA 速度翘地,使用這些信息來(lái)計(jì)算胰腺發(fā)育動(dòng)態(tài)過(guò)程的初始和終止?fàn)顟B(tài)以及fate probabilities申尤。
Identify terminal states
cr.tl.terminal_states(adata, cluster_key="clusters", weight_connectivities=0.2)
The most important parameters in the above function are:
estimator
: this determines what’s going to behind the scenes to compute the terminal states. Options arecr.tl.estimators.CFLARE
(“Clustering and Filtering of Left and Right Eigenvectors”) orcr.tl.estimators.GPCCA
(“Generalized Perron Cluster Cluster Analysis, [Reuter et al., 2018] and [Reuter et al., 2019], see also our pyGPCCA implementation). The latter is the default, it computes terminal states by coarse graining the velocity-derived Markov chain into a set of macrostates that represent the slow-time scale dynamics of the process, i.e. it finds the states that you are unlikely to leave again, once you have entered them.cluster_key
: takes a key fromadata.obs
to retrieve pre-computed cluster labels, i.e. ‘clusters’ or ‘louvain’. These labels are then mapped onto the set of terminal states, to associate a name and a color with each state.n_states
: number of expected terminal states. This parameter is optional - if it’s not provided, this number is estimated from the so-called ‘eigengap heuristic’ of the spectrum of the transition matrix.method
: This is only relevant for the estimatorGPCCA
. It determines the way in which we compute and sort the real Schur decomposition. The default,krylov
, is an iterative procedure that works with sparse matrices which allows the method to scale to very large cell numbers. It relies on the libraries SLEPc and PETSc, which you will have to install separately, see our installation instructions. If your dataset is small (<5k cells), and you don’t want to install these at the moment, usemethod='brandts'
[Brandts, 2002]. The results will be the same, the difference is thatbrandts
works with dense matrices and won’t scale to very large cells numbers.weight_connectivities
: weight given to cell-cell similarities to account for noise in velocity vectors.
cr.pl.terminal_states(adata)
Identify initial states
cr.tl.initial_states(adata, cluster_key="clusters")
cr.pl.initial_states(adata, discrete=True)
Compute fate maps
一旦知道終端狀態(tài)癌幕,就可以計(jì)算相關(guān)的命運(yùn)圖——對(duì)于每個(gè)細(xì)胞,尋求細(xì)胞朝著每個(gè)確定的終端狀態(tài)發(fā)展的可能性有多大昧穿。
cr.tl.lineages(adata)
cr.pl.lineages(adata, same_plot=False)
可以將上述內(nèi)容聚合成一個(gè)單一的全局命運(yùn)圖勺远,其中將每個(gè)終端狀態(tài)與顏色相關(guān)聯(lián),并使用該顏色的強(qiáng)度來(lái)顯示每個(gè)單個(gè)細(xì)胞的命運(yùn):
cr.pl.lineages(adata, same_plot=True)
Directed PAGA
我們可以使用具有有向邊的 [Wolf et al., 2019] 的改編版本將個(gè)體命運(yùn)圖進(jìn)一步聚合成集群級(jí)別的命運(yùn)圖时鸵。 我們首先使用 CellRank 識(shí)別的 root_key 和 end_key 計(jì)算 scVelo 的潛伏時(shí)間胶逢,它們分別是初始狀態(tài)或終止?fàn)顟B(tài)的概率。
scv.tl.recover_latent_time(
adata, root_key="initial_states_probs", end_key="terminal_states_probs"
)
Next, we can use the inferred pseudotime along with the initial and terminal states probabilities to compute the directed PAGA.
scv.tl.paga(
adata,
groups="clusters",
root_key="initial_states_probs",
end_key="terminal_states_probs",
use_time_prior="velocity_pseudotime",
)
cr.pl.cluster_fates(
adata,
mode="paga_pie",
cluster_key="clusters",
basis="umap",
legend_kwargs={"loc": "top right out"},
legend_loc="top left out",
node_size_scale=5,
edge_width_scale=1,
max_edge_width=4,
title="directed PAGA",
)
Compute lineage drivers
可以計(jì)算所有譜系或部分譜系的驅(qū)動(dòng)基因饰潜。 還可以通過(guò)指定cluster=...將其限制為某些cluster初坠。 在生成的數(shù)據(jù)框中,還看到了 p 值彭雾、校正后的 p 值(q 值)和相關(guān)統(tǒng)計(jì)量的 95% 置信區(qū)間碟刺。
Afterwards, we can plot the top 5 driver genes (based on the correlation), e.g. for the Alpha lineage
cr.pl.lineage_drivers(adata, lineage="Alpha", n_genes=5)
Gene expression trends
上面演示的功能是 CellRank 的主要功能:計(jì)算初始和終止?fàn)顟B(tài)以及概率命運(yùn)圖。 現(xiàn)在可以使用計(jì)算出的概率來(lái)例如 沿譜系平滑基因表達(dá)趨勢(shì)薯酝。
從細(xì)胞的時(shí)間順序開(kāi)始半沽。 為了得到這個(gè),可以計(jì)算 scVelo 的潛伏時(shí)間吴菠,如前所述抄囚,或者,我們可以只使用 CellRank 的初始狀態(tài)來(lái)計(jì)算 橄务。
# compue DPT, starting from CellRank defined root cell
root_idx = np.where(adata.obs["initial_states"] == "Ngn3 low EP")[0][0]
adata.uns["iroot"] = root_idx
sc.tl.dpt(adata)
scv.pl.scatter(
adata,
color=["clusters", root_idx, "latent_time", "dpt_pseudotime"],
fontsize=16,
cmap="viridis",
perc=[2, 98],
colorbar=True,
rescale_color=[0, 1],
title=["clusters", "root cell", "latent time", "dpt pseudotime"],
)
We can plot dynamics of genes in pseudotime along individual trajectories, defined via the fate maps we computed above.
model = cr.ul.models.GAM(adata)
cr.pl.gene_trends(
adata,
model=model,
data_key="X",
genes=["Pak3", "Neurog3", "Ghrl"],
ncols=3,
time_key="latent_time",
same_plot=True,
hide_cells=True,
figsize=(15, 4),
n_test_points=200,
)
還可以在熱圖中可視化上面計(jì)算的譜系驅(qū)動(dòng)程序幔托。 下面,對(duì) Alpha 譜系執(zhí)行此操作蜂挪,即在偽時(shí)間中平滑假定的 Alpha 驅(qū)動(dòng)程序的基因表達(dá)重挑,將 Alpha 命運(yùn)概率用作細(xì)胞級(jí)別的權(quán)重。 根據(jù)它們?cè)趥螘r(shí)間中的峰值對(duì)基因進(jìn)行排序棠涮,從而揭示了基因表達(dá)事件的級(jí)聯(lián)谬哀。
cr.pl.heatmap(
adata,
model,
genes=adata.varm['terminal_lineage_drivers']["Alpha_corr"].sort_values(ascending=False).index[:100],
show_absorption_probabilities=True,
lineages="Alpha",
n_jobs=1,
backend="loky",
)
生活很好,有你更好