說(shuō)起軌跡推斷想许,很多人的第一印象就是monocle的軌跡圖,大概率是長(zhǎng)這樣子的:
如果說(shuō)單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析中的分群是尋找細(xì)胞的離散屬性断序,那么軌跡推斷就是尋找細(xì)胞分化連續(xù)性的嘗試流纹。為什么細(xì)胞的分化既有離散性又有連續(xù)性呢逢倍?這是一個(gè)歷史問(wèn)題,細(xì)胞的分化當(dāng)然是連續(xù)的较雕,之所以用分群的方法來(lái)解釋異質(zhì)性碉哑,實(shí)在是一種無(wú)奈之舉。每一個(gè)細(xì)胞都是獨(dú)一無(wú)二的扣典,沒(méi)有一個(gè)細(xì)胞是孤島,這是我們的口號(hào)慎玖,但是理想與現(xiàn)實(shí)總是不能統(tǒng)一贮尖。
monocle提供了一套具有啟發(fā)意義的軌跡方法趁怔,一簡(jiǎn)單粗暴的方式試圖彌補(bǔ)這理想與現(xiàn)實(shí)的大峽谷薪前。在monocle的世界里軌跡與圖譜是分離的,即圖譜是tsne/umap的关斜,軌跡是另一個(gè)降維空間示括。那么有沒(méi)有一種降維技術(shù)能夠再走一步呢痢畜?今天我們介紹的scanpy的PAGA(graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells)就是這方面的一個(gè)嘗試:在保留細(xì)胞圖譜的基礎(chǔ)上完成細(xì)胞軌跡的推斷:
它是如何現(xiàn)實(shí)的呢?其實(shí)是統(tǒng)一了聚類和軌跡推斷的空間結(jié)構(gòu)丁稀。
基于分區(qū)的圖抽象(Partition-based graph abstraction )生成單個(gè)細(xì)胞的拓?fù)浣Y(jié)構(gòu)并保留映射吼拥。高維基因表達(dá)數(shù)據(jù)降維后計(jì)算鄰域關(guān)系的相關(guān)距離度量來(lái)表示kNN圖(pca和歐氏距離)。kNN圖按所需的分辨率進(jìn)行分群凿可,其中分群表示連續(xù)細(xì)胞群。為此授账,我們通常使用Louvain算法矿酵,然而矗积,分群也可以通過(guò)其他任何方式獲得敞咧。PAGA圖是通過(guò)將一個(gè)節(jié)點(diǎn)與每個(gè)亞群關(guān)聯(lián)起來(lái)棘捣,并通過(guò)表示亞群之間連接性的統(tǒng)計(jì)度量的加權(quán)邊連接每個(gè)節(jié)點(diǎn)得到休建。通過(guò)丟棄低權(quán)重的假邊乍恐,PAGA圖揭示了數(shù)據(jù)在選定分辨率下的去噪拓?fù)浣Y(jié)構(gòu)测砂,并揭示了其連接和斷開(kāi)的區(qū)域。
好了砌些,我們來(lái)看看scanpy中PAGA是如何實(shí)現(xiàn)的吧呜投,好不好?
首先我們導(dǎo)入我們的數(shù)據(jù):
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=80)
scanpy==1.4.5.1
anndata==0.7.1
umap==0.3.10
numpy==1.16.5
scipy==1.3.1
pandas==0.25.1
scikit-learn==0.21.3
statsmodels==0.10.1
python-igraph==0.8.0
results_file = 'E:\learnscanpy\write\pbmc3k.h5ad'
help(sc.read_10x_mtx)
adata = sc.read_10x_mtx(
'E:/learnscanpy/data/filtered_feature_bc_matrix', # the directory with the `.mtx` file
var_names='gene_symbols', # use gene symbols for the variable names (variables-axis index)
cache=True)
adata.var_names_make_unique() # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`
adata
Out[282]:
AnnData object with n_obs × n_vars = 5025 × 33694
var: 'gene_ids', 'feature_types'
這里我們使用zheng17的數(shù)據(jù)預(yù)處理方法仑荐,僅僅是因?yàn)楹?jiǎn)單,您也可以像單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析|| scanpy教程:預(yù)處理與聚類一樣自己一步一步地執(zhí)行預(yù)處理纵东。
sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
sc.tl.leiden(adata)
也許粘招,你會(huì)問(wèn)這到底是一種怎樣的cell QC的過(guò)程偎球,不難洒扎,來(lái)看看它的幫助文檔啊,都i給你寫(xiě)好了呢袍冷?磷醋!
help(sc.pp.recipe_zheng17)
Help on function recipe_zheng17 in module scanpy.preprocessing._recipes:
recipe_zheng17(adata: anndata._core.anndata.AnnData, n_top_genes: int = 1000, log: bool = True, plot: bool = False, copy: bool = False) -> Union[anndata._core.anndata.AnnData, NoneType]
Normalization and filtering as of [Zheng17]_.
Reproduces the preprocessing of [Zheng17]_ – the Cell Ranger R Kit of 10x
Genomics.
Expects non-logarithmized data.
If using logarithmized data, pass `log=False`.
The recipe runs the following steps
.. code:: python
sc.pp.filter_genes(adata, min_counts=1) # only consider genes with more than 1 count
sc.pp.normalize_per_cell( # normalize with total UMI count per cell
adata, key_n_counts='n_counts_all'
)
filter_result = sc.pp.filter_genes_dispersion( # select highly-variable genes
adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False
)
adata = adata[:, filter_result.gene_subset] # subset the genes
sc.pp.normalize_per_cell(adata) # renormalize after filtering
if log: sc.pp.log1p(adata) # log transform: adata.X = log(adata.X + 1)
sc.pp.scale(adata) # scale to unit variance and shift to zero mean
Parameters
----------
adata
Annotated data matrix.
n_top_genes
Number of genes to keep.
log
Take logarithm.
plot
Show a plot of the gene dispersion vs. mean relation.
copy
Return a copy of `adata` instead of updating it.
Returns
-------
Returns or updates `adata` depending on `copy`.
加個(gè)plot參數(shù)試試难裆,好使!
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data')
這里你不要問(wèn)下沒(méi)有執(zhí)行降維乃戈,這個(gè)細(xì)胞圖譜是在哪個(gè)空間里的呢褂痰?這就要問(wèn)一下sc.tl.draw_graph了:
adata
Out[292]:
AnnData object with n_obs × n_vars = 5025 × 1000
obs: 'n_counts_all', 'leiden'
var: 'gene_ids', 'feature_types', 'n_counts'
uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors'
obsm: 'X_pca', 'X_draw_graph_fr'
varm: 'PCs'
help(sc.tl.draw_graph)
Help on function draw_graph in module scanpy.tools._draw_graph:
draw_graph(adata: anndata._core.anndata.AnnData, layout: scanpy._compat.Literal_ = 'fa', init_pos: Union[str, bool, NoneType] = None, root: Union[int, NoneType] = None, random_state: Union[int, mtrand.RandomState, NoneType] = 0, n_jobs: Union[int, NoneType] = None, adjacency: Union[scipy.sparse.base.spmatrix, NoneType] = None, key_added_ext: Union[str, NoneType] = None, copy: bool = False, **kwds)
Force-directed graph drawing [Islam11]_ [Jacomy14]_ [Chippada18]_.
An alternative to tSNE that often preserves the topology of the data
better. This requires to run :func:`~scanpy.pp.neighbors`, first.
The default layout ('fa', `ForceAtlas2`) [Jacomy14]_ uses the package |fa2|_
[Chippada18]_, which can be installed via `pip install fa2`.
`Force-directed graph drawing`_ describes a class of long-established
algorithms for visualizing graphs.
It has been suggested for visualizing single-cell data by [Islam11]_.
Many other layouts as implemented in igraph [Csardi06]_ are available.
Similar approaches have been used by [Zunder15]_ or [Weinreb17]_.
.. |fa2| replace:: `fa2`
.. _fa2: https://github.com/bhargavchippada/forceatlas2
.. _Force-directed graph drawing: https://en.wikipedia.org/wiki/Force-directed_graph_drawing
Parameters
----------
adata
Annotated data matrix.
layout
'fa' (`ForceAtlas2`) or any valid `igraph layout
<http://igraph.org/c/doc/igraph-Layout.html>`__. Of particular interest
are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
'rt' (Reingold Tilford tree layout).
root
Root for tree layouts.
random_state
For layouts with random initialization like 'fr', change this to use
different intial states for the optimization. If `None`, no seed is set.
adjacency
Sparse adjacency matrix of the graph, defaults to
`adata.uns['neighbors']['connectivities']`.
key_added_ext
By default, append `layout`.
proceed
Continue computation, starting off with 'X_draw_graph_`layout`'.
init_pos
`'paga'`/`True`, `None`/`False`, or any valid 2d-`.obsm` key.
Use precomputed coordinates for initialization.
If `False`/`None` (the default), initialize randomly.
copy
Return a copy instead of writing to adata.
**kwds
Parameters of chosen igraph layout. See e.g. `fruchterman-reingold`_
[Fruchterman91]_. One of the most important ones is `maxiter`.
.. _fruchterman-reingold: http://igraph.org/python/doc/igraph.Graph-class.html#layout_fruchterman_reingold
Returns
-------
Depending on `copy`, returns or updates `adata` with the following field.
**X_draw_graph_layout** : `adata.obsm`
Coordinates of graph layout. E.g. for layout='fa' (the default),
the field is called 'X_draw_graph_fa'
這是tsne的一種代替方案症虑,由fa2庫(kù)完成,繪圖用的是Force-directed_graph_drawing谍憔,說(shuō)明文檔中給出了相關(guān)的鏈接匪蝙。
那么我們看看umap的結(jié)構(gòu)是怎么樣的:
sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'])
umap都做了,再看看tsne的結(jié)果吧:
sc.tl.tsne(adata)
sc.pl.tsne(adata, color=['leiden'])
果然tsne要慢很多习贫,等了好一會(huì)。苫昌。颤绕。
可選操作:圖形去噪
為了去噪祟身,用擴(kuò)散映射空間來(lái)表示它(而不是PCA空間)奥务。計(jì)算幾個(gè)擴(kuò)散分量?jī)?nèi)的距離相當(dāng)于圖像去噪——我們只取幾個(gè)第一個(gè)光譜分量袜硫。這與使用PCA去噪數(shù)據(jù)矩陣非常相似氯葬。注意婉陷,對(duì)于PAGA帚称、聚類或偽時(shí)間估計(jì)秽澳,這都不是必要的步驟。
sc.tl.diffmap(adata)
computing Diffusion Maps using n_comps=15(=n_dcs)
computing transitions
finished (0:00:00)
eigenvalues of transition matrix
[1. 1. 1. 0.9997476 0.9994019 0.9990253
0.9986317 0.99379796 0.9936346 0.9925993 0.99070626 0.9898019
0.98812026 0.9873447 0.9869766 ]
finished: added
'X_diffmap', diffmap coordinates (adata.obsm)
'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:00)
sc.pp.neighbors(adata, n_neighbors=10, use_rep='X_diffmap')
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data')
雖然還是有些凌亂肝集,但還是清爽了一些的:
Clustering and PAGA
其實(shí)我們已經(jīng)聚類過(guò)了啊瞻坝,不然哪來(lái)的標(biāo)簽?zāi)亍?/p>
# sc.tl.louvain(adata, resolution=1.0)
# sc.tl.leiden(adata)
sc.tl.paga(adata, groups='leiden')
running PAGA
finished: added
'paga/connectivities', connectivities adjacency (adata.uns)
'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)
adata.var_names
Out[304]:
Index(['AP006222.2', 'MEGF6', 'RP11-46F15.2', 'GPR153', 'RP5-1113E3.3',
'RP4-734G22.3', 'MASP2', 'RP3-467K16.2', 'RP4-680D5.2', 'AKR7A3',
...
'MT-CO1', 'MT-CO2', 'MT-ATP6', 'MT-CO3', 'MT-ND3', 'MT-ND4', 'MT-ND5',
'MT-CYB', 'AC145212.2', 'AC011043.1'],
dtype='object', length=1000)
adata
Out[308]:
AnnData object with n_obs × n_vars = 5025 × 1000
obs: 'n_counts_all', 'leiden'
var: 'gene_ids', 'feature_types', 'n_counts'
uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes'
obsm: 'X_pca', 'X_draw_graph_fr', 'X_umap', 'X_tsne', 'X_diffmap'
varm: 'PCs'
sc.pl.paga(adata, color=['leiden', 'MEGF6', 'MT-CO2', 'AKR7A3'])
--> added 'pos', the PAGA positions (adata.uns['paga'])
我們看到一個(gè)框架圖,其實(shí)是一個(gè)無(wú)向的網(wǎng)絡(luò)圖所刀,點(diǎn)的顏色代表不同分群衙荐,點(diǎn)的大小代表該群細(xì)胞數(shù)的大小浮创,連線只有一種顏色忧吟,粗細(xì)代表兩個(gè)群之間連接更密切斩披。這里并沒(méi)有一個(gè)擬時(shí)的概念溜族,軌跡是一種相互關(guān)系垦沉,可以是時(shí)間的,也可以不是厕倍,而往往以人類的直覺(jué)寡壮,時(shí)間都是單向的,但是每個(gè)細(xì)胞都有自己的分化方向讹弯。一個(gè)簡(jiǎn)單點(diǎn)的框架圖,讓我們從新思考細(xì)胞分化這一基本事實(shí)组民! 但是下面我們會(huì)在這個(gè)基礎(chǔ)上推斷出一個(gè)擬時(shí)結(jié)構(gòu)棒仍。
adata.obs['leiden'].cat.categories
Out[306]:
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
'13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24',
'25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35'],
dtype='object')
遇到核心函數(shù),就算沒(méi)有什么疑問(wèn)也是要看說(shuō)明文檔的敖岛荨:
help(sc.tl.paga)
Help on function paga in module scanpy.tools._paga:
paga(adata: anndata._core.anndata.AnnData, groups: Union[str, NoneType] = None, use_rna_velocity: bool = False, model: scanpy._compat.Literal_ = 'v1.2', copy: bool = False)
Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19]_.
By quantifying the connectivity of partitions (groups, clusters) of the
single-cell graph, partition-based graph abstraction (PAGA) generates a much
simpler abstracted graph (*PAGA graph*) of partitions, in which edge weights
represent confidence in the presence of connections. By tresholding this
confidence in :func:`~scanpy.pl.paga`, a much simpler representation of the
manifold data is obtained, which is nonetheless faithful to the topology of
the manifold.
The confidence should be interpreted as the ratio of the actual versus the
expected value of connetions under the null model of randomly connecting
partitions. We do not provide a p-value as this null model does not
precisely capture what one would consider "connected" in real data, hence it
strongly overestimates the expected value. See an extensive discussion of
this in [Wolf19]_.
.. note::
Note that you can use the result of :func:`~scanpy.pl.paga` in
:func:`~scanpy.tl.umap` and :func:`~scanpy.tl.draw_graph` via
`init_pos='paga'` to get single-cell embeddings that are typically more
faithful to the global topology.
Parameters
----------
adata
An annotated data matrix.
groups
Key for categorical in `adata.obs`. You can pass your predefined groups
by choosing any categorical annotation of observations. Default:
The first present key of `'leiden'` or `'louvain'`.
use_rna_velocity
Use RNA velocity to orient edges in the abstracted graph and estimate
transitions. Requires that `adata.uns` contains a directed single-cell
graph with key `['velocity_graph']`. This feature might be subject
to change in the future.
model
The PAGA connectivity model.
copy
Copy `adata` before computation and return a copy. Otherwise, perform
computation inplace and return `None`.
Returns
-------
**connectivities** : :class:`numpy.ndarray` (adata.uns['connectivities'])
The full adjacency matrix of the abstracted graph, weights correspond to
confidence in the connectivities of partitions.
**connectivities_tree** : :class:`scipy.sparse.csr_matrix` (adata.uns['connectivities_tree'])
The adjacency matrix of the tree-like subgraph that best explains
the topology.
Notes
-----
Together with a random walk-based distance measure
(e.g. :func:`scanpy.tl.dpt`) this generates a partial coordinatization of
data useful for exploring and explaining its variation.
.. currentmodule:: scanpy
See Also
--------
pl.paga
pl.paga_path
pl.paga_compare
我們看到use_rna_velocity參數(shù),scanpy也可以用rna速率數(shù)據(jù)啊庇楞。
假裝我們已經(jīng)做好了細(xì)胞定義:
adata.obs['leiden_anno'] = adata.obs['leiden']
adata.obs['leiden_anno'].cat.categories = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10/Ery', '11', '12',
'13', '14', '15', '16/Stem', '17', '18', '19/Neu', '20/Mk', '21', '22/Baso', '23', '24/Mo','25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35']
sc.tl.paga(adata, groups='leiden_anno')
sc.pl.paga(adata, threshold=0.03, show=False)
用PAGA的結(jié)果重新計(jì)算嵌入否纬,也就是使得細(xì)胞的結(jié)構(gòu)與PAGA的結(jié)構(gòu)一致:
sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')
drawing single-cell graph using layout 'fa'
WARNING: Package 'fa2' is not installed, falling back to layout 'fr'.To use the faster and better ForceAtlas2 layout, install package 'fa2' (`pip install fa2`).
finished: added
'X_draw_graph_fr', graph_drawing coordinates (adata.obsm) (0:00:18)
當(dāng)然這個(gè)圖并不是我們期望,看上去依然略顯混亂临燃,根據(jù)waring的提示也許我應(yīng)該pip install fa2
,于是我們就pip install fa2
,再來(lái)看一下結(jié)果:
forceatlas2將Gephi的Force Atlas 2布局算法移植到python2和python3(帶有NetworkX和igraph的包裝)睛驳。這是最快的python實(shí)現(xiàn),大部分功能已經(jīng)完成膜廊。它還支持Barnes Hut的最大加速近似值。ForceAtlas2是一種非匙希快速的面向力的圖形布局算法蹬跃。它用于在2D中對(duì)一個(gè)加權(quán)的無(wú)向圖進(jìn)行空間化(邊緣權(quán)值定義了連接的強(qiáng)度)。實(shí)現(xiàn)基于本文和相應(yīng)的gephi-java代碼蝶缀。與networkx的fruchterman reingold算法(spring layout)相比丹喻,它確實(shí)非常快翁都,并且可以很好地?cái)U(kuò)展到大量節(jié)點(diǎn)(>10000)碍论。
回到我們的help(sc.tl.draw_graph)柄慰,看到這layout的選項(xiàng)支持不同的igraph layout主題:
layout
'fa' (`ForceAtlas2`) or any valid `igraph layout
<http://igraph.org/c/doc/igraph-Layout.html>`__. Of particular interest
are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
'rt' (Reingold Tilford tree layout).
sc.tl.draw_graph(adata, init_pos='paga',layout= 'lgl')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')
sc.tl.draw_graph(adata, init_pos='paga',layout= 'kk')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')
我們還是用fa吧。我們不禁會(huì)想坐搔,monocle的layout是什么呢藏研,為什么看起來(lái)那么直白?
pl.figure(figsize=(8, 2))
for i in range(28):
pl.scatter(i, 1, c=sc.pl.palettes.zeileis_28[i], s=200)
pl.show()
zeileis_colors = np.array(sc.pl.palettes.zeileis_28)
new_colors = np.array(adata.uns['leiden_anno_colors'])
new_colors[[16]] = zeileis_colors[[12]] # Stem colors / green
new_colors[[10, 17, 5, 3, 15, 6, 18, 13, 7, 12]] = zeileis_colors[[5, 5, 5, 5, 11, 11, 10, 9, 21, 21]] # Ery colors / red
new_colors[[20, 8]] = zeileis_colors[[17, 16]] # Mk early Ery colors / yellow
new_colors[[4, 0]] = zeileis_colors[[2, 8]] # lymph progenitors / grey
new_colors[[22]] = zeileis_colors[[18]] # Baso / turquoise
new_colors[[19, 14, 2]] = zeileis_colors[[6, 6, 6]] # Neu / light blue
new_colors[[24, 9, 1, 11]] = zeileis_colors[[0, 0, 0, 0]] # Mo / dark blue
new_colors[[21, 23]] = zeileis_colors[[25, 25]] # outliers / grey
adata.uns['leiden_anno_colors'] = new_colors
sc.pl.paga_compare(
adata, threshold=0.03, title='', right_margin=0.2, size=10, edge_width_scale=0.5,
legend_fontsize=12, fontsize=12, frameon=False, edges=True, save=True)
接下來(lái)我們?cè)谶@個(gè)細(xì)胞圖譜上繪制擬時(shí)信息遥倦,雖說(shuō)是擬時(shí)推斷,卻需要指定一個(gè)亞群作為起點(diǎn)占锯。但是真正的樣本中有時(shí)并不存在一個(gè)明確的起點(diǎn)袒哥,有時(shí)是同時(shí)發(fā)生發(fā)育的消略。盡管每個(gè)生命從長(zhǎng)遠(yuǎn)來(lái)看所有的細(xì)胞都來(lái)自一個(gè)細(xì)胞堡称。這個(gè)起點(diǎn)如何確定艺演,就像牛頓的宇宙第一推動(dòng)者的問(wèn)題一樣却紧,必須跳出數(shù)據(jù)本身來(lái)思考它胎撤。這里我們?yōu)榱送瓿扇蝿?wù)還是選擇一個(gè)發(fā)育的起點(diǎn)吧晓殊。
adata.uns['iroot'] = np.flatnonzero(adata.obs['leiden_anno'] == '16/Stem')[0]
sc.tl.dpt(adata)
adata
Out[332]:
AnnData object with n_obs × n_vars = 5025 × 1000
obs: 'n_counts_all', 'leiden', 'leiden_anno', 'dpt_pseudotime'
var: 'gene_ids', 'feature_types', 'n_counts'
uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes', 'leiden_anno_sizes', 'leiden_anno_colors', 'iroot'
obsm: 'X_pca', 'X_draw_graph_fa', 'X_umap', 'X_tsne', 'X_diffmap', 'X_draw_graph_lgl', 'X_draw_graph_kk'
varm: 'PCs'
adata.obs['dpt_pseudotime']
Out[333]:
AAACCCAAGCGTATGG-1 0.661545
AAACCCAGTCCTACAA-1 0.670825
AAACCCATCACCTCAC-1 0.000000
AAACGCTAGGGCATGT-1 0.839952
AAACGCTGTAGGTACG-1 0.816287
TTTGTTGCAGGTACGA-1 0.826858
TTTGTTGCAGTCTCTC-1 0.792751
TTTGTTGGTAATTAGG-1 0.329366
TTTGTTGTCCTTGGAA-1 0.905214
TTTGTTGTCGCACGAC-1 0.771131
Name: dpt_pseudotime, Length: 5025, dtype: float32
我們不禁思考伤提,既然每個(gè)生命從長(zhǎng)遠(yuǎn)來(lái)看所有的細(xì)胞都來(lái)自一個(gè)細(xì)胞巫俺,是不是在一套數(shù)據(jù)集中可以設(shè)置一個(gè)遙遠(yuǎn)的點(diǎn)(實(shí)際本不存在的)作為發(fā)育的起點(diǎn)呢肿男?這樣是不是更能反映軌跡推斷的實(shí)際呢介汹?據(jù)我所知好像還沒(méi)有這樣的算法出現(xiàn)舶沛。
sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno', 'dpt_pseudotime'], legend_loc='on data')
可以自定義發(fā)育路徑
paths = [('erythrocytes', [16, 12, 7, 13, 18, 6, 5, 10]),
('neutrophils', [16, 0, 4, 2, 14, 19]),
('monocytes', [16, 0, 4, 11, 1, 9, 24])]
adata.obs['distance'] = adata.obs['dpt_pseudotime']
adata.obs['clusters'] = adata.obs['leiden_anno']
adata.uns['clusters_colors'] = adata.uns['leiden_anno_colors']
gene_names = adata.var_names[1:10]
adata
Out[338]:
AnnData object with n_obs × n_vars = 5025 × 1000
obs: 'n_counts_all', 'leiden', 'leiden_anno', 'dpt_pseudotime', 'distance', 'clusters'
var: 'gene_ids', 'feature_types', 'n_counts'
uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes', 'leiden_anno_sizes', 'leiden_anno_colors', 'iroot', 'clusters_colors'
obsm: 'X_pca', 'X_draw_graph_fa', 'X_umap', 'X_tsne', 'X_diffmap', 'X_draw_graph_lgl', 'X_draw_graph_kk'
varm: 'PCs'
_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
_, data = sc.pl.paga_path(
adata, path, gene_names,
show_node_names=False,
ax=axs[ipath],
ytick_fontsize=12,
left_margin=0.15,
n_avg=50,
annotations=['distance'],
show_yticks=True if ipath==0 else False,
show_colorbar=False,
color_map='Greys',
groups_key='clusters',
color_maps_annotations={'distance': 'viridis'},
title='{} path'.format(descr),
return_data=True,
show=False)
data.to_csv('./write/paga_path_{}.csv'.format(descr))
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()
仔細(xì)看看這個(gè)圖有沒(méi)有想起來(lái)monocle的,基因在不同命運(yùn)中的變化情況: