單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析|| scanpy教程：PAGA軌跡推斷

單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析|| scanpy教程：預(yù)處理與聚類

說(shuō)起軌跡推斷想许，很多人的第一印象就是monocle的軌跡圖，大概率是長(zhǎng)這樣子的：

如果說(shuō)單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析中的分群是尋找細(xì)胞的離散屬性断序，那么軌跡推斷就是尋找細(xì)胞分化連續(xù)性的嘗試流纹。為什么細(xì)胞的分化既有離散性又有連續(xù)性呢逢倍？這是一個(gè)歷史問(wèn)題，細(xì)胞的分化當(dāng)然是連續(xù)的较雕，之所以用分群的方法來(lái)解釋異質(zhì)性碉哑，實(shí)在是一種無(wú)奈之舉。每一個(gè)細(xì)胞都是獨(dú)一無(wú)二的扣典，沒(méi)有一個(gè)細(xì)胞是孤島，這是我們的口號(hào)慎玖，但是理想與現(xiàn)實(shí)總是不能統(tǒng)一贮尖。

monocle提供了一套具有啟發(fā)意義的軌跡方法趁怔，一簡(jiǎn)單粗暴的方式試圖彌補(bǔ)這理想與現(xiàn)實(shí)的大峽谷薪前。在monocle的世界里軌跡與圖譜是分離的，即圖譜是tsne/umap的关斜，軌跡是另一個(gè)降維空間示括。那么有沒(méi)有一種降維技術(shù)能夠再走一步呢痢畜？今天我們介紹的scanpy的PAGA(graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells)就是這方面的一個(gè)嘗試:在保留細(xì)胞圖譜的基礎(chǔ)上完成細(xì)胞軌跡的推斷：

它是如何現(xiàn)實(shí)的呢？其實(shí)是統(tǒng)一了聚類和軌跡推斷的空間結(jié)構(gòu)丁稀。

基于分區(qū)的圖抽象(Partition-based graph abstraction )生成單個(gè)細(xì)胞的拓?fù)浣Y(jié)構(gòu)并保留映射吼拥。高維基因表達(dá)數(shù)據(jù)降維后計(jì)算鄰域關(guān)系的相關(guān)距離度量來(lái)表示kNN圖（pca和歐氏距離）。kNN圖按所需的分辨率進(jìn)行分群凿可，其中分群表示連續(xù)細(xì)胞群。為此授账，我們通常使用Louvain算法矿酵，然而矗积，分群也可以通過(guò)其他任何方式獲得敞咧。PAGA圖是通過(guò)將一個(gè)節(jié)點(diǎn)與每個(gè)亞群關(guān)聯(lián)起來(lái)棘捣，并通過(guò)表示亞群之間連接性的統(tǒng)計(jì)度量的加權(quán)邊連接每個(gè)節(jié)點(diǎn)得到休建。通過(guò)丟棄低權(quán)重的假邊乍恐，PAGA圖揭示了數(shù)據(jù)在選定分辨率下的去噪拓?fù)浣Y(jié)構(gòu)测砂，并揭示了其連接和斷開(kāi)的區(qū)域。

好了砌些，我們來(lái)看看scanpy中PAGA是如何實(shí)現(xiàn)的吧呜投，好不好？

首先我們導(dǎo)入我們的數(shù)據(jù)：


import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns 




sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=80)

scanpy==1.4.5.1 
anndata==0.7.1 
umap==0.3.10 
numpy==1.16.5 
scipy==1.3.1 
pandas==0.25.1 
scikit-learn==0.21.3 
statsmodels==0.10.1 
python-igraph==0.8.0

results_file = 'E:\learnscanpy\write\pbmc3k.h5ad' 
help(sc.read_10x_mtx)
adata = sc.read_10x_mtx(
    'E:/learnscanpy/data/filtered_feature_bc_matrix',  # the directory with the `.mtx` file
    var_names='gene_symbols',                  # use gene symbols for the variable names (variables-axis index)
    cache=True) 
adata.var_names_make_unique()  # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`

adata
Out[282]: 
AnnData object with n_obs × n_vars = 5025 × 33694 
    var: 'gene_ids', 'feature_types'

這里我們使用zheng17的數(shù)據(jù)預(yù)處理方法仑荐，僅僅是因?yàn)楹?jiǎn)單，您也可以像單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析|| scanpy教程：預(yù)處理與聚類一樣自己一步一步地執(zhí)行預(yù)處理纵东。

sc.pp.recipe_zheng17(adata)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=4, n_pcs=20)
sc.tl.leiden(adata)

也許粘招，你會(huì)問(wèn)這到底是一種怎樣的cell QC的過(guò)程偎球，不難洒扎，來(lái)看看它的幫助文檔啊，都i給你寫(xiě)好了呢袍冷？磷醋！

help(sc.pp.recipe_zheng17)

Help on function recipe_zheng17 in module scanpy.preprocessing._recipes:

recipe_zheng17(adata: anndata._core.anndata.AnnData, n_top_genes: int = 1000, log: bool = True, plot: bool = False, copy: bool = False) -> Union[anndata._core.anndata.AnnData, NoneType]
    Normalization and filtering as of [Zheng17]_.
    
    Reproduces the preprocessing of [Zheng17]_ – the Cell Ranger R Kit of 10x
    Genomics.
    
    Expects non-logarithmized data.
    If using logarithmized data, pass `log=False`.
    
    The recipe runs the following steps
    
    .. code:: python
    
        sc.pp.filter_genes(adata, min_counts=1)         # only consider genes with more than 1 count
        sc.pp.normalize_per_cell(                       # normalize with total UMI count per cell
             adata, key_n_counts='n_counts_all'
        )
        filter_result = sc.pp.filter_genes_dispersion(  # select highly-variable genes
            adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False
        )
        adata = adata[:, filter_result.gene_subset]     # subset the genes
        sc.pp.normalize_per_cell(adata)                 # renormalize after filtering
        if log: sc.pp.log1p(adata)                      # log transform: adata.X = log(adata.X + 1)
        sc.pp.scale(adata)                              # scale to unit variance and shift to zero mean
    
    
    Parameters
    ----------
    adata
        Annotated data matrix.
    n_top_genes
        Number of genes to keep.
    log
        Take logarithm.
    plot
        Show a plot of the gene dispersion vs. mean relation.
    copy
        Return a copy of `adata` instead of updating it.
    
    Returns
    -------
    Returns or updates `adata` depending on `copy`.

加個(gè)plot參數(shù)試試难裆，好使！

sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data')

這里你不要問(wèn)下沒(méi)有執(zhí)行降維乃戈，這個(gè)細(xì)胞圖譜是在哪個(gè)空間里的呢褂痰？這就要問(wèn)一下sc.tl.draw_graph了：

adata
Out[292]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors'
    obsm: 'X_pca', 'X_draw_graph_fr'
    varm: 'PCs'

help(sc.tl.draw_graph)
Help on function draw_graph in module scanpy.tools._draw_graph:

draw_graph(adata: anndata._core.anndata.AnnData, layout: scanpy._compat.Literal_ = 'fa', init_pos: Union[str, bool, NoneType] = None, root: Union[int, NoneType] = None, random_state: Union[int, mtrand.RandomState, NoneType] = 0, n_jobs: Union[int, NoneType] = None, adjacency: Union[scipy.sparse.base.spmatrix, NoneType] = None, key_added_ext: Union[str, NoneType] = None, copy: bool = False, **kwds)
    Force-directed graph drawing [Islam11]_ [Jacomy14]_ [Chippada18]_.
    
    An alternative to tSNE that often preserves the topology of the data
    better. This requires to run :func:`~scanpy.pp.neighbors`, first.
    
    The default layout ('fa', `ForceAtlas2`) [Jacomy14]_ uses the package |fa2|_
    [Chippada18]_, which can be installed via `pip install fa2`.
    
    `Force-directed graph drawing`_ describes a class of long-established
    algorithms for visualizing graphs.
    It has been suggested for visualizing single-cell data by [Islam11]_.
    Many other layouts as implemented in igraph [Csardi06]_ are available.
    Similar approaches have been used by [Zunder15]_ or [Weinreb17]_.
    
    .. |fa2| replace:: `fa2`
    .. _fa2: https://github.com/bhargavchippada/forceatlas2
    .. _Force-directed graph drawing: https://en.wikipedia.org/wiki/Force-directed_graph_drawing
    
    Parameters
    ----------
    adata
        Annotated data matrix.
    layout
        'fa' (`ForceAtlas2`) or any valid `igraph layout
        <http://igraph.org/c/doc/igraph-Layout.html>`__. Of particular interest
        are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
        faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
        Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
        'rt' (Reingold Tilford tree layout).
    root
        Root for tree layouts.
    random_state
        For layouts with random initialization like 'fr', change this to use
        different intial states for the optimization. If `None`, no seed is set.
    adjacency
        Sparse adjacency matrix of the graph, defaults to
        `adata.uns['neighbors']['connectivities']`.
    key_added_ext
        By default, append `layout`.
    proceed
        Continue computation, starting off with 'X_draw_graph_`layout`'.
    init_pos
        `'paga'`/`True`, `None`/`False`, or any valid 2d-`.obsm` key.
        Use precomputed coordinates for initialization.
        If `False`/`None` (the default), initialize randomly.
    copy
        Return a copy instead of writing to adata.
    **kwds
        Parameters of chosen igraph layout. See e.g. `fruchterman-reingold`_
        [Fruchterman91]_. One of the most important ones is `maxiter`.
    
        .. _fruchterman-reingold: http://igraph.org/python/doc/igraph.Graph-class.html#layout_fruchterman_reingold
    
    Returns
    -------
    Depending on `copy`, returns or updates `adata` with the following field.
    
    **X_draw_graph_layout** : `adata.obsm`
        Coordinates of graph layout. E.g. for layout='fa' (the default),
        the field is called 'X_draw_graph_fa'

這是tsne的一種代替方案症虑，由fa2庫(kù)完成，繪圖用的是Force-directed_graph_drawing谍憔，說(shuō)明文檔中給出了相關(guān)的鏈接匪蝙。

那么我們看看umap的結(jié)構(gòu)是怎么樣的：

sc.tl.umap(adata)
sc.pl.umap(adata, color=['leiden'])

umap都做了，再看看tsne的結(jié)果吧：

sc.tl.tsne(adata)
sc.pl.tsne(adata, color=['leiden'])

果然tsne要慢很多习贫，等了好一會(huì)。苫昌。颤绕。

可選操作：圖形去噪

為了去噪祟身，用擴(kuò)散映射空間來(lái)表示它(而不是PCA空間)奥务。計(jì)算幾個(gè)擴(kuò)散分量?jī)?nèi)的距離相當(dāng)于圖像去噪——我們只取幾個(gè)第一個(gè)光譜分量袜硫。這與使用PCA去噪數(shù)據(jù)矩陣非常相似氯葬。注意婉陷，對(duì)于PAGA帚称、聚類或偽時(shí)間估計(jì)秽澳，這都不是必要的步驟。

sc.tl.diffmap(adata)
computing Diffusion Maps using n_comps=15(=n_dcs)
computing transitions
    finished (0:00:00)
    eigenvalues of transition matrix
    [1.         1.         1.         0.9997476  0.9994019  0.9990253
     0.9986317  0.99379796 0.9936346  0.9925993  0.99070626 0.9898019
     0.98812026 0.9873447  0.9869766 ]
    finished: added
    'X_diffmap', diffmap coordinates (adata.obsm)
    'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:00)

sc.pp.neighbors(adata, n_neighbors=10, use_rep='X_diffmap')
sc.tl.draw_graph(adata)
sc.pl.draw_graph(adata, color='leiden', legend_loc='on data')

雖然還是有些凌亂肝集，但還是清爽了一些的：

Clustering and PAGA

其實(shí)我們已經(jīng)聚類過(guò)了啊瞻坝，不然哪來(lái)的標(biāo)簽?zāi)亍?/p>

# sc.tl.louvain(adata, resolution=1.0)
# sc.tl.leiden(adata)
sc.tl.paga(adata, groups='leiden')
running PAGA
    finished: added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)

adata.var_names
Out[304]: 
Index(['AP006222.2', 'MEGF6', 'RP11-46F15.2', 'GPR153', 'RP5-1113E3.3',
       'RP4-734G22.3', 'MASP2', 'RP3-467K16.2', 'RP4-680D5.2', 'AKR7A3',
       ...
       'MT-CO1', 'MT-CO2', 'MT-ATP6', 'MT-CO3', 'MT-ND3', 'MT-ND4', 'MT-ND5',
       'MT-CYB', 'AC145212.2', 'AC011043.1'],
      dtype='object', length=1000)

adata
Out[308]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes'
    obsm: 'X_pca', 'X_draw_graph_fr', 'X_umap', 'X_tsne', 'X_diffmap'
    varm: 'PCs'

sc.pl.paga(adata, color=['leiden', 'MEGF6', 'MT-CO2', 'AKR7A3'])
--> added 'pos', the PAGA positions (adata.uns['paga'])

我們看到一個(gè)框架圖，其實(shí)是一個(gè)無(wú)向的網(wǎng)絡(luò)圖所刀，點(diǎn)的顏色代表不同分群衙荐，點(diǎn)的大小代表該群細(xì)胞數(shù)的大小浮创，連線只有一種顏色忧吟，粗細(xì)代表兩個(gè)群之間連接更密切斩披。這里并沒(méi)有一個(gè)擬時(shí)的概念溜族，軌跡是一種相互關(guān)系垦沉，可以是時(shí)間的，也可以不是厕倍，而往往以人類的直覺(jué)寡壮，時(shí)間都是單向的，但是每個(gè)細(xì)胞都有自己的分化方向讹弯。一個(gè)簡(jiǎn)單點(diǎn)的框架圖，讓我們從新思考細(xì)胞分化這一基本事實(shí)组民！但是下面我們會(huì)在這個(gè)基礎(chǔ)上推斷出一個(gè)擬時(shí)結(jié)構(gòu)棒仍。

adata.obs['leiden'].cat.categories
Out[306]: 
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
       '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24',
       '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35'],
      dtype='object')

遇到核心函數(shù)，就算沒(méi)有什么疑問(wèn)也是要看說(shuō)明文檔的敖岛荨：

help(sc.tl.paga)
Help on function paga in module scanpy.tools._paga:

paga(adata: anndata._core.anndata.AnnData, groups: Union[str, NoneType] = None, use_rna_velocity: bool = False, model: scanpy._compat.Literal_ = 'v1.2', copy: bool = False)
    Mapping out the coarse-grained connectivity structures of complex manifolds [Wolf19]_.
    
    By quantifying the connectivity of partitions (groups, clusters) of the
    single-cell graph, partition-based graph abstraction (PAGA) generates a much
    simpler abstracted graph (*PAGA graph*) of partitions, in which edge weights
    represent confidence in the presence of connections. By tresholding this
    confidence in :func:`~scanpy.pl.paga`, a much simpler representation of the
    manifold data is obtained, which is nonetheless faithful to the topology of
    the manifold.
    
    The confidence should be interpreted as the ratio of the actual versus the
    expected value of connetions under the null model of randomly connecting
    partitions. We do not provide a p-value as this null model does not
    precisely capture what one would consider "connected" in real data, hence it
    strongly overestimates the expected value. See an extensive discussion of
    this in [Wolf19]_.
    
    .. note::
        Note that you can use the result of :func:`~scanpy.pl.paga` in
        :func:`~scanpy.tl.umap` and :func:`~scanpy.tl.draw_graph` via
        `init_pos='paga'` to get single-cell embeddings that are typically more
        faithful to the global topology.
    
    Parameters
    ----------
    adata
        An annotated data matrix.
    groups
        Key for categorical in `adata.obs`. You can pass your predefined groups
        by choosing any categorical annotation of observations. Default:
        The first present key of `'leiden'` or `'louvain'`.
    use_rna_velocity
        Use RNA velocity to orient edges in the abstracted graph and estimate
        transitions. Requires that `adata.uns` contains a directed single-cell
        graph with key `['velocity_graph']`. This feature might be subject
        to change in the future.
    model
        The PAGA connectivity model.
    copy
        Copy `adata` before computation and return a copy. Otherwise, perform
        computation inplace and return `None`.
    
    Returns
    -------
    **connectivities** : :class:`numpy.ndarray` (adata.uns['connectivities'])
        The full adjacency matrix of the abstracted graph, weights correspond to
        confidence in the connectivities of partitions.
    **connectivities_tree** : :class:`scipy.sparse.csr_matrix` (adata.uns['connectivities_tree'])
        The adjacency matrix of the tree-like subgraph that best explains
        the topology.
    
    Notes
    -----
    Together with a random walk-based distance measure
    (e.g. :func:`scanpy.tl.dpt`) this generates a partial coordinatization of
    data useful for exploring and explaining its variation.
    
    .. currentmodule:: scanpy
    
    See Also
    --------
    pl.paga
    pl.paga_path
    pl.paga_compare

我們看到use_rna_velocity參數(shù)，scanpy也可以用rna速率數(shù)據(jù)啊庇楞。

假裝我們已經(jīng)做好了細(xì)胞定義：

adata.obs['leiden_anno'] = adata.obs['leiden']
adata.obs['leiden_anno'].cat.categories = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10/Ery', '11', '12',
       '13', '14', '15', '16/Stem', '17', '18', '19/Neu', '20/Mk', '21', '22/Baso', '23', '24/Mo','25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35']
sc.tl.paga(adata, groups='leiden_anno')

sc.pl.paga(adata, threshold=0.03, show=False)

用PAGA的結(jié)果重新計(jì)算嵌入否纬，也就是使得細(xì)胞的結(jié)構(gòu)與PAGA的結(jié)構(gòu)一致：

sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')

drawing single-cell graph using layout 'fa'
WARNING: Package 'fa2' is not installed, falling back to layout 'fr'.To use the faster and better ForceAtlas2 layout, install package 'fa2' (`pip install fa2`).

    finished: added
    'X_draw_graph_fr', graph_drawing coordinates (adata.obsm) (0:00:18)

當(dāng)然這個(gè)圖并不是我們期望，看上去依然略顯混亂临燃，根據(jù)waring的提示也許我應(yīng)該pip install fa2,于是我們就pip install fa2,再來(lái)看一下結(jié)果：

forceatlas2將Gephi的Force Atlas 2布局算法移植到python2和python3(帶有NetworkX和igraph的包裝)睛驳。這是最快的python實(shí)現(xiàn)，大部分功能已經(jīng)完成膜廊。它還支持Barnes Hut的最大加速近似值。ForceAtlas2是一種非匙希快速的面向力的圖形布局算法蹬跃。它用于在2D中對(duì)一個(gè)加權(quán)的無(wú)向圖進(jìn)行空間化(邊緣權(quán)值定義了連接的強(qiáng)度)。實(shí)現(xiàn)基于本文和相應(yīng)的gephi-java代碼蝶缀。與networkx的fruchterman reingold算法(spring layout)相比丹喻，它確實(shí)非常快翁都，并且可以很好地?cái)U(kuò)展到大量節(jié)點(diǎn)(>10000)碍论。

回到我們的help(sc.tl.draw_graph)柄慰，看到這layout的選項(xiàng)支持不同的igraph layout主題：

layout
        'fa' (`ForceAtlas2`) or any valid `igraph layout
        <http://igraph.org/c/doc/igraph-Layout.html>`__. Of particular interest
        are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold,
        faster than 'fr'), 'kk' (Kamadi Kawai', slower than 'fr'), 'lgl' (Large
        Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and
        'rt' (Reingold Tilford tree layout).

sc.tl.draw_graph(adata, init_pos='paga',layout= 'lgl')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')

sc.tl.draw_graph(adata, init_pos='paga',layout= 'kk')
sc.pl.draw_graph(adata, color=['leiden_anno','MEGF6', 'MT-CO2', 'AKR7A3'], legend_loc='on data')

我們還是用fa吧。我們不禁會(huì)想坐搔，monocle的layout是什么呢藏研，為什么看起來(lái)那么直白？

pl.figure(figsize=(8, 2))
for i in range(28):
    pl.scatter(i, 1, c=sc.pl.palettes.zeileis_28[i], s=200)
pl.show()

zeileis_colors = np.array(sc.pl.palettes.zeileis_28)
new_colors = np.array(adata.uns['leiden_anno_colors'])

new_colors[[16]] = zeileis_colors[[12]]  # Stem colors / green
new_colors[[10, 17, 5, 3, 15, 6, 18, 13, 7, 12]] = zeileis_colors[[5, 5, 5, 5, 11, 11, 10, 9, 21, 21]]  # Ery colors / red
new_colors[[20, 8]] = zeileis_colors[[17, 16]]  # Mk early Ery colors / yellow
new_colors[[4, 0]] = zeileis_colors[[2, 8]]  # lymph progenitors / grey
new_colors[[22]] = zeileis_colors[[18]]  # Baso / turquoise
new_colors[[19, 14, 2]] = zeileis_colors[[6, 6, 6]]  # Neu / light blue
new_colors[[24, 9, 1, 11]] = zeileis_colors[[0, 0, 0, 0]]  # Mo / dark blue
new_colors[[21, 23]] = zeileis_colors[[25, 25]]  # outliers / grey

adata.uns['leiden_anno_colors'] = new_colors
sc.pl.paga_compare(
    adata, threshold=0.03, title='', right_margin=0.2, size=10, edge_width_scale=0.5,
    legend_fontsize=12, fontsize=12, frameon=False, edges=True, save=True)

接下來(lái)我們?cè)谶@個(gè)細(xì)胞圖譜上繪制擬時(shí)信息遥倦，雖說(shuō)是擬時(shí)推斷，卻需要指定一個(gè)亞群作為起點(diǎn)占锯。但是真正的樣本中有時(shí)并不存在一個(gè)明確的起點(diǎn)袒哥，有時(shí)是同時(shí)發(fā)生發(fā)育的消略。盡管每個(gè)生命從長(zhǎng)遠(yuǎn)來(lái)看所有的細(xì)胞都來(lái)自一個(gè)細(xì)胞堡称。這個(gè)起點(diǎn)如何確定艺演，就像牛頓的宇宙第一推動(dòng)者的問(wèn)題一樣却紧，必須跳出數(shù)據(jù)本身來(lái)思考它胎撤。這里我們?yōu)榱送瓿扇蝿?wù)還是選擇一個(gè)發(fā)育的起點(diǎn)吧晓殊。

adata.uns['iroot'] = np.flatnonzero(adata.obs['leiden_anno']  == '16/Stem')[0]
sc.tl.dpt(adata)

adata
Out[332]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden', 'leiden_anno', 'dpt_pseudotime'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes', 'leiden_anno_sizes', 'leiden_anno_colors', 'iroot'
    obsm: 'X_pca', 'X_draw_graph_fa', 'X_umap', 'X_tsne', 'X_diffmap', 'X_draw_graph_lgl', 'X_draw_graph_kk'
    varm: 'PCs'

adata.obs['dpt_pseudotime']
Out[333]: 
AAACCCAAGCGTATGG-1    0.661545
AAACCCAGTCCTACAA-1    0.670825
AAACCCATCACCTCAC-1    0.000000
AAACGCTAGGGCATGT-1    0.839952
AAACGCTGTAGGTACG-1    0.816287
  
TTTGTTGCAGGTACGA-1    0.826858
TTTGTTGCAGTCTCTC-1    0.792751
TTTGTTGGTAATTAGG-1    0.329366
TTTGTTGTCCTTGGAA-1    0.905214
TTTGTTGTCGCACGAC-1    0.771131
Name: dpt_pseudotime, Length: 5025, dtype: float32

我們不禁思考伤提，既然每個(gè)生命從長(zhǎng)遠(yuǎn)來(lái)看所有的細(xì)胞都來(lái)自一個(gè)細(xì)胞巫俺，是不是在一套數(shù)據(jù)集中可以設(shè)置一個(gè)遙遠(yuǎn)的點(diǎn)（實(shí)際本不存在的）作為發(fā)育的起點(diǎn)呢肿男？這樣是不是更能反映軌跡推斷的實(shí)際呢介汹？據(jù)我所知好像還沒(méi)有這樣的算法出現(xiàn)舶沛。

sc.tl.draw_graph(adata, init_pos='paga')
sc.pl.draw_graph(adata, color=['leiden_anno', 'dpt_pseudotime'], legend_loc='on data')

可以自定義發(fā)育路徑

paths = [('erythrocytes', [16, 12, 7, 13, 18, 6, 5, 10]),
         ('neutrophils', [16, 0, 4, 2, 14, 19]),
         ('monocytes', [16, 0, 4, 11, 1, 9, 24])]
adata.obs['distance'] = adata.obs['dpt_pseudotime']
adata.obs['clusters'] = adata.obs['leiden_anno'] 
adata.uns['clusters_colors'] = adata.uns['leiden_anno_colors']
gene_names = adata.var_names[1:10]

adata
Out[338]: 
AnnData object with n_obs × n_vars = 5025 × 1000 
    obs: 'n_counts_all', 'leiden', 'leiden_anno', 'dpt_pseudotime', 'distance', 'clusters'
    var: 'gene_ids', 'feature_types', 'n_counts'
    uns: 'log1p', 'pca', 'neighbors', 'leiden', 'draw_graph', 'leiden_colors', 'umap', 'diffmap_evals', 'paga', 'leiden_sizes', 'leiden_anno_sizes', 'leiden_anno_colors', 'iroot', 'clusters_colors'
    obsm: 'X_pca', 'X_draw_graph_fa', 'X_umap', 'X_tsne', 'X_diffmap', 'X_draw_graph_lgl', 'X_draw_graph_kk'
    varm: 'PCs'

_, axs = pl.subplots(ncols=3, figsize=(6, 2.5), gridspec_kw={'wspace': 0.05, 'left': 0.12})
pl.subplots_adjust(left=0.05, right=0.98, top=0.82, bottom=0.2)
for ipath, (descr, path) in enumerate(paths):
    _, data = sc.pl.paga_path(
        adata, path, gene_names,
        show_node_names=False,
        ax=axs[ipath],
        ytick_fontsize=12,
        left_margin=0.15,
        n_avg=50,
        annotations=['distance'],
        show_yticks=True if ipath==0 else False,
        show_colorbar=False,
        color_map='Greys',
        groups_key='clusters',
        color_maps_annotations={'distance': 'viridis'},
        title='{} path'.format(descr),
        return_data=True,
        show=False)
    data.to_csv('./write/paga_path_{}.csv'.format(descr))
pl.savefig('./figures/paga_path_paul15.pdf')
pl.show()

仔細(xì)看看這個(gè)圖有沒(méi)有想起來(lái)monocle的，基因在不同命運(yùn)中的變化情況：

paga

最后編輯于：2020.03.26 07:58:05

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末如庭，一起剝皮案震驚了整個(gè)濱河市叹卷，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌，老刑警劉巖餐胀，帶你破解...
沈念sama閱讀 206,013評(píng)論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場(chǎng)離奇詭異瘤载，居然都是意外死亡否灾，警方通過(guò)查閱死者的電腦和手機(jī)鸣奔，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,205評(píng)論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門墨技，熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)挎狸，“玉大人扣汪，你說(shuō)我怎么就攤上這事锨匆≌副穑” “怎么了？”我有些...
開(kāi)封第一講書(shū)人閱讀 152,370評(píng)論 0贊 342
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵茅主，是天一觀的道長(zhǎng)。經(jīng)常有香客問(wèn)我土榴，道長(zhǎng)诀姚，這世上最難降的妖魔是什么？我笑而不...
開(kāi)封第一講書(shū)人閱讀 55,168評(píng)論 1贊 278
?港島之戀（遺憾婚禮）
正文為了忘掉前任赫段，我火速辦了婚禮，結(jié)果婚禮上矢赁，老公的妹妹穿的比我還像新娘糯笙。我一直安慰自己撩银，他們只是感情好给涕，可當(dāng)我...
茶點(diǎn)故事閱讀 64,153評(píng)論 5贊 371
惡毒庶女頂嫁案：這布局不是一般人想出來(lái)的
文/花漫我一把揭開(kāi)白布蜒蕾。她就那樣靜靜地躺著，像睡著了一般焕阿。火紅的嫁衣襯著肌膚如雪。梳的紋絲不亂的頭發(fā)上暮屡，一...
開(kāi)封第一講書(shū)人閱讀 48,954評(píng)論 1贊 283
城市分裂傳說(shuō)
那天撤摸，我揣著相機(jī)與錄音，去河邊找鬼。笑死准夷，一個(gè)胖子當(dāng)著我的面吹牛，可吹牛的內(nèi)容都是我干的衫嵌。我是一名探鬼主播读宙，決...
沈念sama閱讀 38,271評(píng)論 3贊 399
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開(kāi)眼，長(zhǎng)吁一口氣：“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼楔绞！你這毒婦竟也來(lái)了？” 一聲冷哼從身側(cè)響起酒朵，我...
開(kāi)封第一講書(shū)人閱讀 36,916評(píng)論 0贊 259
萬(wàn)榮殺人案實(shí)錄
序言：老撾萬(wàn)榮一對(duì)情侶失蹤桦锄，失蹤者是張志新（化名）和其女友劉穎，沒(méi)想到半個(gè)月后蔫耽，有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 43,382評(píng)論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡匙铡，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 35,877評(píng)論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年图甜，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片具则。...
茶點(diǎn)故事閱讀 37,989評(píng)論 1贊 333
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡，死狀恐怖具帮，靈堂內(nèi)的尸體忽然破棺而出博肋，到底是詐尸還是另有隱情，我是刑警寧澤匪凡，帶...
沈念sama閱讀 33,624評(píng)論 4贊 322
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站掘猿，受9級(jí)特大地震影響病游，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜衬衬，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 39,209評(píng)論 3贊 307
男人毒藥：我在死后第九天來(lái)索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望改橘。院中可真熱鬧滋尉，春花似錦、人聲如沸飞主。這莊子的主人今日做“春日...
開(kāi)封第一講書(shū)人閱讀 30,199評(píng)論 0贊 19
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽(yáng)碾篡。三九已至虱而，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間开泽，已是汗流浹背牡拇。一陣腳步聲響...
開(kāi)封第一講書(shū)人閱讀 31,418評(píng)論 1贊 260
情欲美人皮
我被黑心中介騙來(lái)泰國(guó)打工眼姐，沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留诅迷，地道東北人众旗。一個(gè)月前我還...
沈念sama閱讀 45,401評(píng)論 2贊 352
代替公主和親
正文我出身青樓罢杉，卻偏偏與公主長(zhǎng)得像，于是被迫代替她去往敵國(guó)和親滩租。傳聞我的和親對(duì)象是個(gè)殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 42,700評(píng)論 2贊 345

單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析|| scanpy教程：PAGA軌跡推斷

可選操作：圖形去噪

Clustering and PAGA

推薦閱讀更多精彩內(nèi)容