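Today we share a new method for the joint analysis of 10X single-cell and spatial data: Tangram. Pay attention to the key advantage of this tool, which I want to stress here: the number of nuclei in each spot is inferred from the stained image, which gives the number of cells per spot, and the deconvolution of the 10X spatial data is built on that.
Let's first look at what the paper says.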
Squidpy allows analysis of images in spatial omics analysis workflows
We start with some background knowledge.
1. What is the Image Container
The Image Container is an object for microscopy tissue images associated with spatial molecular datasets (so the Image Container is a tool that handles images and data together). The object is a thin wrapper of an xarray.Dataset and provides efficient access to in-memory and on-disk images. On-disk files are loaded lazily using dask through rasterio, meaning content is only read into memory when requested. The object can be saved as a zarr store. This allows handling very large files that do not fit in memory. In plain terms, it is an image handler.
Image Container is initialised with an in-memory array or a path to an image file on disk. Images are saved with the key layer. If lazy loading is desired, the chunks parameter needs to be specified.
sq.im.ImageContainer(PATH, layer=<str>, chunks=<int>)
More image layers with the same spatial dimensions x and y, such as segmentation masks, can be added to an existing Image Container.
img.add_img(PATH, layer_added=<str>)
The Image Container is able to interface with AnnData objects (this should be familiar: processing single-cell data with scanpy produces exactly this kind of object), in order to relate any pixel-level information to the observations stored in AnnData. For instance, it is possible to create a generator that yields image crops on the fly corresponding to the locations of the spots in the image (in other words, the image information can be read directly for the spots in the AnnData object):
spot_generator = img.generate_spot_crops(adata)
lambda x: (x for x in spot_generator)  # yields crops at spot locations
This of course works for features computed at crop level as well as at segmentation-object level. For instance, it is possible to get centroid coordinates as well as several features of the segmentation objects that overlap with the spot capture area (it is enough to be aware of this).
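To make the idea of on-the-fly spot crops concrete, here is a minimal sketch in plain Python. It does not use squidpy: the helper `iter_spot_crops`, the nested-list image and the spot coordinates are all hypothetical stand-ins, and scale factors and real spot diameters are ignored.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image array


def iter_spot_crops(
    image: Image, centers: List[Tuple[int, int]], radius: int
) -> Iterator[Image]:
    """Lazily yield a square crop around each (row, col) spot center.

    Hypothetical helper for illustration only; crops are clipped at the
    image border instead of being padded.
    """
    n_rows, n_cols = len(image), len(image[0])
    for row, col in centers:
        top, bottom = max(row - radius, 0), min(row + radius + 1, n_rows)
        left, right = max(col - radius, 0), min(col + radius + 1, n_cols)
        yield [line[left:right] for line in image[top:bottom]]


# A 6 x 6 toy image whose pixel value encodes its position (10 * row + col).
toy_image = [[10 * r + c for c in range(6)] for r in range(6)]
crops = list(iter_spot_crops(toy_image, centers=[(2, 2), (0, 5)], radius=1))
```

Because the function is a generator, no crop is built until it is requested, which is the same reason the real generator scales to large images.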
In the second part, we look at how the images are processed.
(1)Image processing
Before extracting features from microscopy images, the images can be pre-processed. Squidpy implements commonly used preprocessing functions such as conversion to gray-scale or smoothing with a Gaussian kernel.
sq.im.process(img, method="gray")  # img here is our original image
Implementations are based on the Scikit-image package and allow processing of very large images by tiling the image into smaller crops and processing these (this is where the image is preprocessed). Pay attention to the image format when you use it.
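The tiling idea can be sketched without any dependencies. The example below is an illustration under stated assumptions, not Squidpy's implementation: `to_gray` and `process_in_tiles` are hypothetical helpers, plain nested lists stand in for image arrays, and the luminance weights are the ones used by scikit-image's `rgb2gray`.

```python
from typing import Callable, List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]
GrayImage = List[List[float]]


def to_gray(tile: RGBImage) -> GrayImage:
    """Convert an RGB tile to luminance (weights as in skimage's rgb2gray)."""
    return [[0.2125 * r + 0.7154 * g + 0.0721 * b for (r, g, b) in row] for row in tile]


def process_in_tiles(
    image: RGBImage, tile_size: int, func: Callable[[RGBImage], GrayImage]
) -> GrayImage:
    """Apply a per-pixel function tile by tile and stitch the results together."""
    n_rows, n_cols = len(image), len(image[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for top in range(0, n_rows, tile_size):
        for left in range(0, n_cols, tile_size):
            # Only this tile is handed to func, never the whole image.
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            for dr, result_row in enumerate(func(tile)):
                out[top + dr][left:left + len(result_row)] = result_row
    return out


# A 5 x 7 toy RGB image; its size is deliberately not a multiple of the tile size.
toy_rgb = [[(r, c, r + c) for c in range(7)] for r in range(5)]
tiled = process_in_tiles(toy_rgb, tile_size=3, func=to_gray)
whole = to_gray(toy_rgb)
```

For a per-pixel operation such as gray-scale conversion, the tiled and whole-image results are identical. Neighbourhood operations such as Gaussian smoothing need overlapping tiles so that pixels near tile edges see their neighbours, which this sketch leaves out.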
(2) Image segmentation (think of this as refining the image)
Nuclei segmentation is an important step when analysing microscopy images (here is the key point: analysing the number of nuclei in each spot, which depends on the staining). It allows the quantitative analysis of the number of nuclei, their areas, and morphological features (quantify the number of cells in each spot and obtain area and morphological features). There is a wide range of approaches for nuclei segmentation, from established techniques like thresholding to modern deep learning-based approaches (there are many such methods, so I have more to learn as well).
A difficulty for nuclei segmentation is to distinguish between partially overlapping nuclei (how to identify overlapping nuclei is an important problem, especially in tumour regions, where cells are small and densely packed). Watershed is a classic algorithm used to separate overlapping objects by treating pixel values as local topography (the pixel values of the image are treated as a local landscape). For this, starting from points of lowest intensity, the image is flooded until basins from different starting points meet at the watershed ridge lines (this is the tool and the approach; I admit my own knowledge of image processing is limited).
sq.im.segment(img, method="watershed")
This is actually quite similar to the image processing in stlearn.
Implementations in Squidpy are based on the original Scikit-image Python implementation (the image processing library is the Python module Scikit-image; it is worth studying in depth when you have time).
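As a point of comparison, the simplest of the established techniques mentioned above, thresholding followed by connected-component labelling, can be sketched in a few lines. This is not watershed and not what Squidpy runs; `label_nuclei` and the toy intensity grid are hypothetical, and nested lists stand in for a real image.

```python
from collections import deque
from typing import List


def label_nuclei(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Label 4-connected groups of pixels brighter than `threshold`.

    Returns an integer mask: 0 is background, 1..N are object labels.
    """
    n_rows, n_cols = len(image), len(image[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    current = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if image[r][c] <= threshold or labels[r][c]:
                continue  # background pixel, or already part of an object
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < n_rows
                        and 0 <= nx < n_cols
                        and image[ny][nx] > threshold
                        and not labels[ny][nx]
                    ):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels


# Toy nuclear-stain intensities with three bright blobs.
toy_dapi = [
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 8, 0],
    [0, 0, 0, 0, 8, 0],
    [7, 0, 0, 0, 0, 0],
]
mask = label_nuclei(toy_dapi, threshold=5)
n_nuclei = max(max(row) for row in mask)
```

Two touching nuclei would come out of this sketch as a single object, which is exactly the limitation that watershed is meant to address.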
(3) Custom approaches with deep learning (deeper segmentation of the data)
Depending on the quality of the data, simple segmentation approaches like watershed might not be appropriate. Nowadays, many complex segmentation algorithms are provided as pre-trained deep learning models, such as Stardist, Splinedist and Cellpose. These models can be easily used within the segmentation function (this step segments the data; note that the data here is the image information, not our sequenced transcriptome data).
sq.im.segment(img, method=<pre-trained model>)
(4) Image features
Tissue organisation in microscopic images can be analysed with different image features. This filters relevant information from the (high dimensional) images, allowing for easy interpretation and comparison with other features obtained at the same spatial location (comparing features at the same spatial location). Image features are calculated from the tissue image at each location (x, y) where there is transcriptomics information available, resulting in an obs x features matrix similar to the obs x gene matrix (much like a single-cell matrix). This image feature matrix can then be used in any single-cell analysis workflow, just like the gene matrix (so this part mainly feeds downstream analysis alongside the sequencing data).
The scale and size of the image used to calculate features can be adjusted using the scale and spot_scale parameters. Feature extraction can be parallelized by providing n_jobs. The calculated feature matrix is stored in adata.obsm[key].
sq.im.calculate_image_features(adata, img, features=<list>, spot_scale=<float>,
    scale=<float>, key_added=<str>)
Note that this is where the image and the data start to be analysed together.
Summary features calculate the mean, the standard deviation or specific quantiles for a color channel. Similarly, histogram features scan the histogram of a color channel to calculate counts according to a defined number of bins (what some of the parameters do).
sq.im.calculate_image_features(adata, img, features="summary")
sq.im.calculate_image_features(adata, img, features="histogram")
The paper goes on to describe some further data processing methods, but they are not our focus here; a quick look is enough.
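What summary and histogram features measure can be illustrated on a single color channel of one spot crop. The sketch below uses only the standard library; the function names, the dictionary keys and the toy pixel values are hypothetical and do not reflect the column names Squidpy writes.

```python
import statistics
from typing import Dict, List


def summary_features(pixels: List[float]) -> Dict[str, float]:
    """Mean, population standard deviation and median of one color channel."""
    return {
        "mean": statistics.mean(pixels),
        "std": statistics.pstdev(pixels),
        "median": statistics.median(pixels),
    }


def histogram_features(pixels: List[float], bins: int, lo: float, hi: float) -> List[int]:
    """Count pixels in `bins` equal-width bins covering the range [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in pixels:
        index = min(int((value - lo) / width), bins - 1)  # value == hi goes in the last bin
        counts[index] += 1
    return counts


# Flattened pixel values of one channel in one spot crop (toy numbers).
channel = [0, 10, 20, 30, 200, 250, 255, 255]
summary = summary_features(channel)
histogram = histogram_features(channel, bins=4, lo=0, hi=256)
```

Computing such a vector for every spot and stacking the vectors row by row gives the obs x features matrix described above.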
2. Now let's look at the main text of the paper.
Squidpy implements a pipeline based on Scikit-image for preprocessing and segmenting images, extracting morphological, texture, and deep learning-powered features.
Don't underestimate this part. First, the software can handle images from fluorescence staining or H&E staining; preprocessing and segmentation both operate on the image, and feature extraction at the end brings in the sequencing data. I have not studied this in much depth yet and still need more practice.
To enable efficient processing of very large images, this pipeline utilises lazy loading, image tiling and multi-processing (the processing approach mentioned earlier).
Features can be extracted from a raw tissue image crop, or Squidpy's nuclei-segmentation module can be used to extract nuclei counts and nuclei sizes (extracting the number of nuclei).
For instance, we can leverage segmented nuclei to inform cell-type deconvolution methods such as Tangram (our focus today) or Cell2Location (I covered this one before, in the post "A method for joint analysis of 10X single-cell and spatial data: cell2location"; compare the two).
Now we come to the most important part.
Cell-type deconvolution using Tangram
Mapping single-cell atlases to spatial transcriptomics data is a crucial analysis step to integrate cell-type annotation across technologies. Information on the number of nuclei under each spot can help cell-type deconvolution methods (use the number of nuclei in each spot for the joint analysis of 10X single-cell and spatial data).
Tangram ([Biancalani et al., 2020], code) is a cell-type deconvolution method that enables mapping of cell-types to single nuclei under each spot. We will show how to leverage the image container segmentation capabilities, together with Tangram, to map cell types of the mouse cortex from sc-RNA-seq data to Visium data.
We won't reproduce all of the code here; adapt it to your own needs.
Load the modules; all of the modules mentioned above are included.
import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
from anndata import AnnData
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import skimage
# import tangram for spatial deconvolution
import tangram as tg
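Here we use the example data; the main thing is to look at what the data contains.
First, the transcriptomics data:
This is the full processed result of the 10X spatial transcriptomics data; note that it comes from the Python version of the analysis.
Next, the image data:
Pay attention to the image information here: to analyse your own data, you need to read in your own high-resolution image.
Finally, the single-cell data:
The most important part is the annotation result.
Nuclei segmentation and segmentation features (analysing the number of cells in each spot)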
sq.im.process(img=img, layer="image", method="smooth")
sq.im.segment(
    img=img,
    layer="image_smooth",
    method="watershed",
    channel=0,
)
Visualization
inset_y = 1500
inset_x = 1700
inset_sy = 400
inset_sx = 500
fig, axs = plt.subplots(1, 3, figsize=(30, 10))
# panel 1: spatial clusters, with the inset region outlined in yellow
sc.pl.spatial(
    adata_st, color="cluster", alpha=0.7, frameon=False, show=False, ax=axs[0], title=""
)
axs[0].set_title("Clusters", fontdict={"fontsize": 20})
sf = adata_st.uns["spatial"]["V1_Adult_Mouse_Brain_Coronal_Section_2"]["scalefactors"][
    "tissue_hires_scalef"
]
rect = mpl.patches.Rectangle(
    (inset_y * sf, inset_x * sf),
    width=inset_sx * sf,
    height=inset_sy * sf,
    ec="yellow",
    lw=4,
    fill=False,
)
axs[0].add_patch(rect)
axs[0].axes.xaxis.label.set_visible(False)
axs[0].axes.yaxis.label.set_visible(False)
# panel 2: DAPI channel of the inset, scaled from 16-bit intensities
axs[1].imshow(
    img["image"][inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx, 0] / 65536,
    interpolation="none",
)
axs[1].grid(False)
axs[1].set_xticks([])
axs[1].set_yticks([])
axs[1].set_title("DAPI", fontdict={"fontsize": 20})
# panel 3: watershed segmentation mask of the same inset
crop = img["segmented_watershed"][
    inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx
].values
crop = skimage.segmentation.relabel_sequential(crop)[0]
cmap = plt.cm.plasma
cmap.set_under(color="black")
axs[2].imshow(crop, interpolation="none", cmap=cmap, vmin=0.001)
axs[2].grid(False)
axs[2].set_xticks([])
axs[2].set_yticks([])
axs[2].set_title("Nucleus segmentation", fontdict={"fontsize": 20})
I don't know how comfortable everyone is with plotting in Python.
We then need to extract some image features useful for the deconvolution task downstream. Specifically, we will need (1) the number of unique segmentation objects (i.e. nuclei) under each spot, and (2) the coordinates of the centroids of the segmentation objects (this analyses the number of cells in each spot).
# define image layer to use for segmentation
features_kwargs = {
    "segmentation": {
        "label_layer": "segmented_watershed",
        "props": ["label", "centroid"],
        "channels": [1, 2],
    }
}
# calculate segmentation features
sq.im.calculate_image_features(
    adata_st,
    img,
    layer="image",
    key_added="image_features",
    features_kwargs=features_kwargs,
    features="segmentation",
    mask_circle=True,
)
adata_st.obs["cell_count"] = adata_st.obsm["image_features"]["segmentation_label"]
sc.pl.spatial(adata_st, color=["cluster", "cell_count"], frameon=False)
This gives the number of cells in each spot, which is then used for the finer-grained deconvolution analysis.
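The per-spot nuclei count boils down to counting the distinct segmentation labels that fall inside each spot's circular capture area, which is what `mask_circle=True` restricts the computation to. Here is a minimal sketch of that idea; `count_labels_in_spot`, the toy label mask and the spot coordinates are hypothetical, and coordinates are taken as integer pixel positions.

```python
from typing import Dict, List, Tuple


def count_labels_in_spot(
    mask: List[List[int]], center: Tuple[int, int], radius: int
) -> int:
    """Count distinct non-zero labels inside a circle around (row, col)."""
    cy, cx = center
    found = set()
    for y in range(max(cy - radius, 0), min(cy + radius + 1, len(mask))):
        for x in range(max(cx - radius, 0), min(cx + radius + 1, len(mask[0]))):
            inside = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
            if inside and mask[y][x]:
                found.add(mask[y][x])
    return len(found)


# Toy label mask: 0 is background, 1..4 are segmented nuclei.
toy_mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 3, 0, 0, 4, 4],
]
spots: Dict[str, Tuple[int, int]] = {"spot_a": (1, 2), "spot_b": (4, 4), "spot_c": (0, 5)}
cell_count = {name: count_labels_in_spot(toy_mask, c, radius=2) for name, c in spots.items()}
```

A nucleus that straddles two spots is counted in both, so the sum over spots can exceed the number of nuclei in the image.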
Deconvolution and mapping
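At this stage, we have all we need for the deconvolution task. First, we need to find a set of genes common to the single-cell and spatial datasets. We will use the intersection of the highly variable genes (extracting the genes for the joint analysis; the code below takes the top 100 ranked marker genes of each cell subclass and intersects them with the spatial genes).
Adapt this step to your own needs.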
sc.tl.rank_genes_groups(adata_sc, groupby="cell_subclass")
markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
genes_sc = np.unique(markers_df.melt().value.values)
genes_st = adata_st.var_names.values
genes = list(set(genes_sc).intersection(set(genes_st)))
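The gene selection above can be reduced to a small, dependency-free sketch: take the top-ranked markers of each group, pool them, and keep those also measured in the spatial data. The helper `shared_marker_genes` and the toy gene lists are hypothetical; sorting the result is an added choice, since `list(set(...))` does not guarantee a stable order between runs.

```python
from typing import Dict, List


def shared_marker_genes(
    ranked_markers: Dict[str, List[str]], spatial_genes: List[str], top_n: int
) -> List[str]:
    """Pool the top_n markers of every group and intersect with the spatial genes."""
    pooled = {gene for genes in ranked_markers.values() for gene in genes[:top_n]}
    return sorted(pooled & set(spatial_genes))


# Toy ranked marker lists per cell type (best marker first).
ranked = {
    "Pvalb": ["Pvalb", "Gad1", "Sst"],
    "L2/3 IT": ["Slc17a7", "Cux2", "Gad1"],
}
spatial = ["Gad1", "Slc17a7", "Sst", "Actb", "Cux2"]
genes_shared = shared_marker_genes(ranked, spatial, top_n=2)
```

A fixed gene order makes it easier to reproduce a mapping run and to compare gene lists between runs.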
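Now we start the deconvolution analysis.
# S, G, d, device and hyperparm are not defined in this excerpt and must be prepared beforehand.
# In Tangram, S is the single-cell expression matrix (cells x genes), G is the spatial
# expression matrix (spots x genes) and d is the density prior over spots.
mapper = tg.mapping_optimizer.MapperConstrained(
    S=S,
    G=G,
    d=d,
    device=device,
    **hyperparm,
    # total number of segmented nuclei across all spots
    target_count=adata_st.obs.cell_count.sum(),
)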
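The segmentation counts enter the constrained mapping as numbers derived from `cell_count`. The sketch below shows how `target_count` follows from the counts and, as an assumption not stated in this post, one natural way to build a density prior: each spot's share of all segmented nuclei. The helper name and the toy counts are hypothetical.

```python
from typing import List, Tuple


def mapping_inputs_from_counts(cell_counts: List[int]) -> Tuple[int, List[float]]:
    """Return (target_count, density_prior) from per-spot nuclei counts.

    Assumption for illustration: the prior is each spot's fraction of the
    total nuclei count, so it sums to 1.
    """
    total = sum(cell_counts)
    if total == 0:
        raise ValueError("no nuclei were segmented under any spot")
    return total, [count / total for count in cell_counts]


# Toy nuclei counts for four spots.
target_count, density_prior = mapping_inputs_from_counts([2, 0, 6, 2])
```

With such a prior, spots with no segmented nuclei receive zero prior density, so the mapping is steered towards spots where nuclei were actually found.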
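Let's look at the results of the analysis.
I hope you like this joint analysis method.
Life is good, and it is even better with you.
The tutorial for this analysis is available on the squidpy website.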