在我們之前的分享中,已經(jīng)分享過(guò)很多的單細(xì)胞和空間聯(lián)合分析的方法跺讯,有關(guān)Spotlight枢贿,之前也提到過(guò),但是并沒(méi)有很認(rèn)真的對(duì)關(guān)注這個(gè)方法刀脏,這次的分享局荚,就讓我們來(lái)參透這個(gè)方法;這個(gè)軟件目前已經(jīng)發(fā)表愈污,文獻(xiàn)在SPOTlight:Seeded NMF regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptomes,發(fā)表于Nucleic acids research耀态,影響因子11分以上。文章大家可以認(rèn)真研讀一下暂雹,算法也很經(jīng)典首装。
軟件聯(lián)合分析的原理
SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, and nonnegative least squares (NNLS) to subsequently deconvolute ST capture locations (spots).
其中的幾個(gè)概念我們需要了解;
(1)NMF杭跪,這個(gè)可以參考文章Non-Negative Matrix Factorization (NMF)仙逻。我們這里舉一個(gè)簡(jiǎn)單的例子幫助大家理解:
矩陣分解的想法源于我們相信每個(gè)電影都是由某些元素組成的,而這些元素到底是什么涧尿,有多少系奉,我們是不知道的。
而不同的用戶出于個(gè)人喜好一定會(huì)對(duì)這些元素有個(gè)評(píng)價(jià)姑廉。
比如:
一個(gè)電影里如果只考慮愛(ài)情和動(dòng)作兩個(gè)因素缺亮。
一個(gè)用戶也一定會(huì)有我給愛(ài)情打分多一點(diǎn)還是喜歡動(dòng)作多一點(diǎn)。
如圖:
期望就是我們已知下面的矩陣桥言,而期望分解出比較合理的上面兩個(gè)矩陣萌踱。這樣就可以用來(lái)預(yù)測(cè)別的用戶對(duì)別的電影的看法了。把這個(gè)想法運(yùn)用到我們的空間數(shù)據(jù)上号阿,就可以了虫蝶。
(2) nonnegative least squares (NNLS),非負(fù)最小二乘法倦西,這個(gè)可以參考文章【技術(shù)分享】非負(fù)最小二乘,算法很難能真,了解一下即可。
總而言之一句話,Spotlight利用單細(xì)胞數(shù)據(jù)的特征粉铐,deconvolute ST 數(shù)據(jù)疼约,從而知道每種細(xì)胞類型在空間上的位置。
軟件的優(yōu)勢(shì)
Using synthetic spots, simulating varying reference quantities and qualities, we confirmed high prediction accuracy also with shallowly sequenced or small-sized scRNA-seq reference datasets(準(zhǔn)確度高蝙泼,文獻(xiàn)肯定都這么寫)程剥。We trained the NMF regression model with sample-matched or external datasets, resulting in accurate and sensitive spatial predictions。還有一點(diǎn)汤踏,Using SPOTlight to detect regional enrichment of immune cells and their co-localization with tumor and adjacent stroma provides an illustrative example in its flexible application spectrum and future potential in digital pathology.(免疫細(xì)胞和腫瘤細(xì)胞的共定位問(wèn)題织鲸,值得關(guān)注)。
(重點(diǎn))Successful integration of both data modalities could enable an in-depth study of tissue and organ architecture, elucidate cellular cross-talk, spatially track dynamic cell trajectories, and identify disease-specific interaction networks 溪胶。Intersecting cell-type-specific genes from scRNA-seq with ST capture sites previously identified local enrichments, sufficient to segment tumor sections into normal and cancerous areas.(這個(gè)優(yōu)點(diǎn)大家要多注意下搂擦,單細(xì)胞分析得到的通訊啊,軌跡啊哗脖,在空間上又是什么狀態(tài)呢瀑踢?)
技術(shù)示意圖
我們來(lái)看一下這個(gè)軟件對(duì)空間數(shù)據(jù)分解的過(guò)程:
1、最核心的部分才避,identify cell type-specific topic profiles used to deconvolute ST spots橱夭。
這里多說(shuō)一句,第一步的做法相當(dāng)于將我們的表達(dá)矩陣拆分成了兩個(gè)矩陣桑逝,W和H棘劣。(理論部分 NMF(Non-negative matrix factorization),即對(duì)于任意給定的一個(gè)非負(fù)矩陣V楞遏,其能夠?qū)ふ业揭粋€(gè)非負(fù)矩陣W和一個(gè)非負(fù)矩陣H茬暇,滿足條件V=WH,從而將一個(gè)非負(fù)的矩陣分解為左右兩個(gè)非負(fù)矩陣的乘積。*其中橱健,V矩陣中每一列代表一個(gè)觀測(cè)(observation),每一行代表一個(gè)特征(feature)沙廉;W矩陣稱為基矩陣拘荡,H矩陣稱為系數(shù)矩陣或權(quán)重矩陣。這時(shí)用系數(shù)矩陣H代替原始矩陣撬陵,就可以實(shí)現(xiàn)對(duì)原始矩陣進(jìn)行降維珊皿,得到數(shù)據(jù)特征的降維矩陣。)We set out to use NMF to obtain topic profiles due to its previous success in identifying biologically relevant gene expression programs, as well as its previous implementation in ST analysis巨税。這一步大家著重理解一下蟋定,關(guān)于NMF。Importantly, the NMF is initialized by the two main matrices: the basis matrix (W) with unique cell type marker genes and weights, and the coefficient matrix (H) in which each row is initialized, specifying the corresponding relationship of a cell to a topic草添。
2驶兜、Factorization is then carried out using non-smooth NMF(因式分解)。promoting cell type-specific topic profiles, while reducing overfitting during training。
3抄淑、NNLS regression is used to map each spot’s transcriptome to a topic profile distribution using the unit-variance normalized ST count matrix and the basis matrix previously obtained屠凶。回歸分析肆资,注意這里用到的矩陣關(guān)系矗愧。
4、Lastly, NNLS is again applied to determine the weights for each cell type that best fit each spot’s topic profile by minimizing the residuals.
We use a minimum weight contribution threshold to determine which cell types are contributing to the profile of a given spot, also considering the possibility of partial contributions. NNLS also returns a measure of error along with the predicted cell proportions, allowing the user to estimate the reliability of predicted spot compositions
這個(gè)地方大家需要注意的是郑原,矩陣的分解唉韭,以及分別進(jìn)行回歸,最后一起預(yù)測(cè)細(xì)胞類型的組成犯犁。
接下來(lái)就是一些運(yùn)用了属愤,包括文獻(xiàn)本身的驗(yàn)證和空間數(shù)據(jù)上的運(yùn)用,大家看看即可栖秕, 肯定效果很不錯(cuò)春塌,要不然發(fā)不出來(lái),簇捍。
最后我們來(lái)看一下代碼部分:
SPOTlight
這就不照抄別人的代碼了只壳,我們只關(guān)注重點(diǎn):spotlight_deconvolution
關(guān)于這個(gè)函數(shù)有幾個(gè)參數(shù)必須要注意:
1、cluster_markers 注意這里需要的marker單細(xì)胞分析得到的暑塑,但是用多少吼句,需要判斷。
2事格、cl_n Object of integer indicating how many cells to keep from each cluster. If a cluster has n < cl_n then all cells will be selected, if it has more then cl_n will be sampled randomly,100 by default.
3惕艳、method Object of class character; Type of method to us to find W and H. Look at NMF package for the options and specifications, by default nsNMF(最重要的一步,因式分解的方法)驹愚。
等等等等
代碼部分
The goal of SPOTlight is to provide a tool that enables the deconvolution of cell types and cell type proportions present within each capture locations comprising mixtures of cells, originally developed for 10X’s Visium - spatial trancsiptomics- technology, it can be used for all technologies returning mixtures of cells. SPOTlight is based on finding topic profile signatures, by means of an NMFreg model, for each cell type and finding which combination fits best the spot we want to deconvolute.
Graphical abstract
0.1 Installation
You can install the latest stable version from the GitHub repository SPOTlight with:
# install.packages("devtools")
devtools::install_github("https://github.com/MarcElosua/SPOTlight")
devtools::install_git("https://github.com/MarcElosua/SPOTlight")
Or the latest version in development by downloading the devel branch
devtools::install_github("https://github.com/MarcElosua/SPOTlight", ref = "devel")
devtools::install_git("https://github.com/MarcElosua/SPOTlight", ref = "devel")
0.2 Libraries
library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)
0.3 Load data
For the purpose of this tutorial we are going to use adult mouse brain data. The scRNAseq data can be downloaded here while the spatial data is the one put out publicly by 10X and the processed object can be downloaded using SeuratData as shown below.
Load single-cell reference dataset.
path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))
Load Spatial data
if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
# If dataset not downloaded proceed to download it
SeuratData::InstallData("stxBrain")
}
# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")
0.4 Pre-processing
set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
Seurat::RunPCA(., verbose = FALSE) %>%
Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)
Visualize the clustering
Seurat::DimPlot(cortex_sc,
group.by = "subclass",
label = TRUE) + Seurat::NoLegend()
0.5 Descriptive
cortex_sc@meta.data %>%
dplyr::count(subclass) %>%
gt::gt(.[-1, ]) %>%
gt::tab_header(
title = "Cell types present in the reference dataset",
) %>%
gt::cols_label(
subclass = gt::html("Cell Type")
)
Cell types present in the reference dataset | |
---|---|
--- | |
Cell Type | n |
--- | --- |
Astro | 70 |
CR | 7 |
Endo | 70 |
L2/3 IT | 70 |
L4 | 70 |
L5 IT | 70 |
L5 PT | 70 |
L6 CT | 70 |
L6 IT | 70 |
L6b | 70 |
Lamp5 | 70 |
Macrophage | 51 |
Meis2 | 45 |
NP | 70 |
Oligo | 70 |
Peri | 32 |
Pvalb | 70 |
Serpinf1 | 27 |
SMC | 55 |
Sncg | 70 |
Sst | 70 |
Vip | 70 |
VLMC | 67 |
0.6 Compute marker genes
To determine the most important marker genes we can use the function Seurat::FindAllMarkers
which will return the markers for each cluster.
Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc,
assay = "SCT",
slot = "data",
verbose = TRUE,
only.pos = TRUE)
saveRDS(object = cluster_markers_all,
file = here::here("inst/markers_sc.RDS"))
0.6.1 SPOTlight Decomposition
set.seed(123)
spotlight_ls <- spotlight_deconvolution(
se_sc = cortex_sc,
counts_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
cluster_markers = cluster_markers_all, # Dataframe with the marker genes
cl_n = 100, # number of cells per cell type to use
hvg = 3000, # Number of HVG to use
ntop = NULL, # How many of the marker genes to use (by default all)
transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
method = "nsNMF", # Factorization method
min_cont = 0 # Remove those cells contributing to a spot below a certain threshold
)
saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))
Read RDS object
spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))
nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]
0.6.2 Assess deconvolution
Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.
The first thing we can do is look at how specific the topic profiles are for each cell type.
h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod[[2]])
topic_profile_plts[[2]] + ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 12))
0.7 Visualization
Join decomposition with metadata
# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)
decon_df <- decon_mtrx %>%
data.frame() %>%
tibble::rownames_to_column("barcodes")
anterior@meta.data <- anterior@meta.data %>%
tibble::rownames_to_column("barcodes") %>%
dplyr::left_join(decon_df, by = "barcodes") %>%
tibble::column_to_rownames("barcodes")
0.7.1 Specific cell-types
we can use the standard Seurat::SpatialFeaturePlot
to view predicted celltype proportions one at a time.
Seurat::SpatialFeaturePlot(
object = anterior,
features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
alpha = c(0.1, 1))
0.7.2 Spatial scatterpies
Plot spot composition of all the spots.
cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
pie_scale = 0.4)
Plot spot composition of spots containing cell-types of interest
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
cell_types_interest = "L6b",
pie_scale = 0.8)
0.7.3 Spatial interaction graph
Now that we know which cell types are found within each spot we can make a graph representing spatial interactions where cell types will have stronger edges between them the more often we find them within the same spot. To do this we will only need to run the function get_spatial_interaction_graph
, this function prints the plot and returns the elements needed to plot it.
graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])
If you want to tune how the graph looks you can do the following or you can check out more options here:
deg <- degree(graph_ntw, mode="all")
# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance
# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]
# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))
# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)
# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
color = getPalette(length(grad_edge)),
stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
dplyr::left_join(graph_col_df, by = "value") %>%
dplyr::pull(color)
# Open a pdf file
plot(graph_ntw,
# Size of the edge
edge.width = edge_importance,
edge.color = color_edge,
# Size of the buble
vertex.size = deg,
vertex.color = "#cde394",
vertex.frame.color = "white",
vertex.label.color = "black",
vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
layout = layout.circle)
Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.
# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]
# Compute correlation
decon_cor <- cor(decon_mtrx_sub)
# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)
# Visualize
ggcorrplot::ggcorrplot(
corr = decon_cor,
p.mat = p.mat[[1]],
hc.order = TRUE,
type = "full",
insig = "blank",
lab = TRUE,
outline.col = "lightgrey",
method = "square",
# colors = c("#4477AA", "white", "#BB4444"))
colors = c("#6D9EC1", "white", "#E46726"),
title = "Predicted cell-cell proportion correlation",
legend.title = "Correlation\n(Pearson)") +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
legend.text = ggplot2::element_text(size = 12),
legend.title = ggplot2::element_text(size = 15),
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 18, vjust = 0.5))
0.8 Step-by-Step insight
Here we are going to show step by step what is going on and all the different steps involved in the process.
還是那句話诚撵,生活很好占锯,有你更好