一年了,我們都需要總結(jié)鹃答,這一篇我們回顧一下SPOTlight乎澄,非常好的方法,建議大家也總結(jié)一下测摔,看看2021年置济,自己都得到了什么。
SPOTlight 的目標是提供一種工具锋八,能夠?qū)Π毎旌衔锏拿總€捕獲位置中存在的細胞類型和細胞類型比例進行解卷積浙于,最初是為 10X 的 Visium - 空間轉(zhuǎn)錄組學技術(shù)開發(fā)的, it can be used for all technologies returning mixtures of cells挟纱。 SPOTlight 基于通過 NMFreg 模型為每種細胞類型查找topics profiles singatures羞酗,并找到最適合我們想要解卷積的spot的組合。
Libraries
library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)
Load data
Load single-cell reference dataset.
path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))
Load Spatial data
if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
# If dataset not downloaded proceed to download it
SeuratData::InstallData("stxBrain")
}
# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")
Pre-processing,設置種子的功能大家還知道么紊服?檀轨?
set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
Seurat::RunPCA(., verbose = FALSE) %>%
Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)
Visualize the clustering
Seurat::DimPlot(cortex_sc,
group.by = "subclass",
label = TRUE) + Seurat::NoLegend()
Descriptive
cortex_sc@meta.data %>%
dplyr::count(subclass) %>%
gt::gt(.[-1, ]) %>%
gt::tab_header(
title = "Cell types present in the reference dataset",
) %>%
gt::cols_label(
subclass = gt::html("Cell Type")
)
Compute marker genes
為了確定最重要的標記基因,我們可以使用函數(shù) Seurat::FindAllMarkers围苫,它將返回每個cluster的標記裤园。
Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc,
assay = "SCT",
slot = "data",
verbose = TRUE,
only.pos = TRUE)
saveRDS(object = cluster_markers_all,
file = here::here("inst/markers_sc.RDS"))
SPOTlight Decomposition
set.seed(123)
spotlight_ls <- spotlight_deconvolution(
se_sc = cortex_sc,
counts_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
cluster_markers = cluster_markers_all, # Dataframe with the marker genes
cl_n = 100, # number of cells per cell type to use
hvg = 3000, # Number of HVG to use
ntop = NULL, # How many of the marker genes to use (by default all)
transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
method = "nsNMF", # Factorization method
min_cont = 0 # Remove those cells contributing to a spot below a certain threshold
)
saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))
Read RDS object
spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))
nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]
Assess deconvolution
Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.
The first thing we can do is look at how specific the topic profiles are for each cell type.
h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod[[2]])
topic_profile_plts[[2]] + ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 12))
接下來我們可以看看每個細胞類型中每個細胞的各個topic profiles的行為。
在這里剂府,我們期望來自同一細胞類型的所有細胞顯示出相似的topic profiles分布拧揽,否則該cluster中可能會有更多的子結(jié)構(gòu),我們可能只捕獲其中一個。
topic_profile_plts[[1]] + theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(size = 12))
Lastly we can take a look at which genes are the most important for each topic and therefore get an insight into which genes are driving them.
basis_spotlight <- data.frame(NMF::basis(nmf_mod[[1]]))
colnames(basis_spotlight) <- unique(stringr::str_wrap(nmf_mod[[2]], width = 30))
basis_spotlight %>%
dplyr::arrange(desc(Astro)) %>%
round(., 5) %>%
DT::datatable(., filter = "top")
Visualization
Join decomposition with metadata
# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)
decon_df <- decon_mtrx %>%
data.frame() %>%
tibble::rownames_to_column("barcodes")
anterior@meta.data <- anterior@meta.data %>%
tibble::rownames_to_column("barcodes") %>%
dplyr::left_join(decon_df, by = "barcodes") %>%
tibble::column_to_rownames("barcodes")
Specific cell-types
we can use the standard Seurat::SpatialFeaturePlot to view predicted celltype proportions one at a time.
Seurat::SpatialFeaturePlot(
object = anterior,
features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
alpha = c(0.1, 1))
Spatial scatterpies
cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
pie_scale = 0.4)
Plot spot composition of spots containing cell-types of interest
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
cell_types_interest = "L6b",
pie_scale = 0.8)
Spatial interaction graph
現(xiàn)在我們知道在每個點內(nèi)發(fā)現(xiàn)了哪些細胞類型淤袜,我們可以制作一個表示空間相互作用的圖痒谴,其中細胞類型之間的邊緣越強,我們在同一點內(nèi)發(fā)現(xiàn)它們的頻率越高铡羡。 為此积蔚,我們只需要運行 get_spatial_interaction_graph 函數(shù),該函數(shù)將打印繪圖并返回繪圖所需的元素烦周。
graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])
If you want to tune how the graph looks you can do the following or you can check out more options here:
deg <- degree(graph_ntw, mode="all")
# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance
# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]
# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))
# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)
# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
color = getPalette(length(grad_edge)),
stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
dplyr::left_join(graph_col_df, by = "value") %>%
dplyr::pull(color)
# Open a pdf file
plot(graph_ntw,
# Size of the edge
edge.width = edge_importance,
edge.color = color_edge,
# Size of the buble
vertex.size = deg,
vertex.color = "#cde394",
vertex.frame.color = "white",
vertex.label.color = "black",
vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
layout = layout.circle)
Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.
# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]
# Compute correlation
decon_cor <- cor(decon_mtrx_sub)
# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)
# Visualize
ggcorrplot::ggcorrplot(
corr = decon_cor,
p.mat = p.mat[[1]],
hc.order = TRUE,
type = "full",
insig = "blank",
lab = TRUE,
outline.col = "lightgrey",
method = "square",
# colors = c("#4477AA", "white", "#BB4444"))
colors = c("#6D9EC1", "white", "#E46726"),
title = "Predicted cell-cell proportion correlation",
legend.title = "Correlation\n(Pearson)") +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
legend.text = ggplot2::element_text(size = 12),
legend.title = ggplot2::element_text(size = 15),
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 18, vjust = 0.5))
Step-by-Step insight
Here we are going to show step by step what is going on and all the different steps involved in the process
Downsample data
如果數(shù)據(jù)集非常大尽爆,我們希望在細胞數(shù)量和基因數(shù)量方面對其進行下采樣,以訓練模型读慎。 為了進行下采樣漱贱,我們希望保留每個簇的代表性細胞數(shù)量和最重要的基因。 我們表明這種下采樣不會影響模型的性能并大大加快了模型訓練的速度夭委。
# Downsample scRNAseq to select gene set and number of cells to train the model
se_sc_down <- downsample_se_obj(se_obj = cortex_sc,
clust_vr = "subclass",
cluster_markers = cluster_markers_all,
cl_n = 100,
hvg = 3000)
Train NMF model
Once we have the data ready to pass to the model we can train it as shown below.
start_time <- Sys.time()
nmf_mod_ls <- train_nmf(cluster_markers = cluster_markers_all,
se_sc = se_sc_down,
mtrx_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass",
ntop = NULL,
hvg = 3000,
transf = "uv",
method = "nsNMF")
nmf_mod <- nmf_mod_ls[[1]]
Extract matrices form the model:
# get basis matrix W
w <- basis(nmf_mod)
dim(w)
# get coefficient matrix H
h <- coef(nmf_mod)
dim(h)
Look at cell-type specific topic profile
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod_ls[[2]]
)
topic_profile_plts[[2]] + theme(axis.text.x = element_text(angle = 90))
Spot Deconvolution
# Extract count matrix
spot_counts <- anterior@assays$Spatial@counts
# Subset to genes used to train the model
spot_counts <- spot_counts[rownames(spot_counts) %in% rownames(w), ]
Run spots through the basis to get the pertinent coefficients. To do this for every spot we are going to set up a system of linear equations where we need to find the coefficient, we will use non-negative least squares to determine the best coefficient fit.
ct_topic_profiles <- topic_profile_per_cluster_nmf(h = h,
train_cell_clust = nmf_mod_ls[[2]])
decon_mtrx <- mixture_deconvolution_nmf(nmf_mod = nmf_mod,
mixture_transcriptome = spot_counts,
transf = "uv",
reference_profiles = ct_topic_profiles,
min_cont = 0.09)
生活很好幅狮,有你更好