10X單細(xì)胞空間聯(lián)合分析之三----Spotlight

在我們之前的分享中,已經(jīng)分享過(guò)很多的單細(xì)胞和空間聯(lián)合分析的方法跺讯,有關(guān)Spotlight枢贿,之前也提到過(guò),但是并沒(méi)有很認(rèn)真的對(duì)關(guān)注這個(gè)方法刀脏,這次的分享局荚,就讓我們來(lái)參透這個(gè)方法;這個(gè)軟件目前已經(jīng)發(fā)表愈污,文獻(xiàn)在SPOTlight:Seeded NMF regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptomes,發(fā)表于Nucleic acids research耀态,影響因子11分以上。文章大家可以認(rèn)真研讀一下暂雹,算法也很經(jīng)典首装。

軟件聯(lián)合分析的原理

SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, and nonnegative least squares (NNLS) to subsequently deconvolute ST capture locations (spots).
其中的幾個(gè)概念我們需要了解;
(1)NMF杭跪,這個(gè)可以參考文章Non-Negative Matrix Factorization (NMF)仙逻。我們這里舉一個(gè)簡(jiǎn)單的例子幫助大家理解:
矩陣分解的想法源于我們相信每個(gè)電影都是由某些元素組成的,而這些元素到底是什么涧尿,有多少系奉,我們是不知道的。
而不同的用戶出于個(gè)人喜好一定會(huì)對(duì)這些元素有個(gè)評(píng)價(jià)姑廉。
比如:
一個(gè)電影里如果只考慮愛(ài)情和動(dòng)作兩個(gè)因素缺亮。
一個(gè)用戶也一定會(huì)有我給愛(ài)情打分多一點(diǎn)還是喜歡動(dòng)作多一點(diǎn)。
如圖:

圖片.png

期望就是我們已知下面的矩陣桥言,而期望分解出比較合理的上面兩個(gè)矩陣萌踱。這樣就可以用來(lái)預(yù)測(cè)別的用戶對(duì)別的電影的看法了。把這個(gè)想法運(yùn)用到我們的空間數(shù)據(jù)上号阿,就可以了虫蝶。
(2) nonnegative least squares (NNLS),非負(fù)最小二乘法倦西,這個(gè)可以參考文章【技術(shù)分享】非負(fù)最小二乘,算法很難能真,了解一下即可。

總而言之一句話,Spotlight利用單細(xì)胞數(shù)據(jù)的特征粉铐,deconvolute ST 數(shù)據(jù)疼约,從而知道每種細(xì)胞類型在空間上的位置。

軟件的優(yōu)勢(shì)

Using synthetic spots, simulating varying reference quantities and qualities, we confirmed high prediction accuracy also with shallowly sequenced or small-sized scRNA-seq reference datasets(準(zhǔn)確度高蝙泼,文獻(xiàn)肯定都這么寫)程剥。We trained the NMF regression model with sample-matched or external datasets, resulting in accurate and sensitive spatial predictions。還有一點(diǎn)汤踏,Using SPOTlight to detect regional enrichment of immune cells and their co-localization with tumor and adjacent stroma provides an illustrative example in its flexible application spectrum and future potential in digital pathology.(免疫細(xì)胞和腫瘤細(xì)胞的共定位問(wèn)題织鲸,值得關(guān)注)。

(重點(diǎn))Successful integration of both data modalities could enable an in-depth study of tissue and organ architecture, elucidate cellular cross-talk, spatially track dynamic cell trajectories, and identify disease-specific interaction networks 溪胶。Intersecting cell-type-specific genes from scRNA-seq with ST capture sites previously identified local enrichments, sufficient to segment tumor sections into normal and cancerous areas.(這個(gè)優(yōu)點(diǎn)大家要多注意下搂擦,單細(xì)胞分析得到的通訊啊,軌跡啊哗脖,在空間上又是什么狀態(tài)呢瀑踢?)

技術(shù)示意圖

圖片.png

我們來(lái)看一下這個(gè)軟件對(duì)空間數(shù)據(jù)分解的過(guò)程:
1、最核心的部分才避,identify cell type-specific topic profiles used to deconvolute ST spots橱夭。
這里多說(shuō)一句,第一步的做法相當(dāng)于將我們的表達(dá)矩陣拆分成了兩個(gè)矩陣桑逝,W和H棘劣。(理論部分 NMF(Non-negative matrix factorization),即對(duì)于任意給定的一個(gè)非負(fù)矩陣V楞遏,其能夠?qū)ふ业揭粋€(gè)非負(fù)矩陣W和一個(gè)非負(fù)矩陣H茬暇,滿足條件V=WH,從而將一個(gè)非負(fù)的矩陣分解為左右兩個(gè)非負(fù)矩陣的乘積。*其中橱健,V矩陣中每一列代表一個(gè)觀測(cè)(observation),每一行代表一個(gè)特征(feature)沙廉;W矩陣稱為基矩陣拘荡,H矩陣稱為系數(shù)矩陣或權(quán)重矩陣。這時(shí)用系數(shù)矩陣H代替原始矩陣撬陵,就可以實(shí)現(xiàn)對(duì)原始矩陣進(jìn)行降維珊皿,得到數(shù)據(jù)特征的降維矩陣。)We set out to use NMF to obtain topic profiles due to its previous success in identifying biologically relevant gene expression programs, as well as its previous implementation in ST analysis巨税。這一步大家著重理解一下蟋定,關(guān)于NMF。Importantly, the NMF is initialized by the two main matrices: the basis matrix (W) with unique cell type marker genes and weights, and the coefficient matrix (H) in which each row is initialized, specifying the corresponding relationship of a cell to a topic草添。
2驶兜、Factorization is then carried out using non-smooth NMF(因式分解)。promoting cell type-specific topic profiles, while reducing overfitting during training。
3抄淑、NNLS regression is used to map each spot’s transcriptome to a topic profile distribution using the unit-variance normalized ST count matrix and the basis matrix previously obtained屠凶。回歸分析肆资,注意這里用到的矩陣關(guān)系矗愧。
4、Lastly, NNLS is again applied to determine the weights for each cell type that best fit each spot’s topic profile by minimizing the residuals.
We use a minimum weight contribution threshold to determine which cell types are contributing to the profile of a given spot, also considering the possibility of partial contributions. NNLS also returns a measure of error along with the predicted cell proportions, allowing the user to estimate the reliability of predicted spot compositions

這個(gè)地方大家需要注意的是郑原,矩陣的分解唉韭,以及分別進(jìn)行回歸,最后一起預(yù)測(cè)細(xì)胞類型的組成犯犁。

接下來(lái)就是一些運(yùn)用了属愤,包括文獻(xiàn)本身的驗(yàn)證和空間數(shù)據(jù)上的運(yùn)用,大家看看即可栖秕, 肯定效果很不錯(cuò)春塌,要不然發(fā)不出來(lái),簇捍。

圖片.png
圖片.png

最后我們來(lái)看一下代碼部分:

SPOTlight
這就不照抄別人的代碼了只壳,我們只關(guān)注重點(diǎn):spotlight_deconvolution
關(guān)于這個(gè)函數(shù)有幾個(gè)參數(shù)必須要注意:
1、cluster_markers 注意這里需要的marker單細(xì)胞分析得到的暑塑,但是用多少吼句,需要判斷。
2事格、cl_n Object of integer indicating how many cells to keep from each cluster. If a cluster has n < cl_n then all cells will be selected, if it has more then cl_n will be sampled randomly,100 by default.
3惕艳、method Object of class character; Type of method to us to find W and H. Look at NMF package for the options and specifications, by default nsNMF(最重要的一步,因式分解的方法)驹愚。

等等等等

代碼部分

The goal of SPOTlight is to provide a tool that enables the deconvolution of cell types and cell type proportions present within each capture locations comprising mixtures of cells, originally developed for 10X’s Visium - spatial trancsiptomics- technology, it can be used for all technologies returning mixtures of cells. SPOTlight is based on finding topic profile signatures, by means of an NMFreg model, for each cell type and finding which combination fits best the spot we want to deconvolute.

image.png

Graphical abstract

0.1 Installation

You can install the latest stable version from the GitHub repository SPOTlight with:

# install.packages("devtools")
devtools::install_github("https://github.com/MarcElosua/SPOTlight")
devtools::install_git("https://github.com/MarcElosua/SPOTlight")

Or the latest version in development by downloading the devel branch

devtools::install_github("https://github.com/MarcElosua/SPOTlight", ref = "devel")
devtools::install_git("https://github.com/MarcElosua/SPOTlight", ref = "devel")

0.2 Libraries

library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)

0.3 Load data

For the purpose of this tutorial we are going to use adult mouse brain data. The scRNAseq data can be downloaded here while the spatial data is the one put out publicly by 10X and the processed object can be downloaded using SeuratData as shown below.

Load single-cell reference dataset.

path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))

Load Spatial data

if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
  # If dataset not downloaded proceed to download it
  SeuratData::InstallData("stxBrain")
}

# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")

0.4 Pre-processing

set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
  Seurat::RunPCA(., verbose = FALSE) %>%
  Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)

Visualize the clustering

Seurat::DimPlot(cortex_sc,
                group.by = "subclass",
                label = TRUE) + Seurat::NoLegend()
image.png

0.5 Descriptive

cortex_sc@meta.data %>%
  dplyr::count(subclass) %>%
  gt::gt(.[-1, ]) %>%
  gt::tab_header(
    title = "Cell types present in the reference dataset",
  ) %>%
  gt::cols_label(
    subclass = gt::html("Cell Type")
  )
Cell types present in the reference dataset
---
Cell Type n
--- ---
Astro 70
CR 7
Endo 70
L2/3 IT 70
L4 70
L5 IT 70
L5 PT 70
L6 CT 70
L6 IT 70
L6b 70
Lamp5 70
Macrophage 51
Meis2 45
NP 70
Oligo 70
Peri 32
Pvalb 70
Serpinf1 27
SMC 55
Sncg 70
Sst 70
Vip 70
VLMC 67

0.6 Compute marker genes

To determine the most important marker genes we can use the function Seurat::FindAllMarkers which will return the markers for each cluster.

Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc, 
                                              assay = "SCT",
                                              slot = "data",
                                              verbose = TRUE, 
                                              only.pos = TRUE)

saveRDS(object = cluster_markers_all,
        file = here::here("inst/markers_sc.RDS"))

0.6.1 SPOTlight Decomposition

set.seed(123)

spotlight_ls <- spotlight_deconvolution(
  se_sc = cortex_sc,
  counts_spatial = anterior@assays$Spatial@counts,
  clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
  cluster_markers = cluster_markers_all, # Dataframe with the marker genes
  cl_n = 100, # number of cells per cell type to use
  hvg = 3000, # Number of HVG to use
  ntop = NULL, # How many of the marker genes to use (by default all)
  transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
  method = "nsNMF", # Factorization method
  min_cont = 0 # Remove those cells contributing to a spot below a certain threshold 
  )

saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))

Read RDS object

spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))

nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]

0.6.2 Assess deconvolution

Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.

The first thing we can do is look at how specific the topic profiles are for each cell type.

h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod[[2]])

topic_profile_plts[[2]] + ggplot2::theme(
  axis.text.x = ggplot2::element_text(angle = 90), 
  axis.text = ggplot2::element_text(size = 12))
image.png

0.7 Visualization

Join decomposition with metadata

# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)

decon_df <- decon_mtrx %>%
  data.frame() %>%
  tibble::rownames_to_column("barcodes")

anterior@meta.data <- anterior@meta.data %>%
  tibble::rownames_to_column("barcodes") %>%
  dplyr::left_join(decon_df, by = "barcodes") %>%
  tibble::column_to_rownames("barcodes")

0.7.1 Specific cell-types

we can use the standard Seurat::SpatialFeaturePlot to view predicted celltype proportions one at a time.

Seurat::SpatialFeaturePlot(
  object = anterior,
  features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
  alpha = c(0.1, 1))
圖片.png

0.7.2 Spatial scatterpies

Plot spot composition of all the spots.

cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              pie_scale = 0.4)
image.png

Plot spot composition of spots containing cell-types of interest

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              cell_types_interest = "L6b",
                              pie_scale = 0.8)
image.png

0.7.3 Spatial interaction graph

Now that we know which cell types are found within each spot we can make a graph representing spatial interactions where cell types will have stronger edges between them the more often we find them within the same spot. To do this we will only need to run the function get_spatial_interaction_graph, this function prints the plot and returns the elements needed to plot it.

graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])

If you want to tune how the graph looks you can do the following or you can check out more options here:

deg <- degree(graph_ntw, mode="all")

# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance

# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]

# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))

# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)

# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
                           color = getPalette(length(grad_edge)),
                           stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
  dplyr::left_join(graph_col_df, by = "value") %>%
  dplyr::pull(color)

# Open a pdf file
plot(graph_ntw,
     # Size of the edge
     edge.width = edge_importance,
     edge.color = color_edge,
     # Size of the buble
     vertex.size = deg,
     vertex.color = "#cde394",
     vertex.frame.color = "white",
     vertex.label.color = "black",
     vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
     layout = layout.circle)
image.png

Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.

# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]

# Compute correlation
decon_cor <- cor(decon_mtrx_sub)

# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)

# Visualize
ggcorrplot::ggcorrplot(
  corr = decon_cor,
  p.mat = p.mat[[1]],
  hc.order = TRUE,
  type = "full",
  insig = "blank",
  lab = TRUE,
  outline.col = "lightgrey",
  method = "square",
  # colors = c("#4477AA", "white", "#BB4444"))
  colors = c("#6D9EC1", "white", "#E46726"),
  title = "Predicted cell-cell proportion correlation",
  legend.title = "Correlation\n(Pearson)") +
  ggplot2::theme(
    plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
    legend.text = ggplot2::element_text(size = 12),
    legend.title = ggplot2::element_text(size = 15),
    axis.text.x = ggplot2::element_text(angle = 90),
    axis.text = ggplot2::element_text(size = 18, vjust = 0.5))
image.png

0.8 Step-by-Step insight

Here we are going to show step by step what is going on and all the different steps involved in the process.

image.png

還是那句話诚撵,生活很好占锯,有你更好

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
禁止轉(zhuǎn)載,如需轉(zhuǎn)載請(qǐng)通過(guò)簡(jiǎn)信或評(píng)論聯(lián)系作者。
  • 序言:七十年代末内边,一起剝皮案震驚了整個(gè)濱河市歼跟,隨后出現(xiàn)的幾起案子呕童,更是在濱河造成了極大的恐慌砚婆,老刑警劉巖,帶你破解...
    沈念sama閱讀 206,013評(píng)論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件志于,死亡現(xiàn)場(chǎng)離奇詭異涮因,居然都是意外死亡,警方通過(guò)查閱死者的電腦和手機(jī)伺绽,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評(píng)論 2 382
  • 文/潘曉璐 我一進(jìn)店門养泡,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)嗜湃,“玉大人,你說(shuō)我怎么就攤上這事瓤荔【辉椋” “怎么了?”我有些...
    開(kāi)封第一講書(shū)人閱讀 152,370評(píng)論 0 342
  • 文/不壞的土叔 我叫張陵输硝,是天一觀的道長(zhǎng)今瀑。 經(jīng)常有香客問(wèn)我,道長(zhǎng)点把,這世上最難降的妖魔是什么橘荠? 我笑而不...
    開(kāi)封第一講書(shū)人閱讀 55,168評(píng)論 1 278
  • 正文 為了忘掉前任,我火速辦了婚禮郎逃,結(jié)果婚禮上哥童,老公的妹妹穿的比我還像新娘。我一直安慰自己褒翰,他們只是感情好贮懈,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,153評(píng)論 5 371
  • 文/花漫 我一把揭開(kāi)白布。 她就那樣靜靜地躺著优训,像睡著了一般朵你。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上揣非,一...
    開(kāi)封第一講書(shū)人閱讀 48,954評(píng)論 1 283
  • 那天抡医,我揣著相機(jī)與錄音,去河邊找鬼早敬。 笑死忌傻,一個(gè)胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的搞监。 我是一名探鬼主播水孩,決...
    沈念sama閱讀 38,271評(píng)論 3 399
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼琐驴!你這毒婦竟也來(lái)了俘种?” 一聲冷哼從身側(cè)響起,我...
    開(kāi)封第一講書(shū)人閱讀 36,916評(píng)論 0 259
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤棍矛,失蹤者是張志新(化名)和其女友劉穎安疗,沒(méi)想到半個(gè)月后抛杨,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體够委,經(jīng)...
    沈念sama閱讀 43,382評(píng)論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 35,877評(píng)論 2 323
  • 正文 我和宋清朗相戀三年怖现,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了茁帽。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片玉罐。...
    茶點(diǎn)故事閱讀 37,989評(píng)論 1 333
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖潘拨,靈堂內(nèi)的尸體忽然破棺而出吊输,到底是詐尸還是另有隱情,我是刑警寧澤铁追,帶...
    沈念sama閱讀 33,624評(píng)論 4 322
  • 正文 年R本政府宣布季蚂,位于F島的核電站,受9級(jí)特大地震影響琅束,放射性物質(zhì)發(fā)生泄漏扭屁。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,209評(píng)論 3 307
  • 文/蒙蒙 一涩禀、第九天 我趴在偏房一處隱蔽的房頂上張望料滥。 院中可真熱鬧,春花似錦艾船、人聲如沸葵腹。這莊子的主人今日做“春日...
    開(kāi)封第一講書(shū)人閱讀 30,199評(píng)論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)践宴。三九已至,卻和暖如春雁社,著一層夾襖步出監(jiān)牢的瞬間浴井,已是汗流浹背。 一陣腳步聲響...
    開(kāi)封第一講書(shū)人閱讀 31,418評(píng)論 1 260
  • 我被黑心中介騙來(lái)泰國(guó)打工霉撵, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留磺浙,地道東北人。 一個(gè)月前我還...
    沈念sama閱讀 45,401評(píng)論 2 352
  • 正文 我出身青樓徒坡,卻偏偏與公主長(zhǎng)得像撕氧,于是被迫代替她去往敵國(guó)和親。 傳聞我的和親對(duì)象是個(gè)殘疾皇子喇完,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,700評(píng)論 2 345

推薦閱讀更多精彩內(nèi)容