空間轉(zhuǎn)錄組+單細(xì)胞聯(lián)合分析---(SPOTlight最新版教程)
作者在2022年更新了SPOTlight腳本代碼结耀,或許有伙伴在運(yùn)行以往的版本教程(https://marcelosua.github.io/)spotlight_deconvolution函數(shù)的時(shí)候會(huì)出現(xiàn)報(bào)錯(cuò):could not find function "spotlight_deconvolution"颗品,這就說明版本更新了,以往的函數(shù)代碼作者也已經(jīng)更新献联。
什么是SPOTlight?
由于10x Visium數(shù)據(jù)初衷是想將每個(gè)spot看作一個(gè)細(xì)胞,但是現(xiàn)實(shí)實(shí)驗(yàn)中單個(gè)spot中包含不僅是一個(gè)細(xì)胞璧亮。如何確定每個(gè)spot中包含的細(xì)胞景馁,有利于空間轉(zhuǎn)錄組的分析板壮。
SPOTlight 是一個(gè)工具,能夠解讀每個(gè)捕獲位置細(xì)胞混合物內(nèi)存在的細(xì)胞類型和比例合住,最初是為10X的Visium-空間轉(zhuǎn)錄組學(xué)技術(shù)而開發(fā)绰精。
SPOTlight可以結(jié)合單細(xì)胞轉(zhuǎn)錄組RNA測(cè)序信息反卷積deconvolute空間數(shù)據(jù),識(shí)別每個(gè)spot中的細(xì)胞類型和比例透葛。SPOTlight利用這兩種數(shù)據(jù)類型的優(yōu)勢(shì)笨使,能夠?qū)T與scRNA-seq數(shù)據(jù)集成,從而推斷出復(fù)雜組織中細(xì)胞類型和狀態(tài)的位置僚害。SPOTlight基于一個(gè)種子的非負(fù)矩陣因子分解回歸(Seeded NMF regression )硫椰,使用細(xì)胞類型標(biāo)記基因和非負(fù)最小二乘(NNLS)初始化,隨后去卷積ST捕獲位置(spot)。作者文章:Elosua-Bayes M, Nieto P, Mereu E, Gut I, Heyn H (2021): SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res 49(9):e50. doi:10.1093/nar/gkab043.
研究流程圖如下:
研究代碼如下:
首先安裝SPOTlight包
install.packages("BiocManager")
BiocManager::install("SPOTlight")
# Or the devel version
BiocManager::install("SPOTlight", version = "devel")
#或者從Github中安裝
install.packages("devtools")
library(devtools)
install_github("https://github.com/MarcElosua/SPOTlight")
加載包開始分析
library(ggplot2)
library(SPOTlight)
library(SingleCellExperiment)
library(SpatialExperiment)
library(scater)
library(scran)
library(scuttle)
library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(igraph)
library(RColorBrewer)
## Loading the spatial and single cell data:
spe <- readRDS("path:/scRNA")
sce <- readRDS("path:/scRNA.rds") ##單細(xì)胞數(shù)據(jù)經(jīng)scater處理的,對(duì)接logNormCounts
##代碼空轉(zhuǎn)和單細(xì)胞演示數(shù)據(jù)是作者從Biocondcutor packages下載
library(TENxVisiumData) ##空轉(zhuǎn)數(shù)據(jù)
spe <- MouseKidneyCoronal()
# Use symbols instead of Ensembl IDs as feature names
rownames(spe) <- rowData(spe)$symbol
##單細(xì)胞數(shù)據(jù)集來自:[Tabula Muris Sensis](https://www.nature.com/articles/s41586-020-2496-1)
library(TabulaMurisSenisData) ##單細(xì)胞數(shù)據(jù)
sce <- TabulaMurisSenisDroplet(tissues = "Kidney")$Kidney
##快速查看數(shù)據(jù):
table(sce$free_annotation, sce$age)
# 作者為了去除批次影響挑選特定年齡段的小鼠細(xì)胞
sce <- sce[, sce$age == "18m"]
# 挑選具有明確注釋的細(xì)胞類別,這個(gè)可以根據(jù)自身情況調(diào)整
sce <- sce[, !sce$free_annotation %in% c("nan", "CD45")]
## Preprocessing
### Feature selection(第一步獲得每種細(xì)胞類型的標(biāo)記基因,歸一化處理)
sce <- logNormCounts(sce)
##獲得既不是核糖體也不是線粒體的基因向量
genes <- !grepl(pattern = "^Rp[l|s]|Mt", x = rownames(sce))
dec <- modelGeneVar(sce, subset.row = genes)
plot(dec$mean, dec$total, xlab = "Mean log-expression", ylab = "Variance")
curve(metadata(dec)$trend(x), col = "blue", add = TRUE)
# 獲得3000個(gè)的高變基因HVGs.
hvg <- getTopHVGs(dec, n = 3000)
colLabels(sce) <- colData(sce)$celltype
# 計(jì)算標(biāo)記基因靶草,表明該基因?qū)υ摷?xì)胞類型的重要性
mgs <- scoreMarkers(sce, subset.row = genes)
mgs_fil <- lapply(names(mgs), function(i) {
x <- mgs[[i]]
# 篩選并保留相關(guān)標(biāo)記基因蹄胰,即AUC>0.8的基因
x <- x[x$mean.AUC > 0.8, ]
# 將基因從高到低的權(quán)重排序
x <- x[order(x$mean.AUC, decreasing = TRUE), ]
# 將基因和聚類ID添加到數(shù)據(jù)框中
x$gene <- rownames(x)
x$cluster <- i
data.frame(x)
})
mgs_df <- do.call(rbind, mgs_fil)
##Cell Downsampling(每個(gè)細(xì)胞類別最多隨機(jī)選擇100個(gè)細(xì)胞,<100個(gè)細(xì)胞都將被使用。轉(zhuǎn)錄譜明顯不同的(如B和T),可使用較少N的細(xì)胞;轉(zhuǎn)錄譜相似的細(xì)胞增加N)
# 按細(xì)胞類別分割細(xì)胞索引
idx <- split(seq(ncol(sce)), sce$celltype)
# 每個(gè)類別和亞群最多降樣到20個(gè)
# 我們?cè)谶@里使用5個(gè)來加快進(jìn)程奕翔,但對(duì)于你的實(shí)際情況烤送,可以設(shè)置為75-100個(gè)。
# 生活模式分析
n_cells <- 5
cs_keep <- lapply(idx, function(i) {
n <- length(i)
if (n < n_cells)
n_cells <- n
sample(i, n_cells)
})
sce <- sce[, unlist(cs_keep)]
## Deconvolution(運(yùn)行 "SPOTlight 函數(shù)"來解構(gòu)spot)
res <- SPOTlight(
x = sce,
y = spe,
groups = as.character(sce$free_annotation),
mgs = mgs_df,
hvg = hvg,
weight_id = "mean.AUC",
group_id = "cluster",
gene_id = "gene")
##或者分兩步運(yùn)行`SPOTlight'得到訓(xùn)練好的模型,在其他數(shù)據(jù)集上可重復(fù)使用糠悯。
mod_ls <- trainNMF(
x = sce,
y = spe,
groups = sce$type,
mgs = mgs,
weight_id = "weight",
group_id = "type",
gene_id = "gene")
# Run deconvolution(運(yùn)行去卷積)
res <- runDeconvolution(
x = spe,
mod = mod_ls[["mod"]],
ref = mod_ls[["topic"]])
##從`SPOTlight`中提取數(shù)據(jù)
# 提取去卷積矩陣
head(mat <- res$mat)[, seq_len(3)]
# 提取NMF模型擬合
mod <- res$NMF
## 可視化過程
## Topic profiles(設(shè)置`facet = FALSE`,topic profile表征細(xì)胞類別,每個(gè)細(xì)胞類別都有一個(gè)獨(dú)特的主題特征)
plotTopicProfiles(
x = mod,
y = sce$free_annotation,
facet = FALSE,
min_prop = 0.01,
ncol = 1) +
theme(aspect.ratio = 1)
##確保所有來自同一細(xì)胞類別的細(xì)胞都有類似的topic profile
plotTopicProfiles(
x = mod,
y = sce$free_annotation,
facet = TRUE,
min_prop = 0.01,
ncol = 6)
##查看模型為每個(gè)主題學(xué)習(xí)了哪些基因,更高的數(shù)值表明該基因與該主題更相關(guān)
library(NMF)
sign <- basis(mod)
colnames(sign) <- paste0("Topic", seq_len(ncol(sign)))
head(sign)
# 這可以用DT動(dòng)態(tài)可視化帮坚,如下所示
# DT::datatable(sign, fillContainer = TRUE, filter = "top")
## Spatial Correlation Matrix(空間相關(guān)矩陣)
#參照(http://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2)獲得參數(shù)
plotCorrelationMatrix(mat)
## Co-localization(空間共定位及相互作用)
#參照(https://www.r-graph-gallery.com/network.html) 獲取參數(shù).
plotInteractions(mat, which = "heatmap", metric = "prop")
plotInteractions(mat, which = "heatmap", metric = "jaccard")
plotInteractions(mat, which = "network")
## Scatterpie(將細(xì)胞類型的比例可視化為餅狀圖)
ct <- colnames(mat)
mat[mat < 0.1] <- 0
# 定義調(diào)色板
# 使用'colorBlindness'軟件包中的'paletteMartin'
paletteMartin <- c(
"#000000", "#004949", "#009292", "#ff6db6", "#ffb6db",
"#490092", "#006ddb", "#b66dff", "#6db6ff", "#b6dbff",
"#920000", "#924900", "#db6d00", "#24ff24", "#ffff6d")
pal <- colorRampPalette(paletteMartin)(length(ct))
names(pal) <- ct
plotSpatialScatterpie(
x = spe,
y = mat,
cell_types = colnames(mat),
img = FALSE,
scatterpie_alpha = 1,
pie_scale = 0.4) +
scale_fill_manual(
values = pal,
breaks = names(pal))
##在圖像下方--將其逆時(shí)針旋轉(zhuǎn)90度,并在橫軸上做鏡像,以顯示如果spot不在圖像上覆蓋該如何對(duì)齊
plotSpatialScatterpie(
x = spe,
y = mat,
cell_types = colnames(mat),
img = FALSE,
scatterpie_alpha = 1,
pie_scale = 0.4,
# 將圖像逆時(shí)針旋轉(zhuǎn)90度
degrees = -90,
# 將圖像在其X軸上進(jìn)行旋轉(zhuǎn)
axis = "h") +
scale_fill_manual(
values = pal,
breaks = names(pal))
## Residuals
##最后觀察模型對(duì)每個(gè)點(diǎn)的比例的預(yù)測(cè)情況,通過觀察每個(gè)斑點(diǎn)的平方之和的殘差來實(shí)現(xiàn),這表明模型不能解釋的生物信號(hào)的數(shù)量
spe$res_ss <- res[[2]][colnames(spe)]
xy <- spatialCoords(spe)
spe$x <- xy[, 1]
spe$y <- xy[, 2]
ggcells(spe, aes(x, y, color = res_ss)) +
geom_point() +
scale_color_viridis_c() +
coord_fixed() +
theme_bw()
# Session information(查看目前相關(guān)包的版本信息)
sessionInfo()