本教程使用Seurat包進(jìn)行10x Visium單細(xì)胞空間轉(zhuǎn)錄組數(shù)據(jù)分析橘霎。
這個(gè)教程涉及:
標(biāo)準(zhǔn)化
降維和聚類
檢測(cè)空間差異表達(dá)基因
交互可視化
與單細(xì)胞轉(zhuǎn)錄組整合分析
整合切片信息
#1. R環(huán)境
## 檢查Seurat版本
本教程:Seurat (>=3.2)
help(Seurat)
## 安裝包:
# Enter commands in R (or R studio, if installed)
install.packages('Seurat')
- SeuratData是數(shù)據(jù)分發(fā)Seurat對(duì)象的包牌捷。使用它可以直接訪問Seurat 教程中使用的數(shù)據(jù)集亚脆。
devtools:: install_github ('satijalab/seurat-data')
- patchwork 用于組合圖
install.packages('patchwork')
## 導(dǎo)入包
library(Seurat)
library(SeuratData)
library(ggplot2)
library(patchwork)
library(dplyr)
# 數(shù)據(jù)
本教程使用的空間轉(zhuǎn)錄組數(shù)據(jù)來由Visium v1產(chǎn)生 禀忆;細(xì)胞來自于老鼠兩個(gè)腦前部切片和對(duì)應(yīng)的兩個(gè)腦后部切片朱巨。
數(shù)據(jù)來自于10X官網(wǎng)呀潭,見10X Datasets; 在本教程中可以使用 Load10X_Spatial()
函數(shù)直接導(dǎo)入械蹋,返回Seurat 對(duì)象俏竞,其中reads數(shù)據(jù)由 spaceranger分析流程產(chǎn)生绸硕,包含spot水平的表達(dá)數(shù)據(jù)以及對(duì)應(yīng)的組織切片的圖像數(shù)據(jù)。
InstallData("stxBrain")
brain <- LoadData("stxBrain", type = "anterior1")
# 數(shù)據(jù)預(yù)處理
## 數(shù)據(jù)標(biāo)準(zhǔn)化
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
結(jié)果表明不同spots之間的變化不僅受技術(shù)的影響魂毁,也收到了組織切片的影響玻佩。在缺少神經(jīng)元位置(such as the cortical white matter)單細(xì)胞測(cè)序得到的數(shù)值比較小席楚;當(dāng)使用LogNormalize()時(shí)咬崔,會(huì)首先將這些spots size標(biāo)準(zhǔn)化到同一水平,這種處理可能存在問題。
在此垮斯,作者建議使用sctransform (Hafemeister and Satija, Genome Biology 2019)。sctransform 通過建立基因表達(dá)的regularized negative binomial models來解釋技術(shù)誤差同時(shí)保留生物學(xué)誤差兜蠕。
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
## 比較sctransform 和log-normalization 的結(jié)果扰肌;
# rerun normalization to store sctransform residuals for all genes
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)
# also run standard log normalization for comparison
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")
# Computes the correlation of the log normalized data and sctransform residuals with the number of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p1 + p2
在這兒,計(jì)算了每個(gè)基因UMI值和標(biāo)準(zhǔn)化之后結(jié)果的相關(guān)性熊杨。并且曙旭,根據(jù)基因表達(dá)的均值將其劃分為6個(gè)組。在log-normalization 的結(jié)果中晶府,前三個(gè)組(基因表達(dá)均值大)存在明顯的偏差桂躏,這也表明技術(shù)偏差還是在log-normalization的結(jié)果中存在。相比之下郊霎,sctransform標(biāo)準(zhǔn)化大大減小了這種影響沼头。
# 基因表達(dá)可視化
Seurat中的SpatialFeaturePlot()函數(shù)可以將分子數(shù)據(jù)疊加到組織組織學(xué)數(shù)據(jù)上進(jìn)行展示。例如书劝,在這組小鼠大腦數(shù)據(jù)中进倍,Hpca基因是一個(gè)強(qiáng)大的海馬標(biāo)記,Ttr是脈絡(luò)叢的標(biāo)記购对。
SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
Seurat中的默認(rèn)參數(shù)強(qiáng)調(diào)分子數(shù)據(jù)的可視化猾昆。但是,也可以調(diào)整斑點(diǎn)的大新獍(及其透明度)以改善組織學(xué)圖像的可視化效果:
- pt.size.factor: 縮放斑點(diǎn)的大小垂蜗。默認(rèn)值為1.6
- alpha-最小和最大透明度。默認(rèn)值是c(1解幽,1)
- 嘗試將alpha c(0.1,1)設(shè)置為較低贴见,以降低具有較低表達(dá)式的點(diǎn)的透明度
通過alpha
c(0.1, 1)降低地表達(dá)點(diǎn)的透明度
p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2
# 降維、聚類和可視化
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
dimPlot()可視化UMAP結(jié)果躲株;SpatialDimPlot()聯(lián)合映像結(jié)果進(jìn)行可視化片部;
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)
p1 + p2
SpatialDimPlot中的cells.highlight可以通過顏色區(qū)分細(xì)胞,并且可以標(biāo)記不同類群細(xì)胞的空間位置霜定。
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(2, 1, 4, 3,
5, 8)), facet.highlight = TRUE, ncol = 3)
# 交互式可視化
SpatialDimPlot()
和SpatialFeaturePlot()
可以進(jìn)行可視化档悠;只需要設(shè)置interactive為TRUE就可以在Rstudio viewer面板展示結(jié)果。
- SpatialDimPlot
SpatialDimPlot(brain, interactive = TRUE)
- SpatialFeaturePlot
SpatialFeaturePlot(brain, features = "Ttr", interactive = TRUE)
- LinkedDimPlot
- LinkedDimPlot()可以將UMAP 和組織影像的結(jié)果聯(lián)合展示望浩。
LinkedDimPlot(brain)
# 差異表達(dá)基因鑒定
Seurat 提供了兩種流程來鑒定表達(dá)與位置相關(guān)的基因辖所。第一種是基于組織中已知區(qū)域進(jìn)行基因差異表達(dá)分析。
de_markers <- FindMarkers(brain, ident.1 = 5, ident.2 = 6)
SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
另一種方法是FindSpatiallyVariables()函數(shù)在不存在預(yù)注釋的情況下尋找基因空間表達(dá)模式磨德。FindSpatiallyVariables()函數(shù)可以調(diào)用markvariogram方法( Trendsceek)缘回,模擬空間轉(zhuǎn)錄組數(shù)據(jù)為標(biāo)記點(diǎn)從而計(jì)算variogram,variogram可以表征基因表達(dá)水平受其空間位置的影響。具體點(diǎn)就是酥宴,計(jì)算一定距離(r揩环,默認(rèn)為5)兩點(diǎn)之間的依賴性。
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000],
selection.method = "markvariogram")
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)
SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
還有更多方法可以用于鑒定差異表達(dá)基因幅虑,例如 SpatialDE, 和Splotch丰滑。
可視化鑒定的最顯著
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000],
selection.method = "markvariogram")
# 根據(jù)切片區(qū)域提取單細(xì)胞數(shù)據(jù)
這兒提取前額皮質(zhì)的數(shù)據(jù)進(jìn)行進(jìn)一步分析;提取數(shù)個(gè)細(xì)胞類群的數(shù)據(jù)倒庵,然后在切片數(shù)據(jù)上進(jìn)行可視化褒墨。
cortex <- subset(brain, idents = c(1, 2, 3, 4, 6, 7))
# now remove additional cells, use SpatialDimPlots to visualize what to remove
# SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = image_imagerow > 400
# | image_imagecol < 150))
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
# 單細(xì)胞數(shù)據(jù)整合
10x Visium中每個(gè)spot長寬約50um,因此每個(gè)spot測(cè)序數(shù)據(jù)包含多個(gè)細(xì)胞擎宝。更多的研究者希望反卷積去研究每個(gè)空間點(diǎn)的數(shù)據(jù)從而去預(yù)測(cè)細(xì)胞類型的構(gòu)成郁妈。Seurat作者使用公開的SMART-Seq2數(shù)據(jù)(a reference scRNA-seq dataset of ~14,000 adult mouse cortical cell taxonomy from the Allen Institute)測(cè)試一些列反卷積和整合方法。相比于反卷積绍申,整合的方法能夠取得更好的結(jié)果噩咪。這可能是空間轉(zhuǎn)錄組數(shù)據(jù)和普通單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)的噪音模型存在巨大的差異造成的。在此极阅,使用 Seurat v3中基于anchor的整合方法胃碾,能夠給予參考數(shù)據(jù)對(duì)新的數(shù)據(jù)集進(jìn)行注釋。
數(shù)據(jù)下載:data
allen_reference <- readRDS("./allen_cortex.rds")
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k
# cells this speeds up SCTransform dramatically with no loss in performance
library(dplyr)
allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>%
RunPCA(verbose = FALSE) %>%
RunUMAP(dims = 1:30)
# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>%
RunPCA(verbose = FALSE)
# the annotation is stored in the 'subclass' column of object metadata
DimPlot(allen_reference, group.by = "subclass", label = TRUE)
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE,
weight.reduction = cortex[["pca"]], dims = 1:30)
cortex[["predictions"]] <- predictions.assay
對(duì)每個(gè)spots打分筋搏;特別是前額葉皮質(zhì)區(qū)是層級(jí)興奮性神經(jīng)元仆百。在這里,可以區(qū)分這些神經(jīng)元亞型的不同順序?qū)樱?/p>
DefaultAssay(cortex) <- "predictions"
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
基于預(yù)測(cè)分?jǐn)?shù)也可以預(yù)測(cè)不同細(xì)胞類型在空間上的位置奔脐。使用同樣的方法也可以鑒定空間差異表達(dá)基因俄周,使用細(xì)胞類型預(yù)測(cè)分?jǐn)?shù)打分基因代替基因表達(dá)。
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram",
features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
最終髓迎,結(jié)果表明整合方法能夠發(fā)現(xiàn)已知神經(jīng)元或非神經(jīng)元( including laminar excitatory, layer-1 astrocytes, and the cortical grey matter.)的空間定位模式峦朗。
SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT",
"L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
# 整合多個(gè)切片數(shù)據(jù)
前面處理的數(shù)據(jù)都是anterior,現(xiàn)在讀入對(duì)應(yīng)的posterior數(shù)據(jù),然后進(jìn)行一樣的廚師標(biāo)準(zhǔn)化過程排龄。
brain2 <- LoadData("stxBrain", type = "posterior1")
brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)
同時(shí)處理多個(gè)切片Seurat 對(duì)象數(shù)據(jù)波势;使用merge函數(shù)整合數(shù)據(jù)。
brain.merge <- merge(brain, brain2)
然后進(jìn)行聯(lián)合降維和聚類
DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)
基于UMAP 展示涣雕,SpatialDimPlot()和SpatialFeaturePlot()艰亮;
DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))
SpatialDimPlot(brain.merge)
SpatialFeaturePlot(brain.merge, features = c("Hpca", "Plp1"))
# 原文
Analysis, visualization, and integration of spatial datasets with Seurat