Welcome to follow the WeChat official account of the same name: BBio
The workflow covers basic analysis of a single slice, integration with scRNA-seq data, and integration of multiple slices.
參考:https://satijalab.org/seurat/articles/spatial_vignette.html
//DataSet: 10x Visium
#image
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Mouse_Brain_Sagittal_Anterior/V1_Mouse_Brain_Sagittal_Anterior_spatial.tar.gz
#filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Mouse_Brain_Sagittal_Anterior/V1_Mouse_Brain_Sagittal_Anterior_filtered_feature_bc_matrix.h5
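Load10X_Spatial() below expects a single directory containing both the (renamed) .h5 matrix and the unpacked spatial/ image folder. One possible layout is sketched here, assuming the archive unpacks to a spatial/ subfolder and that the directory name data/ matches the call further down:
#arrange the downloads so Load10X_Spatial can find them
mkdir -p data
mv V1_Mouse_Brain_Sagittal_Anterior_filtered_feature_bc_matrix.h5 data/filtered_feature_bc_matrix.h5
tar -xzf V1_Mouse_Brain_Sagittal_Anterior_spatial.tar.gz -C data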
//Data preprocessing
The initial preprocessing steps we perform on spot-level gene expression data are similar to a typical scRNA-seq experiment. We first need to normalize the data to account for differences in sequencing depth across spots. Note that for spatial datasets the variance in molecular counts per spot can be substantial, particularly when there are differences in cell density across the tissue. We see substantial heterogeneity here, which requires effective normalization.
library(Seurat)
library(SeuratData)
library(ggplot2)
library(patchwork)
library(dplyr)
# helper: write a ggplot/patchwork object to a PNG file
pic <- function(p, file, width = 480, height = 480){
  png(file, width = width, height = height)
  print(p)
  dev.off()
}
# "data" is the directory holding filtered_feature_bc_matrix.h5 and the unpacked spatial/ folder
brain <- Load10X_Spatial("data",
                         filename = "filtered_feature_bc_matrix.h5",
                         assay = "Spatial",
                         slice = "slice1",
                         filter.matrix = TRUE)
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")
p <- wrap_plots(plot1, plot2)
pic(p, "nCount_Spatial.png", width=960)
These plots show that the variance in molecular counts is not just technical in nature, but also depends on tissue anatomy. For example, regions of the tissue depleted of neurons (such as the cortical white matter) reproducibly exhibit lower molecular counts. As a result, standard approaches (such as the LogNormalize() function), which force each spot to have the same underlying "size" after normalization, can be problematic.
As an alternative, we recommend sctransform (Hafemeister and Satija, Genome Biology 2019), which builds regularized negative binomial models of gene expression in order to account for technical artifacts while preserving biological variance. For more details on sctransform, see the paper and the Seurat vignette. sctransform normalizes the data, detects high-variance features, and stores the result in the SCT assay.
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
//How do results compare to log-normalization?
To explore the differences between normalization methods, we examine how the sctransform and log-normalization results correlate with the number of UMIs. For this comparison, we first rerun sctransform to store residuals for all genes, and also run standard log normalization via NormalizeData().
# rerun normalization to store sctransform residuals for all genes
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)
# also run standard log normalization for comparison
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")
# Computes the correlation of the log normalized data and sctransform residuals with the
# number of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") +
theme(plot.title = element_text(hjust = 0.5))
pic(p1+p2, "CorrelationPlot.png", width=960)
For the boxplots above, we calculate the correlation of each feature (gene) with the number of UMIs (the nCount_Spatial variable here). We then place genes into groups based on their mean expression and generate boxplots of these correlations. You can see that log normalization fails to adequately normalize genes in the first three groups, suggesting that technical factors continue to influence the normalized expression estimates of highly expressed genes. In contrast, sctransform normalization substantially mitigates this effect.
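If you want to look at the underlying numbers rather than the boxplots, the per-gene correlations computed by GroupCorrelation() should be available in the feature-level metadata of each assay. The column name below is taken from the GroupCorrelationPlot() calls above, but treat the exact storage location as an assumption:
# peek at the per-gene correlation values used for the boxplots (assumed to live
# in each assay's meta.features slot under the column used by GroupCorrelationPlot)
head(brain[["Spatial"]]@meta.features["nCount_Spatial_cor"])
head(brain[["SCT"]]@meta.features["nCount_Spatial_cor"])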
//Gene expression visualization
For example, in this mouse brain dataset, Hpca is a strong marker of the hippocampus, while Ttr marks the choroid plexus.
p <- SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
pic(p, "FeaturePlot_Hpca_Ttr.png", width=960)
# pt.size.factor scales the size of the spots (default 1.6)
p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
# alpha sets the minimum and maximum transparency, so low-expressing spots fade out (default c(1, 1))
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
pic(p1+p2, "FeaturePlot_Ttr.png", width=960)
//Dimensionality reduction, clustering, and visualization
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)
pic(p1+p2, "cluster.png", width=960)
p <- SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(2, 1, 4, 3,5, 8)), facet.highlight = TRUE, ncol = 3)
pic(p, "cluster_highlight.png", width=960)
//Identification of Spatially Variable Features
Seurat offers two workflows to identify molecular features that correlate with spatial location within a tissue. The first is to perform differential expression based on pre-annotated anatomical regions within the tissue, which may be determined either from unsupervised clustering or prior knowledge. This strategy works well here, as the clusters above exhibit clear spatial restriction.
de_markers <- FindMarkers(brain, ident.1 = 5, ident.2 = 6)
p <- SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
pic(p, "FindMarkers.png", width=480*3)
An alternative approach, implemented in FindSpatiallyVariableFeatures(), is to search for features exhibiting spatial patterning in the absence of pre-annotation. The default method (method = 'markvariogram') is inspired by Trendsceek, which models spatial transcriptomics data as a marked point process and computes a "variogram" to identify genes whose expression level depends on their spatial location. More specifically, this process calculates gamma(r) values measuring the dependence between two spots a certain "r" distance apart. By default, we use an r value of 5 in these analyses, and only compute these values for variable genes (where variation is calculated independently of spatial location) to save time.
We note that there are multiple methods in the literature to accomplish this task, including SpatialDE and Splotch. We encourage interested users to explore these methods, and hope to add support for them in the near future.
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000], selection.method = "markvariogram")
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)
p <- SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
pic(p, "FindSpatiallyVariableFeatures.png", width=480*1.5)
//Subset out anatomical regions
As with single-cell objects, you can subset the object to focus on a subset of the data. Here, we roughly subset out the frontal cortex. This process also facilitates the integration of these data with a cortical scRNA-seq dataset in the next section. First, we take a subset of clusters, and then further segment based on exact positions. After subsetting, we can visualize the cortical cells either on the full image or on a cropped image.
cortex <- subset(brain, idents = c(1, 2, 3, 4, 6, 7))
# now remove additional cells, use SpatialDimPlots to visualize what to remove
# SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = image_imagerow > 400
# | image_imagecol < 150))
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
pic(p1+p2, "cortex.png", width=480*2)
//Integration with single-cell data
At ~50 µm, spots from the Visium assay will encompass the expression profiles of multiple cells. For the growing list of systems where scRNA-seq data are available, users may wish to "deconvolute" each spatial voxel to predict the underlying composition of cell types. While preparing this vignette, we tested a wide variety of deconvolution and integration methods, using a reference scRNA-seq dataset of ~14,000 adult mouse cortical cells from the Allen Institute, generated with the SMART-Seq2 protocol. We consistently found better performance using integration methods (as opposed to deconvolution methods), likely because of the substantially different noise models that characterize spatial and single-cell datasets; integration methods are specifically designed to be robust to these differences. We therefore apply the "anchor"-based integration workflow introduced in Seurat v3, which enables the probabilistic transfer of annotations from a reference to a query set. We follow the label transfer workflow presented here, taking advantage of sctransform normalization, but anticipate that new methods will be developed to accomplish this task.
We first load the data (download available here), pre-process the scRNA-seq reference, and then perform label transfer. For each spot, the procedure outputs a probabilistic classification for each of the scRNA-seq-derived classes. We add these predictions as a new assay in the Seurat object.
allen_reference <- readRDS("../data/allen_cortex.rds")
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k cells;
# this speeds up SCTransform dramatically with no loss in performance
library(dplyr)
allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>%
RunPCA(verbose = FALSE) %>%
RunUMAP(dims = 1:30)
# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>%
RunPCA(verbose = FALSE)
# the annotation is stored in the 'subclass' column of object metadata
p <- DimPlot(allen_reference, group.by = "subclass", label = TRUE)
pic(p, "ref.png")
We now get prediction scores for each class at each spot. Of particular interest in the frontal cortex region are the laminar excitatory neurons. Here we can distinguish between distinct sequential layers of these neuronal subtypes, for example:
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE, weight.reduction = cortex[["pca"]], dims = 1:30)
cortex[["predictions"]] <- predictions.assay
DefaultAssay(cortex) <- "predictions"
p <- SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
pic(p, "order.png")
Based on these prediction scores, we can also predict cell types whose location is spatially restricted. We use the same marked-point-process approach to define spatially variable features, but use the cell type prediction scores as the "marks" rather than gene expression.
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram",
features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
p <- SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
pic(p, "order.png")
Finally, we show that our integrative procedure is capable of recovering the known spatial localization patterns of both neuronal and non-neuronal subsets, including laminar excitatory neurons, layer-1 astrocytes, and the cortical grey matter.
p <- SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT",
                                             "L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
pic(p, "celltypes.png", width = 480*2, height = 480*3)
//Working with multiple slices in Seurat
This mouse brain dataset contains another slice, corresponding to the other half of the brain. Here we read it in and perform the same initial normalization.
#https://www.10xgenomics.com/resources/datasets/mouse-brain-serial-section-1-sagittal-posterior-1-standard-1-1-0
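The posterior slice can be downloaded in the same way as the anterior one; the URLs below simply follow the same pattern as above and should be verified against the dataset page linked in the comment:
#image
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Mouse_Brain_Sagittal_Posterior/V1_Mouse_Brain_Sagittal_Posterior_spatial.tar.gz
#filtered_feature_bc_matrix.h5
wget https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Mouse_Brain_Sagittal_Posterior/V1_Mouse_Brain_Sagittal_Posterior_filtered_feature_bc_matrix.h5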
brain2 <- Load10X_Spatial("posterior1",
                          filename = "filtered_feature_bc_matrix.h5",
                          assay = "Spatial",
                          slice = "slice2",
                          filter.matrix = TRUE)
brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)
brain.merge <- merge(brain, brain2)
DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)
p <- DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))
pic(p, "merge.png", width=480*2)
p <- SpatialDimPlot(brain.merge)
pic(p, "merge1.png", width=480*2)
p <- SpatialFeaturePlot(brain.merge, features = c("Hpca", "Plp1"))
pic(p, "merge2.png", width=480*2)