第七節(jié):空間轉(zhuǎn)錄組分析
在本節(jié)教程中,我們將學(xué)習(xí)空間轉(zhuǎn)錄組的一些分析方法鲫凶。10X Visium平臺(tái)的空間轉(zhuǎn)錄組數(shù)據(jù)在許多方面是類似于scRNA-seq數(shù)據(jù)的,它包含5-20個(gè)細(xì)胞而不是單個(gè)細(xì)胞的UMI計(jì)數(shù),但仍然與scRNA-seq相同萤悴,數(shù)據(jù)非常的稀疏胶滋,同時(shí)還具有相關(guān)組織中空間位置的附加信息板鬓。
在這里,我們將使用小鼠大腦 (Sagittal)的兩個(gè)Visium空間轉(zhuǎn)錄組數(shù)據(jù)集進(jìn)行實(shí)戰(zhàn)練習(xí)究恤。首先俭令,以類似于scRNA-seq數(shù)據(jù)的方式運(yùn)行質(zhì)量控制,然后進(jìn)行QC過(guò)濾部宿,降維抄腔,數(shù)據(jù)整合和細(xì)胞聚類。最后,我們使用來(lái)自小鼠大腦皮層的scRNA-seq數(shù)據(jù)運(yùn)行LabelTransfer
來(lái)預(yù)測(cè)Visium空間轉(zhuǎn)錄組數(shù)據(jù)中的細(xì)胞類型赫蛇。
安裝并加載所需的R包
# 安裝seurat-data包
devtools::install_github("satijalab/seurat-data")
##
checking for file ‘/tmp/RtmpcVJCPY/remotes118eae37e56/satijalab-seurat-data-c633765/DESCRIPTION’ ...
? checking for file ‘/tmp/RtmpcVJCPY/remotes118eae37e56/satijalab-seurat-data-c633765/DESCRIPTION’ (561ms)
##
─ preparing ‘SeuratData’:
##
checking DESCRIPTION meta-information ...
? checking DESCRIPTION meta-information
##
─ checking for LF line-endings in source and make files and shell scripts (385ms)
##
─ checking for empty or unneeded directories
##
─ building ‘SeuratData_0.2.1.tar.gz’
##
# 加載所需的R包
suppressPackageStartupMessages(require(Matrix))
suppressPackageStartupMessages(require(dplyr))
suppressPackageStartupMessages(require(SeuratData))
suppressPackageStartupMessages(require(Seurat))
suppressPackageStartupMessages(require(ggplot2))
suppressPackageStartupMessages(require(patchwork))
suppressPackageStartupMessages(require(dplyr))
加載空間轉(zhuǎn)錄組數(shù)據(jù)集
SeuratData
包中包含了幾種不同的數(shù)據(jù)集绵患,其中有來(lái)自小鼠大腦和腎臟的空間轉(zhuǎn)錄組數(shù)據(jù)。在這里棍掐,我們使用該包中的小鼠大腦數(shù)據(jù)集進(jìn)行后續(xù)的分析藏雏。
outdir = "data/spatial/"
dir.create(outdir, showWarnings = F)
# to list available datasets in SeuratData you can run AvailableData()
# first we dowload the dataset
InstallData("stxBrain")
## Check again that it works, did not work at first...
# now we can list what datasets we have downloaded
InstalledData()
# now we will load the seurat object for one section
brain1 <- LoadData("stxBrain", type = "anterior1")
brain2 <- LoadData("stxBrain", type = "posterior1")
將這兩個(gè)數(shù)據(jù)集合并到同一個(gè)seuart對(duì)象中。
brain <- merge(brain1, brain2)
brain
## An object of class Seurat
## 31053 features across 6049 samples within 1 assay
## Active assay: Spatial (31053 features, 0 variable features)
數(shù)據(jù)質(zhì)量控制
同scRNA-seq數(shù)據(jù)處理一樣作煌,我們使用UMI計(jì)數(shù)(nCount_Spatial)掘殴,檢測(cè)到的基因數(shù)量(nFeature_Spatial)和線粒體含量等統(tǒng)計(jì)指標(biāo)進(jìn)行數(shù)據(jù)質(zhì)量控制。
# 計(jì)算線粒體基因含量
brain <- PercentageFeatureSet(brain, "^mt-", col.name = "percent_mito")
# 計(jì)算血紅蛋白基因含量
brain <- PercentageFeatureSet(brain, "^Hb.*-", col.name = "percent_hb")
# 使用小提琴圖可視化統(tǒng)計(jì)指標(biāo)
VlnPlot(brain, features = c("nCount_Spatial", "nFeature_Spatial", "percent_mito",
"percent_hb"), pt.size = 0.1, ncol = 2) + NoLegend()
我們還可以將以上統(tǒng)計(jì)指標(biāo)繪制到組織切片上粟誓。
SpatialFeaturePlot(brain, features = c("nCount_Spatial", "nFeature_Spatial", "percent_mito",
"percent_hb"))
如圖所示奏寨,counts/features數(shù)量少且線粒體基因含量高的spots主要分布于組織的邊緣,這些區(qū)域很可能是受損的組織鹰服。如果您的組織切片中有tears或皺褶病瞳,可能還會(huì)看到質(zhì)量較低的組織區(qū)域。
但是我們應(yīng)當(dāng)注意悲酷,對(duì)于某些組織類型套菜,表達(dá)的基因數(shù)目和線粒體的比例也可能是生物學(xué)特征,因此請(qǐng)記住您正在處理的組織以及這些特征的含義设易。
數(shù)據(jù)過(guò)濾篩選
在這里逗柴,我們選擇線粒體基因含量少于25%,血紅蛋白基因含量少于20%和至少檢測(cè)到的1000個(gè)基因的spots顿肺。我們必須根據(jù)對(duì)組織的了解戏溺,選擇適合自己數(shù)據(jù)集的過(guò)濾標(biāo)準(zhǔn)。
brain = brain[, brain$nFeature_Spatial > 500 & brain$percent_mito < 25 & brain$percent_hb < 20]
并重新可視化過(guò)濾后的統(tǒng)計(jì)指標(biāo):
SpatialFeaturePlot(brain, features = c("nCount_Spatial", "nFeature_Spatial", "percent_mito"))
查看最高表達(dá)基因
對(duì)于scRNA-seq數(shù)據(jù)屠尊,我們將看看最高表達(dá)的基因是哪些旷祸。
C = brain@assays$Spatial@counts
C@x = C@x/rep.int(colSums(C), diff(C@p))
most_expressed <- order(Matrix::rowSums(C), decreasing = T)[20:1]
boxplot(as.matrix(t(C[most_expressed, ])), cex = 0.1, las = 1, xlab = "% total count per cell",
col = (scales::hue_pal())(20)[20:1], horizontal = TRUE)
如圖所示,線粒體基因是表達(dá)最多的基因之一讼昆,還有l(wèi)ncRNA基因Bc1(腦細(xì)胞質(zhì)RNA 1)和一種血紅蛋白基因托享。
過(guò)濾篩選相關(guān)基因
接下來(lái),我們將刪除Bc1基因浸赫,血紅蛋白基因(血液污染)和線粒體基因闰围。
dim(brain)
## [1] 31053 5789
# Filter Bl1
brain <- brain[!grepl("Bc1", rownames(brain)), ]
# Filter Mitocondrial
brain <- brain[!grepl("^mt-", rownames(brain)), ]
# Filter Hemoglobin gene (optional if that is a problem on your data)
brain <- brain[!grepl("^Hb.*-", rownames(brain)), ]
dim(brain)
## [1] 31031 5789
對(duì)ST數(shù)據(jù)進(jìn)行標(biāo)準(zhǔn)化處理
對(duì)于ST數(shù)據(jù),Seurat團(tuán)隊(duì)推薦使用SCTransfrom
方法進(jìn)行數(shù)據(jù)標(biāo)準(zhǔn)化掺炭,該方法會(huì)篩選出高變異的基因,然后對(duì)其進(jìn)行normalization和scaling凭戴。
brain <- SCTransform(brain, assay = "Spatial", verbose = TRUE, method = "poisson")
現(xiàn)在涧狮,我們可以可視化一些基因的表達(dá)。其中,Hpca基因是hippocampal的一個(gè)marker者冤,而Ttr是choroid plexus的marker基因
SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
如果想更好的查看這些基因在組織切片上的表達(dá)情況肤视,我們還可以調(diào)整點(diǎn)的大小和透明度等參數(shù)。
SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1, alpha = c(0.1, 1))
降維與聚類
然后涉枫,我們可以使用與scRNA-seq數(shù)據(jù)分析相同的工作流程來(lái)進(jìn)行降維和聚類邢滑。
但是,請(qǐng)確保我們是在SCT
的assay中進(jìn)行降維和聚類的愿汰。
# PCA線性降維
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
# 構(gòu)建SNN圖并進(jìn)行圖聚類
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
# UMAP降維可視化
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
DimPlot(brain, reduction = "umap", group.by = c("ident", "orig.ident"))
SpatialDimPlot(brain)
我們還可以分別繪制每個(gè)cluster困后。
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(brain), facet.highlight = TRUE,
ncol = 5)
多樣本數(shù)據(jù)集整合
通常,不同樣本的ST數(shù)據(jù)之間會(huì)產(chǎn)生較強(qiáng)的批處理效應(yīng)衬廷,因此最好對(duì)其進(jìn)行數(shù)據(jù)整合和批次效應(yīng)的矯正摇予。
在這里,我們將使用SCT分析進(jìn)行多樣本數(shù)據(jù)集整合吗跋。因此侧戴,我們需要運(yùn)行PrepSCTIntegration
函數(shù),它將計(jì)算兩個(gè)數(shù)據(jù)集中所有基因的sctransform殘差跌宛。
# create a list of the original data that we loaded to start with
st.list = list(anterior1 = brain1, posterior1 = brain2)
# run SCT on both datasets
# 分別對(duì)每個(gè)數(shù)據(jù)集進(jìn)行SCTransform標(biāo)準(zhǔn)化
st.list = lapply(st.list, SCTransform, assay = "Spatial", method = "poisson")
進(jìn)行多樣本數(shù)據(jù)集整合酗宋。
# need to set maxSize for PrepSCTIntegration to work
options(future.globals.maxSize = 2000 * 1024^2) # set allowed size to 2K MiB
# 選擇整合的特征基因
st.features = SelectIntegrationFeatures(st.list, nfeatures = 3000, verbose = FALSE)
st.list <- PrepSCTIntegration(object.list = st.list, anchor.features = st.features,
verbose = FALSE)
# 識(shí)別整合的anchors
int.anchors <- FindIntegrationAnchors(object.list = st.list, normalization.method = "SCT",
verbose = FALSE, anchor.features = st.features)
# 進(jìn)行數(shù)據(jù)整合
brain.integrated <- IntegrateData(anchorset = int.anchors, normalization.method = "SCT",
verbose = FALSE)
rm(int.anchors, st.list)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 2857164 152.6 4702831 251.2 4702831 251.2
## Vcells 575106704 4387.8 1278197190 9751.9 1065002088 8125.4
然后,像以前一樣進(jìn)行數(shù)據(jù)降維和聚類疆拘。
brain.integrated <- RunPCA(brain.integrated, verbose = FALSE)
brain.integrated <- FindNeighbors(brain.integrated, dims = 1:30)
brain.integrated <- FindClusters(brain.integrated, verbose = FALSE)
brain.integrated <- RunUMAP(brain.integrated, dims = 1:30)
DimPlot(brain.integrated, reduction = "umap", group.by = c("ident", "orig.ident"))
SpatialDimPlot(brain.integrated)
如圖所示蜕猫,你能看出整合后的數(shù)據(jù)和未整合的數(shù)據(jù)之間有什么區(qū)別嗎?你認(rèn)為哪個(gè)分類的效果看起來(lái)最好入问?作為參考丹锹,我們可以與Allen brain altlas中的大腦區(qū)域進(jìn)行比較。
識(shí)別空間可變基因
目前芬失,有兩種計(jì)算方法可以識(shí)別與組織內(nèi)的空間位置相關(guān)的可變基因楣黍。第一種是基于空間上不同的聚類群執(zhí)行差異表達(dá)分析,第二種是在不考慮聚類群或空間注釋的情況下查找具有空間特異性的基因棱烂。
首先租漂,像scRNA-seq數(shù)據(jù)那樣,我們使用FindMarkers
對(duì)不同的聚類簇進(jìn)行差異表達(dá)分析颊糜。
# differential expression between cluster 1 and cluster 6
de_markers <- FindMarkers(brain.integrated, ident.1 = 5, ident.2 = 6)
# plot top markers
SpatialFeaturePlot(object = brain.integrated, features = rownames(de_markers)[1:3],
alpha = c(0.1, 1), ncol = 3)
同時(shí)哩治,我們還可以使用Seurat包中的FindSpatiallyVariables
函數(shù)來(lái)識(shí)別空間可變基因。
brain <- FindSpatiallyVariableFeatures(brain, assay = 'SCT',
features = VariableFeatures(brain)[1:1000],
selection.method = 'markvariogram')
# We would get top features from SpatiallyVariableFeatures
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = 'markvariogram'), 6)
使用scRNA-seq數(shù)據(jù)集進(jìn)行整合分析
我們還可以使用scRNA-seq的數(shù)據(jù)集與空間轉(zhuǎn)錄組的數(shù)據(jù)集進(jìn)行數(shù)據(jù)整合分析衬鱼。這種整合方法非常有用业筏,因?yàn)槲覀兛梢詫cRNA-seq數(shù)據(jù)中的細(xì)胞類型標(biāo)簽轉(zhuǎn)移到Visium空間轉(zhuǎn)錄組數(shù)據(jù)集中。
在這里鸟赫,我們將使用Allen Institute中產(chǎn)生的約14,000個(gè)成年小鼠大腦皮質(zhì)細(xì)胞分類學(xué)的scRNA-seq參考數(shù)據(jù)集進(jìn)行整合分析蒜胖,該數(shù)據(jù)集是通過(guò)SMART-Seq2測(cè)序技術(shù)生成的消别。
首先,我們使用以下命令將seurat數(shù)據(jù)從https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1下載到文件夾中data/spatial/
:
FILE="./data/spatial/allen_cortex.rds"
if [ -e $FILE ]
then
echo "File $FILE is downloaded."
else
echo "Downloading $FILE"
mkdir -p data/spatial
wget -O data/spatial/allen_cortex.rds https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1
fi
Subset ST for cortex
由于我們使用的scRNA-seq數(shù)據(jù)集是由小鼠大腦皮質(zhì)生成的台谢,因此我們需要將visium數(shù)據(jù)集中的cortex數(shù)據(jù)提取出來(lái)進(jìn)行整合分析寻狂。
# subset for the anterior dataset
cortex <- subset(brain.integrated, orig.ident == "anterior1")
# there seems to be an error in the subsetting, so the posterior1 image is not
# removed, do it manually
cortex@images$posterior1 = NULL
# subset for a specific region
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
# also subset for FC clusters
cortex <- subset(cortex, idents = c(0, 1, 6, 7, 12))
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
進(jìn)行數(shù)據(jù)標(biāo)準(zhǔn)化處理。
# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE, method = "poisson") %>%
RunPCA(verbose = FALSE)
Integrated with scRNA-seq
接下來(lái)朋沮,我們使用FindTransferAnchors
和TransferData
函數(shù)將空間轉(zhuǎn)錄組數(shù)據(jù)和scRNA-seq數(shù)據(jù)進(jìn)行整合蛇券。
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass,
prediction.assay = TRUE, weight.reduction = cortex[["pca"]])
cortex[["predictions"]] <- predictions.assay
DefaultAssay(cortex) <- "predictions"
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2,
crop = TRUE)
基于這些預(yù)測(cè)得分,我們可以預(yù)測(cè)受空間位置限制的特定細(xì)胞類型樊拓。接下來(lái)纠亚,我們還可以基于預(yù)測(cè)出的細(xì)胞類型識(shí)別空間可變基因。
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram",
features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
我們還可以在ST數(shù)據(jù)中查看每個(gè)聚類群的預(yù)測(cè)得分骑脱。
VlnPlot(cortex, group.by = "seurat_clusters", features = top.clusters, pt.size = 0,
ncol = 2)
請(qǐng)記住菜枷,這些分?jǐn)?shù)僅是預(yù)測(cè)細(xì)胞類型的得分,并不對(duì)應(yīng)于某種細(xì)胞類型或相似細(xì)胞的比例叁丧。它主要告訴我們某個(gè)位置的基因表達(dá)與特定細(xì)胞類型的基因表達(dá)高度相似/相異啤誊。
如果查看分?jǐn)?shù),我們會(huì)發(fā)現(xiàn)某些spots在細(xì)胞類型上得到了非常明確的預(yù)測(cè)拥娄,而其他spots在任何一種細(xì)胞類型上都沒有很高的得分蚊锹。
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 20.04 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/czarnewski/miniconda3/envs/scRNAseq2021/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] stxBrain.SeuratData_0.1.1 patchwork_1.1.1
## [3] ggplot2_3.3.3 Seurat_3.2.3
## [5] SeuratData_0.2.1 dplyr_1.0.3
## [7] Matrix_1.3-2 RJSONIO_1.3-1.4
## [9] optparse_1.6.6
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_2.0-0 deldir_0.2-3
## [4] ellipsis_0.3.1 ggridges_0.5.3 rprojroot_2.0.2
## [7] fs_1.5.0 spatstat.data_1.7-0 farver_2.0.3
## [10] leiden_0.3.6 listenv_0.8.0 remotes_2.2.0
## [13] getopt_1.20.3 ggrepel_0.9.1 RSpectra_0.16-0
## [16] fansi_0.4.2 codetools_0.2-18 splines_3.6.1
## [19] knitr_1.30 polyclip_1.10-0 pkgload_1.1.0
## [22] jsonlite_1.7.2 ica_1.0-2 cluster_2.1.0
## [25] png_0.1-7 uwot_0.1.10 sctransform_0.3.2
## [28] shiny_1.5.0 compiler_3.6.1 httr_1.4.2
## [31] assertthat_0.2.1 fastmap_1.0.1 lazyeval_0.2.2
## [34] limma_3.42.0 cli_2.2.0 later_1.1.0.1
## [37] formatR_1.7 htmltools_0.5.1 prettyunits_1.1.1
## [40] tools_3.6.1 rsvd_1.0.3 igraph_1.2.6
## [43] gtable_0.3.0 glue_1.4.2 reshape2_1.4.4
## [46] RANN_2.6.1 rappdirs_0.3.1 spatstat_1.64-1
## [49] Rcpp_1.0.6 scattermore_0.7 vctrs_0.3.6
## [52] nlme_3.1-150 lmtest_0.9-38 xfun_0.20
## [55] stringr_1.4.0 globals_0.14.0 ps_1.5.0
## [58] testthat_3.0.1 mime_0.9 miniUI_0.1.1.1
## [61] lifecycle_0.2.0 irlba_2.3.3 devtools_2.3.2
## [64] goftest_1.2-2 future_1.21.0 MASS_7.3-53
## [67] zoo_1.8-8 scales_1.1.1 spatstat.utils_1.20-2
## [70] promises_1.1.1 parallel_3.6.1 RColorBrewer_1.1-2
## [73] yaml_2.2.1 curl_4.3 gridExtra_2.3
## [76] memoise_1.1.0 reticulate_1.18 pbapply_1.4-3
## [79] rpart_4.1-15 stringi_1.5.3 desc_1.2.0
## [82] pkgbuild_1.2.0 rlang_0.4.10 pkgconfig_2.0.3
## [85] matrixStats_0.57.0 evaluate_0.14 lattice_0.20-41
## [88] tensor_1.5 ROCR_1.0-11 purrr_0.3.4
## [91] labeling_0.4.2 htmlwidgets_1.5.3 cowplot_1.1.1
## [94] processx_3.4.5 tidyselect_1.1.0 parallelly_1.23.0
## [97] RcppAnnoy_0.0.18 plyr_1.8.6 magrittr_2.0.1
## [100] R6_2.5.0 generics_0.1.0 DBI_1.1.1
## [103] mgcv_1.8-33 pillar_1.4.7 withr_2.4.0
## [106] fitdistrplus_1.1-3 abind_1.4-5 survival_3.2-7
## [109] tibble_3.0.5 future.apply_1.7.0 crayon_1.3.4
## [112] KernSmooth_2.23-18 plotly_4.9.3 rmarkdown_2.6
## [115] usethis_2.0.0 grid_3.6.1 data.table_1.13.6
## [118] callr_3.5.1 digest_0.6.27 xtable_1.8-4
## [121] tidyr_1.1.2 httpuv_1.5.5 munsell_0.5.0
## [124] viridisLite_0.3.0 sessioninfo_1.1.1
官網(wǎng):https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_07_spatial.html
轉(zhuǎn)載來(lái)自:
作者:Davey1220
鏈接:http://www.reibang.com/p/6baa447af9ce
來(lái)源:簡(jiǎn)書
著作權(quán)歸作者所有。商業(yè)轉(zhuǎn)載請(qǐng)聯(lián)系作者獲得授權(quán)稚瘾,非商業(yè)轉(zhuǎn)載請(qǐng)注明出處牡昆。