第七節(jié):空間轉(zhuǎn)錄組分析
在本節(jié)教程中抡医,我們將學(xué)習(xí)空間轉(zhuǎn)錄組的一些分析方法达址。10X Visium平臺的空間轉(zhuǎn)錄組數(shù)據(jù)在許多方面是類似于scRNA-seq數(shù)據(jù)的蔑祟,它包含5-20個(gè)細(xì)胞而不是單個(gè)細(xì)胞的UMI計(jì)數(shù),但仍然與scRNA-seq相同沉唠,數(shù)據(jù)非常的稀疏疆虚,同時(shí)還具有相關(guān)組織中空間位置的附加信息。
在這里满葛,我們將使用小鼠大腦 (Sagittal)的兩個(gè)Visium空間轉(zhuǎn)錄組數(shù)據(jù)集進(jìn)行實(shí)戰(zhàn)練習(xí)径簿。首先,以類似于scRNA-seq數(shù)據(jù)的方式運(yùn)行質(zhì)量控制嘀韧,然后進(jìn)行QC過濾篇亭,降維,數(shù)據(jù)整合和細(xì)胞聚類锄贷。最后译蒂,我們使用來自小鼠大腦皮層的scRNA-seq數(shù)據(jù)運(yùn)行LabelTransfer
來預(yù)測Visium空間轉(zhuǎn)錄組數(shù)據(jù)中的細(xì)胞類型。
安裝并加載所需的R包
# 安裝seurat-data包
devtools::install_github("satijalab/seurat-data")
##
checking for file ‘/tmp/RtmpcVJCPY/remotes118eae37e56/satijalab-seurat-data-c633765/DESCRIPTION’ ...
? checking for file ‘/tmp/RtmpcVJCPY/remotes118eae37e56/satijalab-seurat-data-c633765/DESCRIPTION’ (561ms)
##
─ preparing ‘SeuratData’:
##
checking DESCRIPTION meta-information ...
? checking DESCRIPTION meta-information
##
─ checking for LF line-endings in source and make files and shell scripts (385ms)
##
─ checking for empty or unneeded directories
##
─ building ‘SeuratData_0.2.1.tar.gz’
##
# 加載所需的R包
suppressPackageStartupMessages(require(Matrix))
suppressPackageStartupMessages(require(dplyr))
suppressPackageStartupMessages(require(SeuratData))
suppressPackageStartupMessages(require(Seurat))
suppressPackageStartupMessages(require(ggplot2))
suppressPackageStartupMessages(require(patchwork))
suppressPackageStartupMessages(require(dplyr))
加載空間轉(zhuǎn)錄組數(shù)據(jù)集
SeuratData
包中包含了幾種不同的數(shù)據(jù)集谊却,其中有來自小鼠大腦和腎臟的空間轉(zhuǎn)錄組數(shù)據(jù)蹂随。在這里,我們使用該包中的小鼠大腦數(shù)據(jù)集進(jìn)行后續(xù)的分析因惭。
outdir = "data/spatial/"
dir.create(outdir, showWarnings = F)
# to list available datasets in SeuratData you can run AvailableData()
# first we dowload the dataset
InstallData("stxBrain")
## Check again that it works, did not work at first...
# now we can list what datasets we have downloaded
InstalledData()
# now we will load the seurat object for one section
brain1 <- LoadData("stxBrain", type = "anterior1")
brain2 <- LoadData("stxBrain", type = "posterior1")
將這兩個(gè)數(shù)據(jù)集合并到同一個(gè)seuart對象中。
brain <- merge(brain1, brain2)
brain
## An object of class Seurat
## 31053 features across 6049 samples within 1 assay
## Active assay: Spatial (31053 features, 0 variable features)
數(shù)據(jù)質(zhì)量控制
同scRNA-seq數(shù)據(jù)處理一樣绩衷,我們使用UMI計(jì)數(shù)(nCount_Spatial)蹦魔,檢測到的基因數(shù)量(nFeature_Spatial)和線粒體含量等統(tǒng)計(jì)指標(biāo)進(jìn)行數(shù)據(jù)質(zhì)量控制激率。
# 計(jì)算線粒體基因含量
brain <- PercentageFeatureSet(brain, "^mt-", col.name = "percent_mito")
# 計(jì)算血紅蛋白基因含量
brain <- PercentageFeatureSet(brain, "^Hb.*-", col.name = "percent_hb")
# 使用小提琴圖可視化統(tǒng)計(jì)指標(biāo)
VlnPlot(brain, features = c("nCount_Spatial", "nFeature_Spatial", "percent_mito",
"percent_hb"), pt.size = 0.1, ncol = 2) + NoLegend()
我們還可以將以上統(tǒng)計(jì)指標(biāo)繪制到組織切片上。
SpatialFeaturePlot(brain, features = c("nCount_Spatial", "nFeature_Spatial", "percent_mito",
"percent_hb"))
如圖所示勿决,counts/features數(shù)量少且線粒體基因含量高的spots主要分布于組織的邊緣乒躺,這些區(qū)域很可能是受損的組織。如果您的組織切片中有tears或皺褶低缩,可能還會看到質(zhì)量較低的組織區(qū)域嘉冒。
但是我們應(yīng)當(dāng)注意,對于某些組織類型咆繁,表達(dá)的基因數(shù)目和線粒體的比例也可能是生物學(xué)特征讳推,因此請記住您正在處理的組織以及這些特征的含義。
數(shù)據(jù)過濾篩選
在這里玩般,我們選擇線粒體基因含量少于25%银觅,血紅蛋白基因含量少于20%和至少檢測到的1000個(gè)基因的spots。我們必須根據(jù)對組織的了解坏为,選擇適合自己數(shù)據(jù)集的過濾標(biāo)準(zhǔn)究驴。
brain = brain[, brain$nFeature_Spatial > 500 & brain$percent_mito < 25 & brain$percent_hb < 20]
并重新可視化過濾后的統(tǒng)計(jì)指標(biāo):
SpatialFeaturePlot(brain, features = c("nCount_Spatial", "nFeature_Spatial", "percent_mito"))
查看最高表達(dá)基因
對于scRNA-seq數(shù)據(jù),我們將看看最高表達(dá)的基因是哪些匀伏。
C = brain@assays$Spatial@counts
C@x = C@x/rep.int(colSums(C), diff(C@p))
most_expressed <- order(Matrix::rowSums(C), decreasing = T)[20:1]
boxplot(as.matrix(t(C[most_expressed, ])), cex = 0.1, las = 1, xlab = "% total count per cell",
col = (scales::hue_pal())(20)[20:1], horizontal = TRUE)
如圖所示洒忧,線粒體基因是表達(dá)最多的基因之一,還有l(wèi)ncRNA基因Bc1(腦細(xì)胞質(zhì)RNA 1)和一種血紅蛋白基因够颠。
過濾篩選相關(guān)基因
接下來熙侍,我們將刪除Bc1基因,血紅蛋白基因(血液污染)和線粒體基因摧找。
dim(brain)
## [1] 31053 5789
# Filter Bl1
brain <- brain[!grepl("Bc1", rownames(brain)), ]
# Filter Mitocondrial
brain <- brain[!grepl("^mt-", rownames(brain)), ]
# Filter Hemoglobin gene (optional if that is a problem on your data)
brain <- brain[!grepl("^Hb.*-", rownames(brain)), ]
dim(brain)
## [1] 31031 5789
對ST數(shù)據(jù)進(jìn)行標(biāo)準(zhǔn)化處理
對于ST數(shù)據(jù)核行,Seurat團(tuán)隊(duì)推薦使用SCTransfrom
方法進(jìn)行數(shù)據(jù)標(biāo)準(zhǔn)化,該方法會篩選出高變異的基因蹬耘,然后對其進(jìn)行normalization和scaling芝雪。
brain <- SCTransform(brain, assay = "Spatial", verbose = TRUE, method = "poisson")
現(xiàn)在,我們可以可視化一些基因的表達(dá)综苔。其中惩系,Hpca基因是hippocampal的一個(gè)marker,而Ttr是choroid plexus的marker基因
SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
如果想更好的查看這些基因在組織切片上的表達(dá)情況如筛,我們還可以調(diào)整點(diǎn)的大小和透明度等參數(shù)堡牡。
SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1, alpha = c(0.1, 1))
降維與聚類
然后,我們可以使用與scRNA-seq數(shù)據(jù)分析相同的工作流程來進(jìn)行降維和聚類杨刨。
但是晤柄,請確保我們是在SCT
的assay中進(jìn)行降維和聚類的。
# PCA線性降維
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
# 構(gòu)建SNN圖并進(jìn)行圖聚類
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
# UMAP降維可視化
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
DimPlot(brain, reduction = "umap", group.by = c("ident", "orig.ident"))
SpatialDimPlot(brain)
我們還可以分別繪制每個(gè)cluster妖胀。
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(brain), facet.highlight = TRUE,
ncol = 5)
多樣本數(shù)據(jù)集整合
通常芥颈,不同樣本的ST數(shù)據(jù)之間會產(chǎn)生較強(qiáng)的批處理效應(yīng)惠勒,因此最好對其進(jìn)行數(shù)據(jù)整合和批次效應(yīng)的矯正。
在這里爬坑,我們將使用SCT分析進(jìn)行多樣本數(shù)據(jù)集整合纠屋。因此,我們需要運(yùn)行PrepSCTIntegration
函數(shù)盾计,它將計(jì)算兩個(gè)數(shù)據(jù)集中所有基因的sctransform殘差售担。
# create a list of the original data that we loaded to start with
st.list = list(anterior1 = brain1, posterior1 = brain2)
# run SCT on both datasets
# 分別對每個(gè)數(shù)據(jù)集進(jìn)行SCTransform標(biāo)準(zhǔn)化
st.list = lapply(st.list, SCTransform, assay = "Spatial", method = "poisson")
進(jìn)行多樣本數(shù)據(jù)集整合。
# need to set maxSize for PrepSCTIntegration to work
options(future.globals.maxSize = 2000 * 1024^2) # set allowed size to 2K MiB
# 選擇整合的特征基因
st.features = SelectIntegrationFeatures(st.list, nfeatures = 3000, verbose = FALSE)
st.list <- PrepSCTIntegration(object.list = st.list, anchor.features = st.features,
verbose = FALSE)
# 識別整合的anchors
int.anchors <- FindIntegrationAnchors(object.list = st.list, normalization.method = "SCT",
verbose = FALSE, anchor.features = st.features)
# 進(jìn)行數(shù)據(jù)整合
brain.integrated <- IntegrateData(anchorset = int.anchors, normalization.method = "SCT",
verbose = FALSE)
rm(int.anchors, st.list)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 2857164 152.6 4702831 251.2 4702831 251.2
## Vcells 575106704 4387.8 1278197190 9751.9 1065002088 8125.4
然后署辉,像以前一樣進(jìn)行數(shù)據(jù)降維和聚類族铆。
brain.integrated <- RunPCA(brain.integrated, verbose = FALSE)
brain.integrated <- FindNeighbors(brain.integrated, dims = 1:30)
brain.integrated <- FindClusters(brain.integrated, verbose = FALSE)
brain.integrated <- RunUMAP(brain.integrated, dims = 1:30)
DimPlot(brain.integrated, reduction = "umap", group.by = c("ident", "orig.ident"))
SpatialDimPlot(brain.integrated)
如圖所示,你能看出整合后的數(shù)據(jù)和未整合的數(shù)據(jù)之間有什么區(qū)別嗎涨薪?你認(rèn)為哪個(gè)分類的效果看起來最好骑素?作為參考,我們可以與Allen brain altlas中的大腦區(qū)域進(jìn)行比較刚夺。
識別空間可變基因
目前献丑,有兩種計(jì)算方法可以識別與組織內(nèi)的空間位置相關(guān)的可變基因。第一種是基于空間上不同的聚類群執(zhí)行差異表達(dá)分析侠姑,第二種是在不考慮聚類群或空間注釋的情況下查找具有空間特異性的基因创橄。
首先,像scRNA-seq數(shù)據(jù)那樣莽红,我們使用FindMarkers
對不同的聚類簇進(jìn)行差異表達(dá)分析妥畏。
# differential expression between cluster 1 and cluster 6
de_markers <- FindMarkers(brain.integrated, ident.1 = 5, ident.2 = 6)
# plot top markers
SpatialFeaturePlot(object = brain.integrated, features = rownames(de_markers)[1:3],
alpha = c(0.1, 1), ncol = 3)
同時(shí),我們還可以使用Seurat包中的FindSpatiallyVariables
函數(shù)來識別空間可變基因安吁。
brain <- FindSpatiallyVariableFeatures(brain, assay = 'SCT',
features = VariableFeatures(brain)[1:1000],
selection.method = 'markvariogram')
# We would get top features from SpatiallyVariableFeatures
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = 'markvariogram'), 6)
使用scRNA-seq數(shù)據(jù)集進(jìn)行整合分析
我們還可以使用scRNA-seq的數(shù)據(jù)集與空間轉(zhuǎn)錄組的數(shù)據(jù)集進(jìn)行數(shù)據(jù)整合分析醉蚁。這種整合方法非常有用,因?yàn)槲覀兛梢詫cRNA-seq數(shù)據(jù)中的細(xì)胞類型標(biāo)簽轉(zhuǎn)移到Visium空間轉(zhuǎn)錄組數(shù)據(jù)集中鬼店。
在這里网棍,我們將使用Allen Institute中產(chǎn)生的約14,000個(gè)成年小鼠大腦皮質(zhì)細(xì)胞分類學(xué)的scRNA-seq參考數(shù)據(jù)集進(jìn)行整合分析,該數(shù)據(jù)集是通過SMART-Seq2測序技術(shù)生成的妇智。
首先滥玷,我們使用以下命令將seurat數(shù)據(jù)從https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1下載到文件夾中data/spatial/
:
FILE="./data/spatial/allen_cortex.rds"
if [ -e $FILE ]
then
echo "File $FILE is downloaded."
else
echo "Downloading $FILE"
mkdir -p data/spatial
wget -O data/spatial/allen_cortex.rds https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1
fi
Subset ST for cortex
由于我們使用的scRNA-seq數(shù)據(jù)集是由小鼠大腦皮質(zhì)生成的,因此我們需要將visium數(shù)據(jù)集中的cortex數(shù)據(jù)提取出來進(jìn)行整合分析巍棱。
# subset for the anterior dataset
cortex <- subset(brain.integrated, orig.ident == "anterior1")
# there seems to be an error in the subsetting, so the posterior1 image is not
# removed, do it manually
cortex@images$posterior1 = NULL
# subset for a specific region
cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
# also subset for FC clusters
cortex <- subset(cortex, idents = c(0, 1, 6, 7, 12))
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2
進(jìn)行數(shù)據(jù)標(biāo)準(zhǔn)化處理惑畴。
# After subsetting, we renormalize cortex
cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE, method = "poisson") %>%
RunPCA(verbose = FALSE)
Integrated with scRNA-seq
接下來,我們使用FindTransferAnchors
和TransferData
函數(shù)將空間轉(zhuǎn)錄組數(shù)據(jù)和scRNA-seq數(shù)據(jù)進(jìn)行整合航徙。
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass,
prediction.assay = TRUE, weight.reduction = cortex[["pca"]])
cortex[["predictions"]] <- predictions.assay
DefaultAssay(cortex) <- "predictions"
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2,
crop = TRUE)
基于這些預(yù)測得分如贷,我們可以預(yù)測受空間位置限制的特定細(xì)胞類型。接下來,我們還可以基于預(yù)測出的細(xì)胞類型識別空間可變基因杠袱。
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram",
features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
我們還可以在ST數(shù)據(jù)中查看每個(gè)聚類群的預(yù)測得分泻红。
VlnPlot(cortex, group.by = "seurat_clusters", features = top.clusters, pt.size = 0,
ncol = 2)
請記住,這些分?jǐn)?shù)僅是預(yù)測細(xì)胞類型的得分霞掺,并不對應(yīng)于某種細(xì)胞類型或相似細(xì)胞的比例。它主要告訴我們某個(gè)位置的基因表達(dá)與特定細(xì)胞類型的基因表達(dá)高度相似/相異讹躯。
如果查看分?jǐn)?shù)菩彬,我們會發(fā)現(xiàn)某些spots在細(xì)胞類型上得到了非常明確的預(yù)測,而其他spots在任何一種細(xì)胞類型上都沒有很高的得分潮梯。
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 20.04 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/czarnewski/miniconda3/envs/scRNAseq2021/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] stxBrain.SeuratData_0.1.1 patchwork_1.1.1
## [3] ggplot2_3.3.3 Seurat_3.2.3
## [5] SeuratData_0.2.1 dplyr_1.0.3
## [7] Matrix_1.3-2 RJSONIO_1.3-1.4
## [9] optparse_1.6.6
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_2.0-0 deldir_0.2-3
## [4] ellipsis_0.3.1 ggridges_0.5.3 rprojroot_2.0.2
## [7] fs_1.5.0 spatstat.data_1.7-0 farver_2.0.3
## [10] leiden_0.3.6 listenv_0.8.0 remotes_2.2.0
## [13] getopt_1.20.3 ggrepel_0.9.1 RSpectra_0.16-0
## [16] fansi_0.4.2 codetools_0.2-18 splines_3.6.1
## [19] knitr_1.30 polyclip_1.10-0 pkgload_1.1.0
## [22] jsonlite_1.7.2 ica_1.0-2 cluster_2.1.0
## [25] png_0.1-7 uwot_0.1.10 sctransform_0.3.2
## [28] shiny_1.5.0 compiler_3.6.1 httr_1.4.2
## [31] assertthat_0.2.1 fastmap_1.0.1 lazyeval_0.2.2
## [34] limma_3.42.0 cli_2.2.0 later_1.1.0.1
## [37] formatR_1.7 htmltools_0.5.1 prettyunits_1.1.1
## [40] tools_3.6.1 rsvd_1.0.3 igraph_1.2.6
## [43] gtable_0.3.0 glue_1.4.2 reshape2_1.4.4
## [46] RANN_2.6.1 rappdirs_0.3.1 spatstat_1.64-1
## [49] Rcpp_1.0.6 scattermore_0.7 vctrs_0.3.6
## [52] nlme_3.1-150 lmtest_0.9-38 xfun_0.20
## [55] stringr_1.4.0 globals_0.14.0 ps_1.5.0
## [58] testthat_3.0.1 mime_0.9 miniUI_0.1.1.1
## [61] lifecycle_0.2.0 irlba_2.3.3 devtools_2.3.2
## [64] goftest_1.2-2 future_1.21.0 MASS_7.3-53
## [67] zoo_1.8-8 scales_1.1.1 spatstat.utils_1.20-2
## [70] promises_1.1.1 parallel_3.6.1 RColorBrewer_1.1-2
## [73] yaml_2.2.1 curl_4.3 gridExtra_2.3
## [76] memoise_1.1.0 reticulate_1.18 pbapply_1.4-3
## [79] rpart_4.1-15 stringi_1.5.3 desc_1.2.0
## [82] pkgbuild_1.2.0 rlang_0.4.10 pkgconfig_2.0.3
## [85] matrixStats_0.57.0 evaluate_0.14 lattice_0.20-41
## [88] tensor_1.5 ROCR_1.0-11 purrr_0.3.4
## [91] labeling_0.4.2 htmlwidgets_1.5.3 cowplot_1.1.1
## [94] processx_3.4.5 tidyselect_1.1.0 parallelly_1.23.0
## [97] RcppAnnoy_0.0.18 plyr_1.8.6 magrittr_2.0.1
## [100] R6_2.5.0 generics_0.1.0 DBI_1.1.1
## [103] mgcv_1.8-33 pillar_1.4.7 withr_2.4.0
## [106] fitdistrplus_1.1-3 abind_1.4-5 survival_3.2-7
## [109] tibble_3.0.5 future.apply_1.7.0 crayon_1.3.4
## [112] KernSmooth_2.23-18 plotly_4.9.3 rmarkdown_2.6
## [115] usethis_2.0.0 grid_3.6.1 data.table_1.13.6
## [118] callr_3.5.1 digest_0.6.27 xtable_1.8-4
## [121] tidyr_1.1.2 httpuv_1.5.5 munsell_0.5.0
## [124] viridisLite_0.3.0 sessioninfo_1.1.1
參考來源:https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_07_spatial.html