本教程介紹了Seurat包與sctransform 包一起分析時(shí)候的一般方法虏两。首先你也許會(huì)問(wèn):sctransform包是用來(lái)干嘛的宠页?
這是一個(gè)值得關(guān)心和探討的問(wèn)題拴泌,我也將做一個(gè)簡(jiǎn)單的回答殉农。最好的辦法當(dāng)然是到包所在的官網(wǎng)去讀人家的自我介紹了sctransform: Variance Stabilizing Transformations for Single Cell UMI Data
A normalization method for single-cell UMI count data using a variance stabilizing transformation. The transformation is based on a negative binomial regression model with regularized parameters. As part of the same regression framework, this package also provides functions for batch correction, and data correction. See Hafemeister and Satija 2019 <doi:10.1101/576827> for more details.
有沒(méi)有感覺(jué)很強(qiáng)大,在Seurat里面被封為SCTransform()函數(shù)可以:
- 代替 NormalizeData, ScaleData, and FindVariableFeatures.三個(gè)函數(shù)
- 轉(zhuǎn)換完了在SCT assay 里
- 在均一化的同時(shí)可以移除線(xiàn)粒體細(xì)胞等的影響
library(Seurat)
packageVersion("Seurat")
# https://satijalab.org/seurat/mca.html
library(dplyr)
library(ggsci)
library(ggplot2)
library(sctransform)
# Load the PBMC dataset
list.files("D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
?Read10X
pbmc.data <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
# Initialize the Seurat object with the raw (non-normalized data).
?CreateSeuratObject
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
一步數(shù)據(jù)均一化蟀淮,也是比較慢的最住。
# store mitochondrial percentage in object meta data
pbmc <- PercentageFeatureSet(pbmc, pattern = "^MT-", col.name = "percent.mt")
# run sctransform
pbmc <- SCTransform(pbmc, vars.to.regress = "percent.mt", verbose = FALSE)
pbmc
An object of class Seurat
26286 features across 2700 samples within 2 assays
Active assay: SCT (12572 features)
1 other assay present: RNA
然后是標(biāo)準(zhǔn)操作。
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindClusters(pbmc, verbose = FALSE)
DimPlot(pbmc, label = TRUE) + NoLegend()
為什么在轉(zhuǎn)化之后PC軸要比之前我們用NormalizeData 的時(shí)候要多呢?
- 更有效地消除技術(shù)影響
- 更高的PC維度可能代表一些微妙的生物學(xué)差異
- In addition, sctransform returns 3,000 variable features by default, instead of 2,000. The rationale is similar, the additional variable features are less likely to be driven by technical differences across cells, and instead may represent more subtle biological fluctuations.
我們還可以像下面這樣去運(yùn)行:
pbmc <- CreateSeuratObject(pbmc_data) %>% PercentageFeatureSet(pattern = "^MT-", col.name = "percent.mt") %>%
SCTransform(vars.to.regress = "percent.mt") %>% RunPCA() %>% FindNeighbors(dims = 1:30) %>%
RunUMAP(dims = 1:30) %>% FindClusters()
Where are normalized values stored for sctransform?
- pbmc[["SCT"]]@scale.data
- pbmc[["SCT"]]@data 用于可視化的數(shù)據(jù)
- pbmc[["SCT"]]@counts 校正后的count
- You can use the corrected log-normalized counts for differential expression and integration. However, in principle, it would be most optimal to perform these calculations directly on the residuals (stored in the scale.data slot) themselves. This is not currently supported in Seurat v3, but will be soon.
第四條很容易讓人困惑怠惶,先不說(shuō)技術(shù)上的涨缚,僅字面意思:這里給你的并不是最好的,最好的目前還沒(méi)有策治。這句話(huà)不能和最好的方法出來(lái)的時(shí)候再說(shuō)嗎脓魏?以后的方法比現(xiàn)在的好,這很容易理解啊通惫,但是你要告訴我最好的總在未來(lái)茂翔,我就糾結(jié)了。而且履腋,FAQ4中建議使用RNA assay做差異分析珊燎,而在這里有建議使用SCT assay的scale.data(而且This is not currently supported in Seurat v3, but will be soon.)。
官方的解釋是這樣的:
Hi,
Thanks for the question, and I apologize for the confusion. We're working on allowing for DE to be performed on pearson residuals from SCTransform in an optimal way. Until then, its easiest for us to advise users just to use the RNA assay. But if you're really excited to give it a try, it is not invalid to do so. Still, in the interest of simplicity, we'll keep the FAQ as-is.
best,
Rahul
https://github.com/satijalab/seurat/issues/1421
# These are now standard steps in the Seurat workflow for visualization and clustering Visualize
# canonical marker genes as violin plots.
VlnPlot(pbmc, features = c("CD8A", "GZMK", "CCL5", "S100A4", "ANXA1", "CCR7", "ISG15", "CD3D"),
pt.size = 0.2, ncol = 4)
# Visualize canonical marker genes on the sctransform embedding.
FeaturePlot(pbmc, features = c("CD8A", "GZMK", "CCL5", "S100A4", "ANXA1", "CCR7"), pt.size = 0.2,
ncol = 3)
FeaturePlot(pbmc, features = c("CD3D", "ISG15", "TCL1A", "FCER2", "XCL1", "FCGR3A"), pt.size = 0.2,
ncol = 3)