一、FindAllMarkers()函數(shù)報錯
最近事情太多秋泄,好久沒有寫簡書琐馆,斷更很久,慚愧印衔。
前幾天,將服務(wù)器的seurat包升級到4.1.0版本姥敛,在跑單細胞轉(zhuǎn)錄組流程的時候發(fā)現(xiàn)在FindAllMarkers步驟報錯奸焙,想細究下原因。
前面的流程是我使用SCTransform方法對單細胞數(shù)據(jù)進行標準化彤敛;
代碼如下:
DefaultAssay(seurat_obj) <- "SCT"
Idents(seurat_obj) <- "cluster"
markers_all <- FindAllMarkers(object = seurat_obj, min.pct = 0.25, logfc.threshold = 0.25, only.pos=T)
報錯的截圖:
提示的信息是我需要在執(zhí)行FindMarkers()函數(shù)前与帆,先運行PrepSCTFindMarkers()函數(shù)。
二墨榄、PrepSCTFindMarkers()函數(shù)的用途
去年玄糟,對于SCTransform歸一化處理后的單細胞數(shù)據(jù)如何進行差異基因表達分析,我就很糾結(jié)袄秩,之前寫了一篇隨筆《sctransform預(yù)處理后阵翎,如何進行差異表達分析》逢并。如今,在2022年1月14日郭卫,seurat終于對SCTransform歸一化處理后如何合理進行差異表達分析砍聊,官宣了PrepSCTFindMarkers()函數(shù)。
seurat團隊應(yīng)該是慎之又慎贰军,才宣布此函數(shù)玻蝌,很想看看它具體做了一些什么操作。
查了下PrepSCTFindMarkers()的說明
該函數(shù)的說明文檔:
https://rdrr.io/cran/Seurat/man/PrepSCTFindMarkers.html
https://cran.r-project.org/web/packages/Seurat/Seurat.pdf
函數(shù)用途:
Prepare object to run differential expression on SCT assay with multiple models
詳細說明:
Given a merged object with multiple SCT models, this function uses minimum of the median UMI (calculated using the raw UMI counts) of individual objects to reverse the individual SCT regression model using minimum of median UMI as the sequencing depth covariate. The counts slot of the SCT assay is replaced with recorrected counts and the data slot is replaced with log1p of recorrected counts.
大致翻譯一下:
對多樣本的單細胞轉(zhuǎn)錄組數(shù)據(jù)整合后生成的seurat對象词疼,提前是已經(jīng)進行了SCTransform歸一化處理俯树。針對SCT模式下,進行差異表達分析贰盗,需要對seurat對象進行預(yù)處理许饿,修正 SCT 模式下的counts矩陣和data矩陣。該函數(shù)計算每個樣本的UMI中位數(shù)(使用原始 UMI 計數(shù)計算)童太,取其最小值米辐,然后應(yīng)用于每個樣本的SCT回歸模型,使用UMI 中位數(shù)的最小值作為測序深度協(xié)變量书释。SCT模式下的counts矩陣被重新校正的counts值所取代翘贮,同時,data矩陣也被重新校正counts的 log1p 替換爆惧。
換句話說狸页,用SCT模式的data矩陣做差異表達分析,不是很妥扯再,需要預(yù)先用PrepSCTFindMarkers()函數(shù)校正芍耘,校正后的data矩陣,更適合做下游的差異基因表達分析熄阻。
函數(shù)用法:
PrepSCTFindMarkers(object, assay = "SCT", verbose = TRUE)
那為什么用原先的SCT模式的data做差異表達分析不行斋竞?
作者在曾在seurat的discussions部分中也給出了一段說明:
We had anticipated extending Seurat to actively support DE using the pearson residuals of sctransform, but have decided not to do so. In some cases, Pearson residuals may not be directly comparable across different datasets, particularly if there are batch effects that are unrelated to sequencing depth. While it is possible to correct these differences using the SCTransform-based integration workflow for the purposes of visualization/clustering/etc., we do not recommend running differential expression directly on Pearson residuals. Instead, we recommend running DE on the standard RNA assay.
我們曾期望擴展Seurat的功能以支持使用sctransform的pearson殘差進行DE分析,但已決定不這樣做秃殉。在某些情況下坝初,皮爾遜殘差可能無法在不同的數(shù)據(jù)集之間直接進行比較,尤其是當存在與測序深度無關(guān)的批次效應(yīng)時钾军。雖然為了可視化/聚類等目的鳄袍,可以使用基于SCTransform的集成工作流來糾正這些差異,但我們不建議直接在Pearson殘差上運行差分表達式吏恭。相反拗小,我們建議在標準RNA分析中運行DE分析。
我們可以這樣理解樱哼,Satija團隊本來期望用sctransform的皮爾遜殘差進行DE分析哀九,但是他們發(fā)現(xiàn)剿配,Pearson殘差可能無法在不同數(shù)據(jù)集之間直接進行比較,決定不這么做勾栗〔依椋基于SCTransform 的分析工作流可以剔除實驗差異以實現(xiàn)可基因可視化和聚類等功能。但是他們建議在標準RNA assay中運行DE分析围俘。
給出的案例是:
針對SCTransform歸一化流程砸讳,在FindMarkers之前需要運行PrepSCTFindMarkers()函數(shù)。
data("pbmc_small")
pbmc_small1 <- SCTransform(object = pbmc_small, variable.features.n = 20)
pbmc_small2 <- SCTransform(object = pbmc_small, variable.features.n = 20)
pbmc_merged <- merge(x = pbmc_small1, y = pbmc_small2)
pbmc_merged <- PrepSCTFindMarkers(object = pbmc_merged)
markers <- FindMarkers(
object = pbmc_merged,
ident.1 = "0",
ident.2 = "1",
assay = "SCT"
)
pbmc_subset <- subset(pbmc_merged, idents = c("0", "1"))
markers_subset <- FindMarkers(
object = pbmc_subset,
ident.1 = "0",
ident.2 = "1",
assay = "SCT",
recorrect_umi = FALSE
)
三界牡、PrepSCTFindMarkers()函數(shù)解析
library(Seurat)
data("pbmc_small")
pbmc_small1 <- SCTransform(object = pbmc_small, variable.features.n = 20)
pbmc_small2 <- SCTransform(object = pbmc_small, variable.features.n = 20)
pbmc_merged <- merge(x = pbmc_small1, y = pbmc_small2)
pbmc_merged
# An object of class Seurat
# 450 features across 160 samples within 2 assays
# Active assay: SCT (220 features, 0 variable features)
# 1 other assay present: RNA
下面我們進入到PrepSCTFindMarkers()函數(shù)中簿寂,代碼較長,我們分解一下:
step1:判斷seurat對象的SCT模型的數(shù)目宿亡,如果只有一個樣本常遂,跳出此函數(shù),不需要進行SCT模型的counts和data數(shù)據(jù)校正挽荠;
在我們的案例中克胳,有2個樣本,即兩個SCR模型圈匆。
if (length(x = levels(x = object[[assay]])) == 1) {
if (verbose) {
message("Only one SCT model is stored - skipping recalculating corrected counts")
}
return(object)
}
step2:此處調(diào)用了SCTResults函數(shù)漠另,計算每個SCT模型的UMI中位數(shù)。先不細究該函數(shù)跃赚。
observed_median_umis <- lapply(
X = SCTResults(object = object[[assay]], slot = "cell.attributes"),
FUN = function(x) median(x[, "umi"])
)
observed_median_umis
# $model1
# [1] 180
#
# $model1.1
# [1] 180
step3:計算min_median_umi
model.list <- slot(object = object[[assay]], name = "SCTModel.list")
model.list
# $model1
# An sctransform model.
# Model formula: y ~ log_umi
# Parameters stored for 220 features, 80 cells
# $model1.1
# An sctransform model.
# Model formula: y ~ log_umi
# Parameters stored for 220 features, 80 cells
median_umi.status <- lapply(X = model.list,
FUN = function(x) { return(tryCatch(
expr = slot(object = x, name = 'median_umi'),
error = function(...) {return(NULL)})
)})
# median_umi.status
# $model1
# [1] 180
#
# $model1.1
# [1] 180
if (any(is.null(x = unlist(x = median_umi.status)))){
# For old SCT objects median_umi is set to median umi as calculated from obserbed UMIs
slot(object = object[[assay]], name = "SCTModel.list") <- lapply(X = model.list,
FUN = UpdateSlots)
SCTResults(object = object[[assay]], slot = "median_umi") <- observed_median_umis
}
model_median_umis <- SCTResults(object = object[[assay]], slot = "median_umi")
min_median_umi <- min(unlist(x = observed_median_umis))
step4:生成校正的corrected_counts矩陣
step5:對corrected_counts取log1p笆搓,生成校正的corrected_data矩陣
時間太趕,先大致記錄下纬傲,后面再補充满败。