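To mark the occasion, I am starting a column: every week I will pick interesting questions and answers from the Seurat community for Seurat users in China.
Why do this?
- Learning by translating
- Learning by understanding
- Learning on the commute
- Keeping track of how the tool evolves
- Trust me, you are not alone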
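More and more people in China are analyzing single-cell data, and Seurat offers a very good framework for thinking about that analysis; there are also plenty of published papers on Seurat. Anyone applying this tool will inevitably run into one problem or another. That is frustrating, but as a great man once said, a good question is worth a hundred times more than a good answer. So I set the Seurat repository on GitHub to watch all issues, and my personal inbox is now full of Seurat issue notifications.
On my commute I read about which Seurat pitfalls people have stepped into and how members of the Seurat community replied, which I have to say is a fun thing to do. It is also a chance to practice some English.
The plan is to select no more than 10 issues each week. While translating and reproducing these issues, I will also mix in some of my own notes from learning Seurat, to help Seurat learners get to know the project.
Most of the discussion on GitHub is still in English. Why don't we use Chinese there? Because that would defeat the purpose of learning English.
As the old saying goes, many begin well but few see things through to the end. How many issues of Seurat Weekly will there be? We cannot promise anything; this is an unfunded effort after all, so we will see how it goes.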
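As issue No. 0, today's post mainly explores the general rules for topic selection and layout.
The first rule for selecting a topic is how often it comes up: we believe that a topic many people are willing to discuss is more likely to be a good question. The second is whether the editor personally finds the question interesting.
As for layout, we follow the usual FAQ format. The question part quotes the original description, and the answer is the editor's commentary, written after reading other people's replies and mixed with personal opinion.
For every question we also give the GitHub link, so that readers can follow up on the original issue.
Here are a few examples: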
Getting parameters from pre-existing reduction (umap) (#3053)
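GitHub link: https://github.com/satijalab/seurat/issues/3053
What oh111 seems to mean is that they have a Seurat object in hand and want to know how its previous author processed it. In fact, Seurat keeps a record of the parameters used in each processing step.
Take the dataset that ships with Seurat as an example: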
Seurat::pbmc_small@commands
This returns a list. The poster said they wanted to see the PCA parameters, and this is all it takes:
Seurat::pbmc_small@commands$RunPCA.RNA
Command: RunPCA(object = pbmc_small, features = VariableFeatures(object = pbmc_small), verbose = FALSE)
Time: 2018-08-28 04:34:56
assay : RNA
features : PPBP IGLL5 VDAC3 CD1C AKR1C3 PF4 MYL9 GNLY TREML1 CA2 SDPR PGRMC1 S100A8 TUBB1 HLA-DQA1 PARVB RUFY1 HLA-DPB1 RP11-290F20.3 S100A9
compute.dims : 20
rev.pca : FALSE
weight.by.var : TRUE
verbose : FALSE
print.dims : 1 2 3 4 5
features.print : 30
reduction.name : pca
reduction.key : PC
seed.use : 42
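The same record can also be queried programmatically. The following is a minimal sketch, assuming a Seurat v3 or later installation in which the `Command()` accessor is available and `data("pbmc_small")` loads the bundled dataset. The exact name of a UMAP entry depends on the assay and reduction that were used, so it is looked up by pattern here rather than hard-coded; `pbmc_small` itself is not expected to contain one.

```r
library(Seurat)
data("pbmc_small")

# List every logged command; names combine the function, assay and reduction.
cmd_names <- names(pbmc_small@commands)
print(cmd_names)

# Pull a single parameter out of one logged command.
pca_seed <- Command(pbmc_small, command = "RunPCA.RNA", value = "seed.use")
print(pca_seed)

# On your own object, look for a RunUMAP entry instead of guessing its name.
umap_cmd <- grep("^RunUMAP", cmd_names, value = TRUE)
if (length(umap_cmd) > 0) {
  print(pbmc_small@commands[[umap_cmd[1]]])
} else {
  message("No RunUMAP command is recorded on this object")
}
```

If the object was processed with functions that do not log themselves, the list will simply have no entry for that step.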
Reorder cells by expression value in DoHeatmap & Dendogram (#3036)
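GitHub link: https://github.com/satijalab/seurat/issues/3036
This is a question about visualization details. Plots produced with default parameters are sometimes not quite publication-ready, or we want to tweak a figure to our own taste; the most common request is to change the order of the labels. As we know, this is generally controlled through the levels of a factor variable.
One commenter shared this link: https://stackoverflow.com/questions/52136211/how-to-reorder-cells-in-doheatmap-plot-in-seurat-ggplot2
So let's look at how Seurat implements its plotting, starting with the source code of DoHeatmap: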
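To make the role of factor levels concrete, here is a small base R sketch with no Seurat dependency. The cell names and cluster labels are made up for illustration; the point is only that sorting a named factor follows the order of its levels, not the alphabet.

```r
# Hypothetical cells and cluster labels, for illustration only.
cells <- c("cell1", "cell2", "cell3", "cell4", "cell5")
clusters <- c("B", "T", "NK", "T", "B")

# Default factor levels are sorted alphabetically: B, NK, T.
default_groups <- factor(clusters)
names(default_groups) <- cells
default_order <- names(sort(default_groups))
print(default_order)

# Setting the levels explicitly changes the sort order: T, NK, B.
custom_groups <- factor(clusters, levels = c("T", "NK", "B"))
names(custom_groups) <- cells
custom_order <- names(sort(custom_groups))
print(custom_order)
```

Sorting a named factor by its levels is the same idiom as `cell.order = names(x = sort(x = group.use))` in the DoHeatmap source.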
DoHeatmap
function (object, features = NULL, cells = NULL, group.by = "ident",
    group.bar = TRUE, group.colors = NULL, disp.min = -2.5, disp.max = NULL,
    slot = "scale.data", assay = NULL, label = TRUE, size = 5.5,
    hjust = 0, angle = 45, raster = TRUE, draw.lines = TRUE,
    lines.width = NULL, group.bar.height = 0.02, combine = TRUE)
{
    cells <- cells %||% colnames(x = object)
    if (is.numeric(x = cells)) {
        cells <- colnames(x = object)[cells]
    }
    assay <- assay %||% DefaultAssay(object = object)
    DefaultAssay(object = object) <- assay
    features <- features %||% VariableFeatures(object = object)
    features <- rev(x = unique(x = features))
    disp.max <- disp.max %||% ifelse(test = slot == "scale.data",
        yes = 2.5, no = 6)
    possible.features <- rownames(x = GetAssayData(object = object,
        slot = slot))
    if (any(!features %in% possible.features)) {
        bad.features <- features[!features %in% possible.features]
        features <- features[features %in% possible.features]
        if (length(x = features) == 0) {
            stop("No requested features found in the ", slot,
                " slot for the ", assay, " assay.")
        }
        warning("The following features were omitted as they were not found in the ",
            slot, " slot for the ", assay, " assay: ", paste(bad.features,
                collapse = ", "))
    }
    data <- as.data.frame(x = as.matrix(x = t(x = GetAssayData(object = object,
        slot = slot)[features, cells, drop = FALSE])))
    object <- suppressMessages(expr = StashIdent(object = object,
        save.name = "ident"))
    group.by <- group.by %||% "ident"
    groups.use <- object[[group.by]][cells, , drop = FALSE]
    plots <- vector(mode = "list", length = ncol(x = groups.use))
    for (i in 1:ncol(x = groups.use)) {
        data.group <- data
        group.use <- groups.use[, i, drop = TRUE]
        if (!is.factor(x = group.use)) {
            group.use <- factor(x = group.use)
        }
        names(x = group.use) <- cells
        if (draw.lines) {
            lines.width <- lines.width %||% ceiling(x = nrow(x = data.group) *
                0.0025)
            placeholder.cells <- sapply(X = 1:(length(x = levels(x = group.use)) *
                lines.width), FUN = function(x) {
                return(RandomName(length = 20))
            })
            placeholder.groups <- rep(x = levels(x = group.use),
                times = lines.width)
            group.levels <- levels(x = group.use)
            names(x = placeholder.groups) <- placeholder.cells
            group.use <- as.vector(x = group.use)
            names(x = group.use) <- cells
            group.use <- factor(x = c(group.use, placeholder.groups),
                levels = group.levels)
            na.data.group <- matrix(data = NA, nrow = length(x = placeholder.cells),
                ncol = ncol(x = data.group), dimnames = list(placeholder.cells,
                    colnames(x = data.group)))
            data.group <- rbind(data.group, na.data.group)
        }
        lgroup <- length(levels(group.use))
        plot <- SingleRasterMap(data = data.group, raster = raster,
            disp.min = disp.min, disp.max = disp.max, feature.order = features,
            cell.order = names(x = sort(x = group.use)), group.by = group.use)
        if (group.bar) {
            default.colors <- c(hue_pal()(length(x = levels(x = group.use))))
            cols <- group.colors[1:length(x = levels(x = group.use))] %||%
                default.colors
            if (any(is.na(x = cols))) {
                cols[is.na(x = cols)] <- default.colors[is.na(x = cols)]
                cols <- Col2Hex(cols)
                col.dups <- sort(x = unique(x = which(x = duplicated(x = substr(x = cols,
                    start = 1, stop = 7)))))
                through <- length(x = default.colors)
                while (length(x = col.dups) > 0) {
                    pal.max <- length(x = col.dups) + through
                    cols.extra <- hue_pal()(pal.max)[(through +
                        1):pal.max]
                    cols[col.dups] <- cols.extra
                    col.dups <- sort(x = unique(x = which(x = duplicated(x = substr(x = cols,
                        start = 1, stop = 7)))))
                }
            }
            group.use2 <- sort(x = group.use)
            if (draw.lines) {
                na.group <- RandomName(length = 20)
                levels(x = group.use2) <- c(levels(x = group.use2),
                    na.group)
                group.use2[placeholder.cells] <- na.group
                cols <- c(cols, "#FFFFFF")
            }
            pbuild <- ggplot_build(plot = plot)
            names(x = cols) <- levels(x = group.use2)
            y.range <- diff(x = pbuild$layout$panel_params[[1]]$y.range)
            y.pos <- max(pbuild$layout$panel_params[[1]]$y.range) +
                y.range * 0.015
            y.max <- y.pos + group.bar.height * y.range
            plot <- plot + annotation_raster(raster = t(x = cols[group.use2]),
                xmin = -Inf, xmax = Inf, ymin = y.pos, ymax = y.max) +
                coord_cartesian(ylim = c(0, y.max), clip = "off") +
                scale_color_manual(values = cols)
            if (label) {
                x.max <- max(pbuild$layout$panel_params[[1]]$x.range)
                x.divs <- pbuild$layout$panel_params[[1]]$x.major
                x <- data.frame(group = sort(x = group.use),
                    x = x.divs)
                label.x.pos <- tapply(X = x$x, INDEX = x$group,
                    FUN = median) * x.max
                label.x.pos <- data.frame(group = names(x = label.x.pos),
                    label.x.pos)
                plot <- plot + geom_text(stat = "identity", data = label.x.pos,
                    aes_string(label = "group", x = "label.x.pos"),
                    y = y.max + y.max * 0.03 * 0.5, angle = angle,
                    hjust = hjust, size = size)
                plot <- suppressMessages(plot + coord_cartesian(ylim = c(0,
                    y.max + y.max * 0.002 * max(nchar(x = levels(x = group.use))) *
                        size), clip = "off"))
            }
        }
        plot <- plot + theme(line = element_blank())
        plots[[i]] <- plot
    }
    if (combine) {
        plots <- CombinePlots(plots = plots)
    }
    return(plots)
}
<bytecode: 0x11837bb8>
<environment: namespace:Seurat>
We can see that Seurat's plots are built with ggplot2, so we can take the object the function returns and adjust it with our own ggplot2 skills:
require(Seurat)
p <- DoHeatmap(pbmc_small)
p$theme
p$layers
p$mapping
head(p$data)
        Feature           Cell Expression Identity
1        S100A9 ATGCCAGAACGACT -0.7639656        0
2 RP11-290F20.3 ATGCCAGAACGACT -0.3730316        0
3      HLA-DPB1 ATGCCAGAACGACT -1.0399941        0
4         RUFY1 ATGCCAGAACGACT -0.4098329        0
5         PARVB ATGCCAGAACGACT -0.3461794        0
6      HLA-DQA1 ATGCCAGAACGACT -0.6717070        0
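Putting the two observations together (group order comes from factor levels, and the return value is a ggplot object), one possible way to reorder the groups and restyle the heatmap is sketched below. It assumes that `levels<-` is defined for Seurat objects, as in Seurat v3 and later, and the color choices are arbitrary. It is not necessarily the solution proposed in the issue thread.

```r
library(Seurat)
library(ggplot2)
data("pbmc_small")

# The current identity levels determine the left-to-right group order.
old_levels <- levels(pbmc_small)
print(old_levels)

# Reverse the group order by resetting the identity levels.
levels(pbmc_small) <- rev(old_levels)

# The return value behaves like a ggplot, so ggplot2 layers can be added.
p <- DoHeatmap(pbmc_small) +
  scale_fill_gradientn(colors = c("navy", "white", "firebrick"))
print(p)
```

Grouping by a metadata column via `group.by` follows the same logic: convert that column to a factor with the desired levels before plotting.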
Time series dataset in two conditions #3040
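GitHub link: https://github.com/satijalab/seurat/issues/3040
This is a common question, and working with multiple samples still deserves discussion on many fronts. The respondent pointed to several vignettes and two papers: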
I would recommend that you review the Guided tutorial, Multiple dataset integration (specifically the SCT method), and the Stimulated vs Control PBMCs, in that order. Due the large probable size of your final dataset, I think that the manipulations found within the Mouse Cell Atlas Vignette might also prove useful. The VIsualization and Cell cycle regression vignettes were also particularly helpful for our analysis of similar conditions. Lastly, I recommend that you review the methods in Farnsworth et al. 2020 as well as Soldatov et al., 2019 for further help. Hope that this was useful.
Optimal strategy to process samples to compare two different condition. (#3019)
This one is quite similar to the previous question, since both deal with multiple samples. Here, too, someone offered advice:
I wouldn't recommend using cellranger aggr as it will downsample everything to a similar number of counts as the sample with the lowest counts, and so potentially will throw out a lot of data. You could quantify each dataset (using cellranger, or another tool like Alevin) and first merge them in Seurat and check if there are batch differences between the datasets. If there are, you could run the integration methods
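The "merge first, then check for batch differences" advice can be sketched roughly as follows. Because no real multi-sample data is available here, the two samples are hypothetical stand-ins made by splitting `pbmc_small` in half; `sample_a`, `sample_b` and the `sample` metadata column are names invented for this illustration, and the preprocessing uses default settings only.

```r
library(Seurat)
data("pbmc_small")

# Hypothetical stand-ins for two separately quantified samples.
all_cells <- colnames(pbmc_small)
half <- floor(length(all_cells) / 2)
sample_a <- subset(pbmc_small, cells = all_cells[1:half])
sample_b <- subset(pbmc_small, cells = all_cells[(half + 1):length(all_cells)])
sample_a$sample <- "A"
sample_b$sample <- "B"

# Merge the two objects; add.cell.ids prefixes barcodes to keep them unique.
merged <- merge(sample_a, y = sample_b, add.cell.ids = c("A", "B"))

# Standard preprocessing on the merged object, without any integration.
merged <- NormalizeData(merged, verbose = FALSE)
merged <- FindVariableFeatures(merged, verbose = FALSE)
merged <- ScaleData(merged, verbose = FALSE)
merged <- RunPCA(merged, npcs = 10, verbose = FALSE)

# Cells separating by sample rather than by cell type suggest a batch effect.
DimPlot(merged, reduction = "pca", group.by = "sample")
```

With real data, each sample would come from its own quantification output, and the integration workflow would only be needed if the samples separate in this plot.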
Deploying Shiny apps using Seurat library to shinyapps.io #2716
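GitHub link: https://github.com/satijalab/seurat/issues/2716
This kind of question is about extending Seurat. Here the goal is to bring Seurat into a Shiny app, in other words to give it a graphical interface. That requires the developer not only to understand Seurat's functions and data objects, along with the R packages and environment they depend on, but also to know Shiny's syntax and structure. Most of the discussion under this issue is really about bridging the gap between Seurat and Shiny.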
That's all for today; thank you for your company. By the way, have you asked a question on GitHub yet?