- Seurat Weekly NO.0 || 開刊詞
- Seurat Weekly NO.1 || 到底分多少個(gè)群是合適的?烘挫!
- Seurat Weekly NO.2 || 我該如何取子集
- Seurat Weekly NO.3 || 直接用Seurat畫fig2
- Seurat Weekly NO.4 || 高效數(shù)據(jù)管理
- Seurat Weekly NO.5 pseudocell該如何計(jì)算||或談Seurat的擴(kuò)展
其實(shí)诀艰,我們?cè)?019年的時(shí)候就介紹過(guò)單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat3.1教程:Interoperability between single-cell object formats柬甥,講了單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)對(duì)象的轉(zhuǎn)化。對(duì)R語(yǔ)言境內(nèi)的Seurat其垄,CellDataSet苛蒲,SingleCellExperiment,loom的格式轉(zhuǎn)化起來(lái)還是比較方便的绿满,但是對(duì)于異域的anndata轉(zhuǎn)化一直不是很友好臂外,所以我借此機(jī)會(huì)學(xué)會(huì)了python(在等短信驗(yàn)證碼的那六十秒之內(nèi))。anndata的數(shù)據(jù)就在python中分析喇颁,完事漏健。
但是這樣的轉(zhuǎn)化總有需求,于是橘霎,Seurat團(tuán)隊(duì)開發(fā)了SeuratDisk包蔫浆,希望滿足數(shù)據(jù)在Seurat和Scanpy之間快速搬家的需求。遇到規(guī)范的數(shù)據(jù)茎毁,當(dāng)然可以一鍵搬家克懊,但是如果有一點(diǎn)不同,就會(huì)帶來(lái)各種Error七蜘。這里我們就以此為契機(jī),看看遇到Error該如何處理墙懂,這屬于進(jìn)階課程了橡卤,遇到問(wèn)題Google解決不了了,我們?cè)趺崔k损搬?
library(Seurat)
我從網(wǎng)上下了一個(gè)不知道做了什么處理的單細(xì)胞數(shù)據(jù)碧库,只知道是h5ad格式的,當(dāng)我們用ReadH5AD
讀取時(shí)
cellxgene1 <- ReadH5AD(file = "some.processed.h5ad")
給出了這樣的報(bào)錯(cuò):
# 報(bào)錯(cuò)信息太長(zhǎng)巧勤,我們只顯示最有用的嵌灰。
In addition: Warning message:
Functionality for reading and writing H5AD files is being moved to SeuratDisk
For more details, please see https://github.com/mojaveazure/seurat-disk
and https://mojaveazure.github.io/seurat-disk/index.html
顯然,作者這是在提示我們安裝新的R包:seurat-disk颅悉,于是我們挺聽話地去安裝了沽瞭。
remotes::install_github("mojaveazure/seurat-disk")
library(SeuratDisk)
Convert("some.processed.h5ad", dest = "h5seurat", overwrite = TRUE)`
Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA'
Creating h5Seurat file for version 3.1.2
Adding X as data
Adding X as counts
Error: Cannot find feature names in this H5AD file
同樣,轉(zhuǎn)角處遇到Error剩瓶,于是直接google Error信息驹溃,我們看到Seurat團(tuán)隊(duì)給出的答案:
For this specific H5AD file, it's compressed using the LZF filter. This filter is Python-specific, and cannot easily be used in R. To use this file with Seurat and SeuratDisk, you'll need to read it in Python and save it out using the gzip compression
import anndata
adata = anndata.read("some.processed.h5ad")
adata.write("some.processed.gzip.h5ad", compression="gzip")
這顯然是python語(yǔ)法,在R里面該如何操作呢延曙?自然是在R里調(diào)用Python了,所以別問(wèn)人家用的是R還是python了布疙,這兩個(gè)可以相互運(yùn)行的語(yǔ)言,其實(shí)是一種語(yǔ)言灵临。
library(reticulate)
reticulate::py_config()
Sys.which('python') # 該python 下要安裝了anndata
# usethis::edit_r_environ()
filesmy2 ='some.processed.gzip.h5ad'
ad <- import("anndata", convert = FALSE)
some_ad <- ad$read_h5ad(filesmy)
adata = anndata.read("some.processed.h5ad")
some_ad$write("some.processed.gzip.h5ad", compression="gzip")
之后俱诸,我們終于可以轉(zhuǎn)化了。
Convert("some.processed.gzip.h5ad", dest = "h5seurat", overwrite = TRUE)
Warning: Unknown file type: h5ad
Warning: 'assay' not set, setting to 'RNA' # 這里其實(shí)要格外小心赶诊,看看數(shù)據(jù)里面是不是 RNA啊
Creating h5Seurat file for version 3.1.5.9900
Adding X as data
Adding raw/X as counts
Adding meta.features from raw/var
Adding X_umap as cell embeddings for umap
Convert 成功后目錄下多了一個(gè)文件:some.processed.gzip.h5seurat舔痪,就差一個(gè)Seurat對(duì)象了锄码。
cellxgene <- LoadH5Seurat('some.processed.gzip.h5seurat',
assays ="RNA")
Validating h5Seurat file
Initializing RNA with data
Error in dimnamesGets(x, value) :
invalid dimnames given for “dgCMatrix” object
同樣滋捶,在轉(zhuǎn)角處遇到了Error 重窟,于是我們?cè)俅蜧oogle 這個(gè)Error 惧财。一番瀏覽垮衷,我們發(fā)現(xiàn)自己遇到了Google解決不了的問(wèn)題。
決心debug這個(gè)函數(shù)拭抬。
debug(LoadH5Seurat)
cellxgene <- LoadH5Seurat('some.processed.gzip.h5seurat',
assays ="RNA")
# 一波回車
debug( as.Seurat) #因?yàn)長(zhǎng)oadH5Seurat里面用了這個(gè)函數(shù)垛耳,所以在LoadH5Seurat的debug環(huán)境種再debug as.Seurat
# 一波回車
debug(AssembleAssay) #因?yàn)閍s.Seurat里面用了這個(gè)函數(shù),所以在as.Seurat的debug環(huán)境種再debug AssembleAssay
# 一波回車
我們找到問(wèn)題了:
slots.assay <- names(x = Filter(f = isTRUE, x = index[[assay]]$slots))
slots <- slots %||% slots.assay
slots <- match.arg(arg = slots, choices = slots.assay, several.ok = TRUE)
if (!any(c("counts", "data") %in% slots)) {
stop("At least one of 'counts' or 'data' must be loaded",
call. = FALSE)
}
assay.group <- file[["assays"]][[assay]]
features <- assay.group[["features"]][]
if ("counts" %in% slots && !"data" %in% slots) {
if (verbose) {
message("Initializing ", assay, " with counts")
}
counts <- as.matrix(x = assay.group[["counts"]])
rownames(x = counts) <- features # 還是第一個(gè)features <- assay.group[["features"]][]
colnames(x = counts) <- Cells(x = file)
obj <- CreateAssayObject(counts = counts, min.cells = -1,
min.features = -1)
}
else {
if (verbose) {
message("Initializing ", assay, " with data")
}
data <- as.matrix(x = assay.group[["data"]])
rownames(x = data) <- features # 還是第一個(gè)features <- assay.group[["features"]][]
colnames(x = data) <- Cells(x = file)
obj <- CreateAssayObject(data = data)
}
也就是在這部分代碼中作者是認(rèn)為,slots 之counts和data 的行名是一樣的花墩,其實(shí)我們知道Seurat的每部分存的數(shù)據(jù)其實(shí)都是可以獨(dú)立操作的,所以可能并不是都一樣冰蘑。
懷疑就要檢查:
Browse[8]> counts <- as.matrix(x = assay.group[["counts"]])
Browse[8]> dim(counts)
[1] 33567 10550
Browse[8]> data <- as.matrix(x = assay.group[["data"]])
Browse[8]> dim(data)
[1] 33421 10550
Browse[8]> length(features )
[1] 33567
果然不一樣祠肥。
既然我們找到了invalid dimnames given for “dgCMatrix” object
Error 的報(bào)錯(cuò)原因我們就可以針對(duì)counts來(lái)轉(zhuǎn)化仇箱,data的部分我們?cè)赟eurat里面做。為什么不找到data的行名忠烛,賦值給data呢美尸?這里留作思考題吧师坎。壞堪滨。
我們開始改造原函數(shù)使之能夠接受slots='counts'
這樣的限定,于是我們找到 as.Seurat源代碼中調(diào)用AssembleAssay的部分:
for (assay in names(x = assays)) {
assay.objects[[assay]] <- AssembleAssay(
assay = assay,
file = x,
slots = assays[[assay]], #這里強(qiáng)制對(duì)所有的slots進(jìn)行轉(zhuǎn)化,我們要讓他接受傳參
verbose = verbose
)
}
除了在函數(shù)參數(shù)中加入slots = NULL,
之外犯眠,這部分改為:
for (assay in names(x = assays)) {
assay.objects[[assay]] <- AssembleAssay(
assay = assay,
file = x,
slots =slots %||% assays[[assay]],
verbose = verbose
)
}
考慮到基本上要改動(dòng)https://github.com/mojaveazure/seurat-disk/blob/master/R/LoadH5Seurat.R這個(gè)腳本的大部分函數(shù)筐咧,所以我們Fork一份seurat-disk 對(duì)應(yīng)的我們改https://github.com/tuqiang2014/seurat-disk/blob/master/R/LoadH5Seurat.R
至于怎么改的量蕊,這里就略過(guò)了残炮。如果遇到同樣的問(wèn)題缩滨,你可以安裝這個(gè)改過(guò)了的泉瞻。改完之后袖牙,我們卸載掉這個(gè)官方的seurat-disk 鞭达,安裝自己改過(guò)的皇忿。
detach("package:SeuratDisk", unload = TRUE)
remove.packages('SeuratDisk')
remotes::install_github("tuqiang2014/seurat-disk") # tuqiang2014就是我啦
library(SeuratDisk)
undebug(LoadH5Seurat)
undebug( as.Seurat)
undebug(AssembleAssay) # 這里的提示忽略禁添,不是同一個(gè)環(huán)境。
cellxgene <- LoadH5Seurat('some.processed.gzip.h5seurat',
assays ="RNA",slots='counts')
# 提示信息:
Validating h5Seurat file
Initializing RNA with counts
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Adding feature-level metadata for RNA
Adding reduction umap
Adding cell embeddings for umap
Adding miscellaneous information for umap
Adding command information
Adding cell-level metadata
于是芹啥,一個(gè)標(biāo)致的Seurat對(duì)象就呈現(xiàn)在我們面前了:
cellxgene
An object of class Seurat
XXXX features across XXXXX samples within 1 assay
Active assay: RNA (XXXX features, 0 variable features)
1 dimensional reduction calculated: umap
可以在Seurat里面盡情的分析墓怀。
library(ggplot2)
ggplot(DimPlot(cellxgene)$data,aes(umap_1 , umap_2 ,fill=ident)) +
geom_point(shape=21,colour="black",stroke=0.25,alpha=0.8) +
DimPlot(cellxgene,label = T)$theme +
theme_bw()+ NoLegend()
單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat3.1教程:Interoperability between single-cell object formats
Convert AnnData to Seurat fails with raw h5ad
conversion_vignette
convert-anndata RMD
Error: Cannot find feature names in this H5AD file #7