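The SingleR package automatically annotates the cell types in single-cell data by comparing them against a reference dataset. The celldex package provides several commonly used human and mouse bulk RNA-seq/microarray reference datasets with annotated cell types.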
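I. SingleR automatic annotation workflow
1. Prepare the input data
- Common format requirements: (1) a matrix, a sparse matrix (dgCMatrix), or a data.frame all work, as does a SummarizedExperiment object (the logcounts assay is used by default); (2) the expression matrix must be normalized and log-transformed, which corresponds to the data slot of a Seurat object.
- The single-cell expression matrix to annotate, with unknown cell types
# for example, extracted from a Seurat object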
library(scRNAseq)
hESCs <- LaMannoBrainData('human-es') # a SingleCellExperiment object
assays(hESCs) #only counts
hESCs <- scuttle::logNormCounts(hESCs)
# add logcounts slot
hESCs
library(Seurat)
scRNA = as.Seurat(hESCs)
scRNA
head(scRNA@meta.data)
scRNA <- NormalizeData(scRNA, normalization.method = "LogNormalize", scale.factor = 10000)
scRNA <- FindVariableFeatures(scRNA, selection.method = "vst", nfeatures = 2000)
scRNA <- ScaleData(scRNA, features = VariableFeatures(scRNA))
scRNA <- RunPCA(scRNA, features = VariableFeatures(scRNA))
pc.num=1:20
scRNA <- FindNeighbors(scRNA, dims = pc.num)
scRNA <- FindClusters(scRNA, resolution = c(0.01,0.05,0.1,0.2,0.5,0.7,0.9),
                      verbose = FALSE)
table(scRNA$originalexp_snn_res.0.2)
norm_count = GetAssayData(scRNA, slot="data") # sparse matrix; Seurat v5 expects layer="data" instead of slot
dim(norm_count)
#[1] 18538 1715
norm_count[1:4,1:4]
# 4 x 4 sparse Matrix of class "dgCMatrix"
#                   1772122_301_C02 1772122_180_E05 1772122_300_H02 1772122_180_B09
# WASH7P-p1               1.1275620               .        0.485604               .
# LINC01002-loc4                  .               .               .               .
# LOC100133331-loc1       0.7149377       0.5823631               .               .
# LOC100132287-loc2               .               .               .               .
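To make the "normalized and log-transformed" requirement concrete, here is a minimal base-R sketch on a made-up count matrix. It assumes the scheme that Seurat documents for LogNormalize (divide each cell by its total counts, multiply by the scale factor, then apply log1p); the helper name log_normalize and the toy values are hypothetical, and scuttle::logNormCounts uses its own scaling, so its numbers will differ.

```r
# Toy count matrix: 3 genes (rows) x 2 cells (columns); values are made up.
counts <- matrix(c(1, 0, 9,
                   0, 4, 16),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Hypothetical helper: scale every cell to `scale_factor` total counts,
# then take log(1 + x).
log_normalize <- function(counts, scale_factor = 10000) {
  lib_size <- colSums(counts)                       # total counts per cell
  scaled <- sweep(counts, 2, lib_size, "/") * scale_factor
  log1p(scaled)
}

norm_toy <- log_normalize(counts)
norm_toy
```

Zeros stay zero after the transform, which is why the result can still be stored as a sparse matrix.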
- The reference dataset ref, whose cell types have already been annotated
library(celldex)
ref = HumanPrimaryCellAtlasData()
#ref = readRDS("C:/Users/xiaoxin/Desktop/生信數(shù)據(jù)/celldex/HumanPrimaryCellAtlasDatar.rds")
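2. SingleR annotation
Depending on the annotation method and the reference dataset, there are several cases:
- (1) The reference dataset is a SummarizedExperiment object derived from bulk RNA-seq/microarray data; annotate each cell individually
library(SingleR)
pred <- SingleR(test = norm_count,
                ref = ref,
                labels = ref$label.main)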
head(pred)
table(pred$labels)
#            Astrocyte         Chondrocytes Embryonic_stem_cells            iPS_cells
#                   32                    1                  127                  195
# Neuroepithelial_cell              Neurons  Smooth_muscle_cells
#                 1030                  325                    5
# identical(rownames(pred), colnames(norm_count))
# TRUE
# add the annotation to the Seurat object
scRNA$singleR_cell = pred$labels
table(scRNA$singleR_cell)
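The idea behind per-cell annotation can be illustrated with a deliberately simplified base-R sketch: each test cell gets the label of the reference profile it correlates with best under Spearman correlation. This is not SingleR's implementation, which also restricts the comparison to marker genes, aggregates over several reference samples per label, and fine-tunes the result. All matrices and names below are made up.

```r
# Toy log-expression reference: 4 genes x 2 labelled profiles (made-up values).
ref_toy <- matrix(c(5, 4, 0, 1,
                    0, 1, 6, 5),
                  nrow = 4,
                  dimnames = list(paste0("gene", 1:4),
                                  c("Neurons", "Astrocyte")))

# Toy test cells with the same genes in the same order.
test_toy <- matrix(c(4, 5, 1, 0,
                     1, 0, 4, 6,
                     6, 3, 0, 2),
                   nrow = 4,
                   dimnames = list(paste0("gene", 1:4),
                                   paste0("cell", 1:3)))

# Spearman correlation of every cell (rows) with every reference label (columns).
scores <- cor(test_toy, ref_toy, method = "spearman")

# Assign each cell the label with the highest correlation.
toy_labels <- colnames(scores)[max.col(scores, ties.method = "first")]
names(toy_labels) <- rownames(scores)
toy_labels
```

Rank-based correlation is what lets a single-cell profile be compared with bulk reference data despite the different measurement scales.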
- (2) The reference dataset is a SummarizedExperiment object derived from bulk RNA-seq/microarray data; annotate each cluster as a unit
This only requires adding the clusters argument, as follows:
pred <- SingleR(test = norm_count,
ref = ref,
clusters = scRNA$originalexp_snn_res.0.2,
labels = ref$label.main)
head(pred)
table(pred$labels)
scRNA$singleR_cluster = pred$labels[match(scRNA$originalexp_snn_res.0.2,
rownames(pred))]
table(scRNA$singleR_cluster)
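With the clusters argument the result has one row per cluster rather than one per cell, so the labels have to be expanded back to cells. The match() call above does that; the base-R sketch below replays it on made-up data (cell_cluster and cluster_pred are hypothetical stand-ins for the Seurat cluster column and pred).

```r
# Made-up cluster assignment for six cells.
cell_cluster <- factor(c("0", "1", "0", "2", "1", "2"))

# Made-up cluster-level result: one row per cluster, cluster IDs as row names.
cluster_pred <- data.frame(labels = c("Neurons", "Astrocyte", "iPS_cells"),
                           row.names = c("0", "1", "2"),
                           stringsAsFactors = FALSE)

# match() returns, for each cell, the row of cluster_pred holding its cluster.
cell_label <- cluster_pred$labels[match(cell_cluster, rownames(cluster_pred))]

# Every cluster should map to exactly one label.
table(cell_cluster, cell_label)
```

Indexing by match() rather than by the numeric cluster value avoids an off-by-one error when cluster IDs start at 0.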
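- (3) The reference dataset is a self-built expression matrix derived from bulk RNA-seq/microarray data; annotate each cell individually
# expression matrix: row names are gene names, column names are cell-type labels
# if starting from "counts":
# ref <- SummarizedExperiment(assays=list(counts=ref))
# ref <- scuttle::logNormCounts(ref)
# ref_logcount <- assay(ref, "logcounts")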
ref_logcount[1:4,1:4]
pred<- SingleR(test = norm_count,
ref = ref_logcount,
labels = colnames(ref_logcount))
head(pred)
table(pred$labels)
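- (4) The reference dataset is scRNA-seq data; annotate each cell individually
Single-cell expression matrices are sparse (most values are zero), so marker detection should use a better-suited method, the Wilcoxon rank sum test. Everything else is the same as above.
# here test is the log-normalized matrix to annotate and ref is a single-cell reference with a label.main column
pred <- SingleR(test = test,
                ref = ref,
                labels = ref$label.main,
                de.method = "wilcox")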
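To show why a rank-based test suits zero-heavy data, the following base-R sketch runs a Wilcoxon rank sum test on one gene measured in cells of two reference labels. The expression values are made up, and this illustrates only the test itself; how SingleR combines such tests across genes and label pairs is not reproduced here.

```r
# Made-up log-expression of one gene in cells of two reference labels.
# Label A is mostly zeros, as is typical for single-cell data.
expr_a <- c(0, 0, 0, 1, 0, 2, 0, 0)
expr_b <- c(3, 5, 0, 4, 6, 2, 7, 5)

# exact = FALSE: the many tied zeros rule out an exact p-value,
# so the normal approximation is requested explicitly.
res <- wilcox.test(expr_a, expr_b, exact = FALSE)

res$statistic  # rank-sum statistic W
res$p.value    # a small value suggests the gene separates the two labels
```

Because the test uses only ranks, the block of tied zeros does not distort it the way it would distort a comparison of means.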
- (5) Visualize the annotation scores
plotScoreHeatmap(pred)
plotDeltaDistribution(pred, ncol = 3)
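As I understand the SingleR documentation, the "delta" shown by plotDeltaDistribution is, by default, the gap between a cell's score for its assigned label and the median of its scores across all labels. The base-R sketch below computes that quantity on a made-up score matrix; toy_scores and delta_med are hypothetical names, not objects that SingleR returns.

```r
# Made-up score matrix: one row per cell, one column per label.
toy_scores <- matrix(c(0.60, 0.20, 0.10,
                       0.35, 0.40, 0.30),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("cell1", "cell2"),
                                     c("Neurons", "Astrocyte", "iPS_cells")))

# Score of the best (assigned) label for each cell.
best <- max.col(toy_scores, ties.method = "first")
assigned <- toy_scores[cbind(seq_len(nrow(toy_scores)), best)]

# Delta: assigned score minus the median score across labels.
delta_med <- assigned - apply(toy_scores, 1, median)
delta_med
```

A small delta, as for cell2 here, means the winning label barely stands out from the rest, so that assignment deserves less trust.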
II. celldex reference data package
- The reference datasets in celldex are grouped by species (human/mouse)
- To download a dataset, call the function of the same name
library(celldex)
ref = HumanPrimaryCellAtlasData()
- The data are hosted overseas, so downloads are sometimes unstable; download a dataset while the network is good and save it as a local object for later use
ref = readRDS("C:/Users/xiaoxin/Desktop/生信數(shù)據(jù)/celldex/HumanPrimaryCellAtlasDatar.rds")
assay(ref, "logcounts")[1:4,1:4]
unique(ref$label.main)
unique(ref$label.fine)
unique(ref$label.ont)
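The download-once, reuse-later pattern described above can be sketched in base R with saveRDS and readRDS. The helper load_cached and the stand-in fake_loader are hypothetical and exist only so the sketch runs offline; in practice the loader would be something like function() celldex::HumanPrimaryCellAtlasData() and the path a permanent file.

```r
# Hypothetical helper: run `loader` only if no cached copy exists at `path`.
load_cached <- function(path, loader) {
  if (file.exists(path)) {
    return(readRDS(path))   # reuse the local copy
  }
  obj <- loader()           # slow step, e.g. a download
  saveRDS(obj, path)        # keep it for next time
  obj
}

# Stand-in loader that counts how often it is called.
n_calls <- 0
fake_loader <- function() {
  n_calls <<- n_calls + 1
  list(labels = c("Neurons", "Astrocyte"))
}

cache_file <- tempfile(fileext = ".rds")
ref_first  <- load_cached(cache_file, fake_loader)  # runs the loader and saves
ref_second <- load_cached(cache_file, fake_loader)  # reads the saved file
n_calls
```

Deleting the cached file forces a fresh download the next time the helper is called.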