我們上次基于各種marker對(duì)1189個(gè)細(xì)胞進(jìn)行分類(lèi)严望,然而,僅基于marker對(duì)細(xì)胞進(jìn)行分類(lèi)可能是不精確的哪雕,特別是考慮到scRNA-seq數(shù)據(jù)的high dropout rate 去团。因此,再進(jìn)行t-SNE降維之前斟叼,作者又進(jìn)一步將細(xì)胞進(jìn)行分類(lèi)。
首先需要對(duì)不同病人樣本的效應(yīng)進(jìn)行回歸分析蜕该,以確保得到的聚類(lèi)結(jié)果不是患者樣本之間的異質(zhì)性導(dǎo)致的犁柜。然后選擇了在細(xì)胞間分散度高的基因表達(dá)子集(平均表達(dá)量>0.1),以增加簇的穩(wěn)健性堂淡。首先計(jì)算經(jīng)驗(yàn)離散值(empirical dispersion value )去擬合離散度-均值(dispersion parameter )的關(guān)系馋缅,為每個(gè)基因選擇離散度參數(shù)(dispersion-mean relationship )。
# 將得到的1189個(gè)細(xì)胞分類(lèi)結(jié)果更新到sceset_final中
colData(sceset_final)$cell_types_markers <- cell_types_simple
pd_norm <- colData(sceset_final)
#使用monocle包對(duì)細(xì)胞類(lèi)型進(jìn)行無(wú)監(jiān)督聚類(lèi)方法绢淀,monocle_unsup_clust_plots是包裝好的一個(gè)函數(shù)萤悴,
HSMM_clustering_ct <- monocle_unsup_clust_plots(sceset_obj = sceset_final, mat_to_cluster = mat_norm, anno_colors = anno_colors,
name_in_phenodata = "cluster_allregr_disp", disp_extra = 1, save_plots = 0,
path_plots = NULL, type_pats = "allpats", regress_pat = 1)
HSMM_clustering_ct$Cluster <- HSMM_clustering_ct$cluster_allregr_disp
table(HSMM_clustering_ct$Cluster)
> table(HSMM_clustering_ct$Cluster)
1 2 3 4 5 6 7
64 141 95 74 202 559 54
我們看到,此時(shí)可分為7簇皆的,跟文中的9簇不一樣覆履,作者這么解釋?zhuān)河捎趓educeDimension and clusterCells函數(shù)的更新,因此結(jié)果會(huì)有所不同费薄。
# due to changes in Monocle's functions (reduceDimension and clusterCells), the resulting clustering is slightly different from
# the original clustering from the paper. for reproducibility, we read in the original cluster assignment
為了更好的的進(jìn)行下面的學(xué)習(xí)硝全,我們就拿已經(jīng)處理好的original clustering跑下去看看。
#讀取已經(jīng)處理的的original clustering
original_clustering <- readRDS(file="data/original_clustering.RDS")
HSMM_clustering_ct$Cluster <- original_clustering
table(HSMM_clustering_ct$Cluster)
> table(HSMM_clustering_ct$Cluster)
1 2 3 4 5 6 7 8 9
52 18 156 197 144 119 129 328 46
這時(shí)候就有9簇細(xì)胞楞抡,接下來(lái)需要對(duì)每1簇細(xì)胞再進(jìn)行細(xì)分伟众。
# 將得到的9簇細(xì)胞分類(lèi)結(jié)果更新到sceset_final中
colData(sceset_final) <- cbind(colData(sceset_final), pData(HSMM_clustering_ct)[,c(96:104)])
colData(sceset_final)$cell_types_cl_all <- colData(sceset_final)$cell_types_markers
pd_norm <- colData(sceset_final)
#首先在每個(gè)簇中確定unknown and undecided cell types
cells_to_assign <- list()
cells_to_reassign <- list()
mean_exprs <- list()
mean_reassign_exprs <- list()
# cluster1: 只有52 epithelial cells
cluster_here <- 1
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
mean_exprs[[cluster_here]] <- NULL
mean_reassign_exprs[[cluster_here]] <- NULL
cells_to_assign[[cluster_here]] <- NULL
cells_to_reassign[[cluster_here]] <- NULL
#cluster2:確定clsters2的大致細(xì)胞類(lèi)型:3 epithelial, 13 macrophage, 1 undecided and 1 unknown cells
cluster_here <- 2
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
> table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
epithelial macrophage undecided unknown
3 13 1 1
#計(jì)算undecided and unknown cells的免疫標(biāo)志物的平均表達(dá)量
to_assign_cluster <- c("undecided", "unknown")
mean_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm,
cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster)),
epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
mean_exprs[[cluster_here]]
#對(duì)于undecided 和 unknown cell,如果免疫標(biāo)志物的平均表達(dá)量最高召廷,指定為巨噬細(xì)胞凳厢。
cells_to_assign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster))[which(apply(mean_exprs[[cluster_here]], 1, function(x){if (x[2] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_assign[[cluster_here]]] <- "macrophage"
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])
#確定clusters2中3個(gè)上皮細(xì)胞(epithelial cells)免疫標(biāo)志物的平均表達(dá)水平
mean_reassign_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm,
cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "epithelial")),
epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
#類(lèi)似地账胧,clusters2有3個(gè)上皮細(xì)胞(epithelial cells)免疫標(biāo)志物的平均表達(dá)最強(qiáng),也被指定為巨噬細(xì)胞先紫。
mean_reassign_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm,
cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "epithelial")),
epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
cells_to_reassign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "epithelial"))[which(apply(mean_reassign_exprs[[cluster_here]], 1, function(x){if (x[2] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_reassign[[cluster_here]]] <- "macrophage"
#因此治泥,clusters2由18個(gè)巨噬細(xì)胞組成。
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])
## cluster3:46 epithelial, 86 stroma, 14 endothelial, 5 undecided and 5 unknown cells.
cluster_here <- 3
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
> table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
endothelial epithelial stroma undecided unknown
14 46 86 5 5
mean_exprs[[cluster_here]] <- NULL
mean_reassign_exprs[[cluster_here]] <- NULL
cells_to_assign[[cluster_here]] <- NULL
cells_to_reassign[[cluster_here]] <- NULL
# cluster 4: 將unknown and undecided cells進(jìn)一步細(xì)分
cluster_here <- 4
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
to_assign_cluster <- c("undecided", "unknown")
mean_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm,
cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster)),
epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
# clusters3中unknown and undecided cells共有5個(gè)細(xì)胞的epithelial markers的平均表達(dá)最強(qiáng)遮精,被指定為epithelial cells
cells_to_assign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster))[which(apply(mean_exprs[[cluster_here]], 1, function(x){if (x[1] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_assign[[cluster_here]]] <- "epithelial"
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])
# 確定個(gè)stroma cell是否高表達(dá)epithelial marker居夹,若epithelial marker表達(dá)較高,則被指定為 epithelial cells
mean_reassign_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm,
cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "stroma")),
epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
# only re-assign the stroma cells if their epithelial expression is higher
cells_to_reassign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "stroma"))[which(apply(mean_reassign_exprs[[cluster_here]], 1, function(x){if (x[1] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_reassign[[cluster_here]]] <- "epithelial"
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])
cluster 5:46 epithelial, 4 stroma, 52 T, 3 B, 11 undecided and 28 unknown cells
cluster_here <- 5
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
mean_exprs[[cluster_here]] <- NULL
mean_reassign_exprs[[cluster_here]] <- NULL
cells_to_assign[[cluster_here]] <- NULL
cells_to_reassign[[cluster_here]] <- NULL
接下來(lái)的第6仑鸥、7吮播、8和9簇細(xì)胞,按照同樣的辦法細(xì)分眼俊,代碼太長(zhǎng)了意狠,我就不貼上來(lái)了。在這里先小結(jié)一下疮胖,我們開(kāi)始看到的是1189個(gè)細(xì)胞环戈,然后將這些細(xì)胞用無(wú)聚類(lèi)監(jiān)督分析先分為9個(gè)簇,因?yàn)橄嗨频募?xì)胞更容易成為一個(gè)簇澎灸,但是簇里面的細(xì)胞類(lèi)型我們已經(jīng)基于文獻(xiàn)的markers分類(lèi)好了院塞,若簇里面有多種細(xì)胞類(lèi)型的細(xì)胞存在,那么我們需要進(jìn)一步鑒定他們跟簇里面那群最大比例的細(xì)胞是不是同一類(lèi)細(xì)胞(不然怎么解釋為何他們會(huì)聚集到了同一簇)性昭,還是說(shuō)有自己的獨(dú)特身份拦止?但是我們?cè)趺炊x那個(gè)最大比例的細(xì)胞,作者將這個(gè)比例定為80%糜颠,舉個(gè)例子汹族,在clusters4總共有197個(gè)細(xì)胞(185 epithelial,3 stroma, 4 undecided and 5 unknown cells)其兴,其中 epithelial細(xì)胞占了94%顶瞒,超過(guò)了80%的比例,那么其他12個(gè)細(xì)胞有可能“偽裝”了自己元旬,其真實(shí)身份有可能是epithelial細(xì)胞榴徐,不然怎么解釋他們跟epithelial細(xì)胞聚集到了一起呢?這時(shí)候就需要進(jìn)一步確定其他12個(gè)細(xì)胞epithelial markers 平均表達(dá)量是否高于其他的markers,是的話他們的“廬山真面目”就是epithelial細(xì)胞匀归,若沒(méi)有坑资,那么就是冤枉到它了,讓它保持原來(lái)的身份穆端。而對(duì)于clusters3( 46 epithelial, 86 stroma, 14 endothelial, 5 undecided and 5 unknown cells)來(lái)說(shuō)盐茎,沒(méi)有1種細(xì)胞的比例超過(guò)80%,因此他們都可以保持自己的身份徙赢。我感覺(jué)這個(gè)過(guò)程還是蠻有趣的字柠,在對(duì)核實(shí)細(xì)胞身份的同時(shí),其實(shí)就是不斷接近客觀世界的過(guò)程狡赐。今天這些內(nèi)容已經(jīng)蠻多了窑业,下一節(jié)我們?cè)倮^續(xù)學(xué)習(xí)。