學(xué)習(xí)一篇NC的單細(xì)胞文章（四）：Clustering

我們上次基于各種marker對(duì)1189個(gè)細(xì)胞進(jìn)行分類(lèi)严望，然而，僅基于marker對(duì)細(xì)胞進(jìn)行分類(lèi)可能是不精確的哪雕，特別是考慮到scRNA-seq數(shù)據(jù)的high dropout rate 去团。因此，再進(jìn)行t-SNE降維之前斟叼，作者又進(jìn)一步將細(xì)胞進(jìn)行分類(lèi)。
首先需要對(duì)不同病人樣本的效應(yīng)進(jìn)行回歸分析蜕该，以確保得到的聚類(lèi)結(jié)果不是患者樣本之間的異質(zhì)性導(dǎo)致的犁柜。然后選擇了在細(xì)胞間分散度高的基因表達(dá)子集(平均表達(dá)量>0.1)，以增加簇的穩(wěn)健性堂淡。首先計(jì)算經(jīng)驗(yàn)離散值（empirical dispersion value ）去擬合離散度-均值（dispersion parameter ）的關(guān)系馋缅，為每個(gè)基因選擇離散度參數(shù)（dispersion-mean relationship ）。

# 將得到的1189個(gè)細(xì)胞分類(lèi)結(jié)果更新到sceset_final中
colData(sceset_final)$cell_types_markers <- cell_types_simple
pd_norm <- colData(sceset_final)
#使用monocle包對(duì)細(xì)胞類(lèi)型進(jìn)行無(wú)監(jiān)督聚類(lèi)方法绢淀，monocle_unsup_clust_plots是包裝好的一個(gè)函數(shù)萤悴，
HSMM_clustering_ct <- monocle_unsup_clust_plots(sceset_obj = sceset_final, mat_to_cluster = mat_norm,                                                       anno_colors = anno_colors, 
                                                name_in_phenodata = "cluster_allregr_disp", disp_extra = 1,                                                 save_plots = 0,
                                                path_plots = NULL, type_pats = "allpats", regress_pat = 1)


HSMM_clustering_ct$Cluster <- HSMM_clustering_ct$cluster_allregr_disp
table(HSMM_clustering_ct$Cluster)
> table(HSMM_clustering_ct$Cluster)

  1   2   3   4   5   6   7 
 64 141  95  74 202 559  54

我們看到，此時(shí)可分為7簇皆的，跟文中的9簇不一樣覆履，作者這么解釋?zhuān)河捎趓educeDimension and clusterCells函數(shù)的更新，因此結(jié)果會(huì)有所不同费薄。

# due to changes in Monocle's functions (reduceDimension and clusterCells), the resulting clustering is slightly different from
# the original clustering from the paper. for reproducibility, we read in the original cluster assignment

為了更好的的進(jìn)行下面的學(xué)習(xí)硝全，我們就拿已經(jīng)處理好的original clustering跑下去看看。

#讀取已經(jīng)處理的的original clustering
original_clustering <- readRDS(file="data/original_clustering.RDS")
HSMM_clustering_ct$Cluster <- original_clustering
table(HSMM_clustering_ct$Cluster)
> table(HSMM_clustering_ct$Cluster)

  1   2   3   4   5   6   7   8   9 
 52  18 156 197 144 119 129 328  46

這時(shí)候就有9簇細(xì)胞楞抡，接下來(lái)需要對(duì)每1簇細(xì)胞再進(jìn)行細(xì)分伟众。

# 將得到的9簇細(xì)胞分類(lèi)結(jié)果更新到sceset_final中
colData(sceset_final) <- cbind(colData(sceset_final), pData(HSMM_clustering_ct)[,c(96:104)])
colData(sceset_final)$cell_types_cl_all <- colData(sceset_final)$cell_types_markers
pd_norm <- colData(sceset_final)

#首先在每個(gè)簇中確定unknown and undecided cell types 
cells_to_assign <- list()
cells_to_reassign <- list()
mean_exprs <- list()
mean_reassign_exprs <- list()

# cluster1: 只有52 epithelial cells
cluster_here <- 1
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
mean_exprs[[cluster_here]] <- NULL
mean_reassign_exprs[[cluster_here]] <- NULL
cells_to_assign[[cluster_here]] <- NULL
cells_to_reassign[[cluster_here]] <- NULL

#cluster2：確定clsters2的大致細(xì)胞類(lèi)型：3 epithelial, 13 macrophage, 1 undecided and 1 unknown cells
cluster_here <- 2
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])

> table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
   epithelial macrophage  undecided    unknown 
         3         13          1          1 
#計(jì)算undecided and unknown cells的免疫標(biāo)志物的平均表達(dá)量
to_assign_cluster <- c("undecided", "unknown")
mean_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm, 
                                                      cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster)),
                                                      epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
mean_exprs[[cluster_here]]

#對(duì)于undecided 和 unknown cell，如果免疫標(biāo)志物的平均表達(dá)量最高召廷，指定為巨噬細(xì)胞凳厢。
cells_to_assign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster))[which(apply(mean_exprs[[cluster_here]], 1, function(x){if (x[2] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_assign[[cluster_here]]] <- "macrophage"
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])

#確定clusters2中3個(gè)上皮細(xì)胞（epithelial cells）免疫標(biāo)志物的平均表達(dá)水平
mean_reassign_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm, 
                                                               cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "epithelial")),
                                                               epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)

#類(lèi)似地账胧，clusters2有3個(gè)上皮細(xì)胞（epithelial cells）免疫標(biāo)志物的平均表達(dá)最強(qiáng)，也被指定為巨噬細(xì)胞先紫。
mean_reassign_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"),                                                                  mat_expr = mat_norm, 
                                                               cells_pos = intersect(which(pd_norm$Cluster                                      == cluster_here), which(pd_norm$cell_types_markers %in% "epithelial")),
                                                               epithelial_markers = epithelial_markers,                                        immune_markers = immune_markers, other_markers = other_markers)
cells_to_reassign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "epithelial"))[which(apply(mean_reassign_exprs[[cluster_here]], 1, function(x){if (x[2] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_reassign[[cluster_here]]] <- "macrophage"
#因此治泥，clusters2由18個(gè)巨噬細(xì)胞組成。
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])

## cluster3：46 epithelial, 86 stroma, 14 endothelial, 5 undecided and 5 unknown cells.

cluster_here <- 3
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
> table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])

endothelial  epithelial      stroma   undecided     unknown 
         14          46          86           5           5 
         
mean_exprs[[cluster_here]] <- NULL
mean_reassign_exprs[[cluster_here]] <- NULL
cells_to_assign[[cluster_here]] <- NULL
cells_to_reassign[[cluster_here]] <- NULL


# cluster 4: 將unknown and undecided cells進(jìn)一步細(xì)分 
cluster_here <- 4
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
to_assign_cluster <- c("undecided", "unknown")
mean_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm, 
                                                      cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster)),
                                                      epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
# clusters3中unknown and undecided cells共有5個(gè)細(xì)胞的epithelial markers的平均表達(dá)最強(qiáng)遮精，被指定為epithelial cells

cells_to_assign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% to_assign_cluster))[which(apply(mean_exprs[[cluster_here]], 1, function(x){if (x[1] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_assign[[cluster_here]]] <- "epithelial"
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])

# 確定個(gè)stroma cell是否高表達(dá)epithelial marker居夹，若epithelial marker表達(dá)較高，則被指定為 epithelial cells
mean_reassign_exprs[[cluster_here]] <- compute_mean_expr_types(types = c("epithelial", "immune", "other"), mat_expr = mat_norm, 
                                                               cells_pos = intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "stroma")),
                                                               epithelial_markers = epithelial_markers, immune_markers = immune_markers, other_markers = other_markers)
# only re-assign the stroma cells if their epithelial expression is higher
cells_to_reassign[[cluster_here]] <- intersect(which(pd_norm$Cluster == cluster_here), which(pd_norm$cell_types_markers %in% "stroma"))[which(apply(mean_reassign_exprs[[cluster_here]], 1, function(x){if (x[1] >= max(x)) return(1) else {return(0)}}) == 1)]
pd_norm$cell_types_cl_all[cells_to_reassign[[cluster_here]]] <- "epithelial"
table(pd_norm$cell_types_cl_all[which(pd_norm$Cluster == cluster_here)])

cluster 5:46 epithelial, 4 stroma, 52 T, 3 B, 11 undecided and 28 unknown cells
cluster_here <- 5
table(pd_norm$cell_types_markers[which(pd_norm$Cluster == cluster_here)])
mean_exprs[[cluster_here]] <- NULL
mean_reassign_exprs[[cluster_here]] <- NULL
cells_to_assign[[cluster_here]] <- NULL
cells_to_reassign[[cluster_here]] <- NULL

接下來(lái)的第6仑鸥、7吮播、8和9簇細(xì)胞，按照同樣的辦法細(xì)分眼俊，代碼太長(zhǎng)了意狠，我就不貼上來(lái)了。在這里先小結(jié)一下疮胖，我們開(kāi)始看到的是1189個(gè)細(xì)胞环戈，然后將這些細(xì)胞用無(wú)聚類(lèi)監(jiān)督分析先分為9個(gè)簇，因?yàn)橄嗨频募?xì)胞更容易成為一個(gè)簇澎灸，但是簇里面的細(xì)胞類(lèi)型我們已經(jīng)基于文獻(xiàn)的markers分類(lèi)好了院塞，若簇里面有多種細(xì)胞類(lèi)型的細(xì)胞存在，那么我們需要進(jìn)一步鑒定他們跟簇里面那群最大比例的細(xì)胞是不是同一類(lèi)細(xì)胞（不然怎么解釋為何他們會(huì)聚集到了同一簇）性昭，還是說(shuō)有自己的獨(dú)特身份拦止？但是我們?cè)趺炊x那個(gè)最大比例的細(xì)胞，作者將這個(gè)比例定為80%糜颠，舉個(gè)例子汹族，在clusters4總共有197個(gè)細(xì)胞（185 epithelial，3 stroma, 4 undecided and 5 unknown cells）其兴，其中 epithelial細(xì)胞占了94%顶瞒，超過(guò)了80%的比例，那么其他12個(gè)細(xì)胞有可能“偽裝”了自己元旬，其真實(shí)身份有可能是epithelial細(xì)胞榴徐，不然怎么解釋他們跟epithelial細(xì)胞聚集到了一起呢？這時(shí)候就需要進(jìn)一步確定其他12個(gè)細(xì)胞epithelial markers 平均表達(dá)量是否高于其他的markers,是的話他們的“廬山真面目”就是epithelial細(xì)胞匀归，若沒(méi)有坑资，那么就是冤枉到它了，讓它保持原來(lái)的身份穆端。而對(duì)于clusters3（ 46 epithelial, 86 stroma, 14 endothelial, 5 undecided and 5 unknown cells）來(lái)說(shuō)盐茎，沒(méi)有1種細(xì)胞的比例超過(guò)80%，因此他們都可以保持自己的身份徙赢。我感覺(jué)這個(gè)過(guò)程還是蠻有趣的字柠，在對(duì)核實(shí)細(xì)胞身份的同時(shí)，其實(shí)就是不斷接近客觀世界的過(guò)程狡赐。今天這些內(nèi)容已經(jīng)蠻多了窑业，下一節(jié)我們?cè)倮^續(xù)學(xué)習(xí)。

最后編輯于：2021.02.08 19:47:40

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末枕屉，一起剝皮案震驚了整個(gè)濱河市常柄，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌搀擂，老刑警劉巖西潘，帶你破解...
沈念sama閱讀 218,122評(píng)論 6贊 505
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場(chǎng)離奇詭異哨颂，居然都是意外死亡喷市，警方通過(guò)查閱死者的電腦和手機(jī)，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 93,070評(píng)論 3贊 395
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門(mén)威恼，熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)品姓，“玉大人，你說(shuō)我怎么就攤上這事箫措「贡福” “怎么了？”我有些...
開(kāi)封第一講書(shū)人閱讀 164,491評(píng)論 0贊 354
道士緝兇錄：失蹤的賣(mài)姜人
文/不壞的土叔我叫張陵斤蔓，是天一觀的道長(zhǎng)植酥。經(jīng)常有香客問(wèn)我，道長(zhǎng)弦牡，這世上最難降的妖魔是什么友驮？我笑而不...
開(kāi)封第一講書(shū)人閱讀 58,636評(píng)論 1贊 293
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮喇伯，結(jié)果婚禮上喊儡，老公的妹妹穿的比我還像新娘。我一直安慰自己稻据，他們只是感情好艾猜，可當(dāng)我...
茶點(diǎn)故事閱讀 67,676評(píng)論 6贊 392
惡毒庶女頂嫁案：這布局不是一般人想出來(lái)的
文/花漫我一把揭開(kāi)白布。她就那樣靜靜地躺著捻悯，像睡著了一般匆赃。火紅的嫁衣襯著肌膚如雪。梳的紋絲不亂的頭發(fā)上今缚，一...
開(kāi)封第一講書(shū)人閱讀 51,541評(píng)論 1贊 305
城市分裂傳說(shuō)
那天算柳，我揣著相機(jī)與錄音，去河邊找鬼姓言。笑死瞬项，一個(gè)胖子當(dāng)著我的面吹牛蔗蹋，可吹牛的內(nèi)容都是我干的。我是一名探鬼主播囱淋，決...
沈念sama閱讀 40,292評(píng)論 3贊 418
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開(kāi)眼猪杭，長(zhǎng)吁一口氣：“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼！你這毒婦竟也來(lái)了妥衣？” 一聲冷哼從身側(cè)響起皂吮，我...
開(kāi)封第一講書(shū)人閱讀 39,211評(píng)論 0贊 276
萬(wàn)榮殺人案實(shí)錄
序言：老撾萬(wàn)榮一對(duì)情侶失蹤，失蹤者是張志新（化名）和其女友劉穎税手，沒(méi)想到半個(gè)月后蜂筹，有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 45,655評(píng)論 1贊 314
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡芦倒，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 37,846評(píng)論 3贊 336
?白月光啟示錄
正文我和宋清朗相戀三年艺挪，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片熙暴。...
茶點(diǎn)故事閱讀 39,965評(píng)論 1贊 348
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡闺属，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出周霉，到底是詐尸還是另有隱情掂器，我是刑警寧澤，帶...
沈念sama閱讀 35,684評(píng)論 5贊 347
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布俱箱，位于F島的核電站国瓮，受9級(jí)特大地震影響，放射性物質(zhì)發(fā)生泄漏狞谱。R本人自食惡果不足惜乃摹，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 41,295評(píng)論 3贊 329
男人毒藥：我在死后第九天來(lái)索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望跟衅。院中可真熱鬧孵睬，春花似錦、人聲如沸伶跷。這莊子的主人今日做“春日...
開(kāi)封第一講書(shū)人閱讀 31,894評(píng)論 0贊 22
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽(yáng)叭莫。三九已至蹈集，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間雇初，已是汗流浹背拢肆。一陣腳步聲響...
開(kāi)封第一講書(shū)人閱讀 33,012評(píng)論 1贊 269
情欲美人皮
我被黑心中介騙來(lái)泰國(guó)打工，沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留，地道東北人郭怪。一個(gè)月前我還...
沈念sama閱讀 48,126評(píng)論 3贊 370
代替公主和親
正文我出身青樓支示，卻偏偏與公主長(zhǎng)得像，于是被迫代替她去往敵國(guó)和親移盆。傳聞我的和親對(duì)象是個(gè)殘疾皇子悼院，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 44,914評(píng)論 2贊 355

學(xué)習(xí)一篇NC的單細(xì)胞文章（四）：Clustering

推薦閱讀更多精彩內(nèi)容