學(xué)習(xí)一篇NC的單細(xì)胞文章(五):tSNE

我們昨日進(jìn)行clustering之后逗扒,將1107個細(xì)胞分成了9個簇,今天學(xué)習(xí)tsne方面的知識。

##將unknown and undecided cells去除掉
unkund <- which(pd_norm$cell_types_cl_all %in% c("undecided", "unknown"))

#將已經(jīng)再次細(xì)分好的細(xì)胞信息添加到sceset_final中吁伺,便于后續(xù)的分析
sceset_ct <- sceset_final[,-unkund]
pd_ct <- colData(sceset_ct)
mat_ct <- assays(sceset_ct)$exprs
mats_ct <- list()
pds_ct <- list()
for (i in 1:length(patients_now)) {
  mats_ct[[i]] <- mat_ct[,pd_ct$patient == patients_now[i]]
  pds_ct[[i]] <- pd_ct[pd_ct$patient == patients_now[i],]
}
names(mats_ct) <- patients_now
names(pds_ct) <- patients_now

#畫6個樣本的1107個細(xì)胞的細(xì)胞類型比例分布柱狀圖
match_celltype_levels <- c("epithelial", "stroma", "endothelial", "Tcell", "Bcell", "macrophage")
#將pd_ct轉(zhuǎn)換為 tibble 類型
tbl_pd_ct <- tbl_df(pd_ct)
tbl_pd_ct <- tbl_pd_ct %>%
  group_by(patient) %>%
  mutate(cell_types_cl_all = factor(cell_types_cl_all, levels = match_celltype_levels)) %>%
  arrange(cell_types_cl_all)
  
#畫Fig1.c
ggplot() +
  geom_bar(data = tbl_pd_ct, aes(x = patient, fill = factor(cell_types_cl_all)), position = position_fill(reverse = TRUE)) +
  scale_fill_manual(values = anno_colors$tsne) +
  labs(fill = "cell type", y = "fraction of cells")

Fig1.c: 可以看到,病人樣本之間的細(xì)胞類型有很明顯的異質(zhì)性租谈。
Fig1.c
#先看不同病人的細(xì)胞周期比例分布情況
tbl_pd_cycle <- tbl_pd_ct %>%
  group_by(patient) %>%
  mutate(cycling_mel = factor(cycling_mel, levels = c("cycling", "non-cycling"))) %>%
  arrange(cycling_mel)
ggplot() +
  geom_bar(data = tbl_pd_cycle, aes(x = patient, fill = factor(cycling_mel)), position = position_fill(reverse = TRUE)) +
  scale_fill_manual(values = anno_colors$cycling) +
  labs(fill = "cycling status", y = "fraction of cells")
Fig1.d

Fig1.d:PT081樣本中篮奄,cycling細(xì)胞占比(>34% )最多。接著將細(xì)胞周期與細(xì)胞類型聯(lián)系在一起割去。

#epithelial細(xì)胞比例
for (i in 1:length(patients_now)) {
  percent_epith <- length(intersect_all(which(pd_ct$patient == patients_now[i]),  #取交集
                                        which(pd_ct$cell_types_cl_all == "epithelial"), 
                                        which(pd_ct$cycling_mel == "cycling")))/length(intersect_all(
                                          which(pd_ct$patient == patients_now[i]), 
                                          which(pd_ct$cell_types_cl_all == "epithelial")))*100
  #細(xì)胞周期比例
  percent_all <- length(intersect_all(which(pd_ct$patient == patients_now[i]), 
                                      which(pd_ct$cycling_mel == "cycling")))/length(which(pd_ct$patient == patients_now[i]))*100
  print(ggplot(as.data.frame(pds_ct[[i]]), aes(x = mel_scores_g1s, y = mel_scores_g2m)) +
          geom_rect(ggplot2::aes(xmin = median(pd_ct$mel_scores_g1s) + 2 * mad(pd_ct$mel_scores_g1s),#以G1期cycling score中位數(shù)加其2MAD數(shù)值作為鑒定cycling cell的分界線
                        xmax = Inf, #Inf:無窮大
                        ymin = -Inf,
                        ymax = Inf),
                    fill = "gainsboro", alpha = 0.05) +#定義顏色窟却、透明度
          geom_rect(aes(ymin = median(pd_ct$mel_scores_g2m) + 2 * mad(pd_ct$mel_scores_g2m),
                        ymax = Inf,
                        xmin = -Inf,
                        xmax = Inf),

                 fill = "gainsboro", alpha = 0.05) +   
          geom_point(aes(col = factor(cell_types_cl_all, levels = names(anno_colors$tsne)), 
                         shape = factor(cycling_mel)), size  = 5) +
          xlim(-0.15, 2) +
          ylim(-0.15, 2.8) +
          labs(col = "cell type", shape = "cycling", x = "G1S score", y = "G2M score", #注釋
               title = paste("patient ", patients_now[i], " (", round(percent_all), "% cycling cells)", sep = "")) + #round:四舍五入
          scale_color_manual(values = anno_colors$tsne))
}
Fig1.e

Fig1.e:對于PT126來說,大部分處于cycling的細(xì)胞都被鑒定為上皮細(xì)胞呻逆。接著開始進(jìn)行tSNE夸赫。

#先對1112個細(xì)胞進(jìn)行聚類
to_plot_ct <- unique(pd_ct$cell_types_cl_all)
#mat_ct是已經(jīng)處理好的1112個細(xì)胞和13280個基因德數(shù)據(jù)框,pd_ct是對應(yīng)的細(xì)胞和樣本的注釋信息咖城,pd_ct$cell_types_cl_all指每個細(xì)#胞對應(yīng)的細(xì)胞類型茬腿。
#which函數(shù)中cell_types_cl_all等于#to_plot_ct的位置,然后提取mat_ct的表達(dá)譜宜雀,重新生成矩陣mat_short_ct
mat_short_ct <- mat_ct[, which(pd_ct$cell_types_cl_all %in% to_plot_ct)] 
#同樣的切平,提取注釋信息pd_short_ct
pd_short_ct <- pd_ct[which(pd_ct$cell_types_cl_all %in% to_plot_ct), ]
#開始tsne
tsne_short_ct <- Rtsne(t(mat_short_ct), perplexity = 30)
#最終得到一個list,其中tsne_short_ct$Y存儲樂畫圖的信息辐董,給tsne_short_ct$Y適當(dāng)添加對應(yīng)的細(xì)胞類型等屬性
colnames(tsne_short_ct$Y) <- c("col1", "col2")
tsne_short_ct$Y <- as.data.frame(tsne_short_ct$Y)
tsne_short_ct$Y$cell_types_cl_all <- pd_short_ct$cell_types_cl_all
tsne_short_ct$Y$cell_types_markers <- pd_short_ct$cell_types_markers
tsne_short_ct$Y$patient <- pd_short_ct$patient
head(tsne_short_ct$Y)
#這樣子就得到了每個細(xì)胞的坐標(biāo)悴品,細(xì)胞類型,對應(yīng)的marker和病人標(biāo)本信息
> head(tsne_short_ct$Y)
       col1          col2 cell_types_cl_all cell_types_markers patient
1  16.12422 -20.896689773        epithelial         epithelial   PT089
2  15.71953 -20.986193941        epithelial         epithelial   PT089
3  14.95464 -20.609750926        epithelial         epithelial   PT089
4 -33.07589  -0.069714081        macrophage         macrophage   PT089
5 -32.66427  -0.007138231        macrophage         macrophage   PT089
6  15.48987 -21.197981561        epithelial         epithelial   PT089
#畫圖fig2a
ggplot(tsne_short_ct$Y, aes(x = col1, y = col2, color = factor(cell_types_cl_all, levels = names(anno_colors$tsne)), 
                            shape = patient)) + 
  geom_point(size = 4) + 
  scale_color_manual(values = anno_colors$tsne) +
  labs(col = "patient", x = "tSNE dimension 1", y = "tSNE dimension 2", shape = "patient")
fig2.a

Fig2a:使用tSNE聚類對所有患者的1189個細(xì)胞進(jìn)行分析,發(fā)現(xiàn)非上皮性細(xì)胞群和上皮細(xì)胞群之間有很大的分離苔严,非上皮性細(xì)胞群被很好地分離成不同的簇定枷,上皮細(xì)胞群則形成多個亞群。

#對上皮細(xì)胞群進(jìn)行tsne
to_plot_ct <- c("epithelial")
mat_short_ct <- mat_ct[, which(pd_ct$cell_types_cl_all %in% to_plot_ct)]
pd_short_ct <- pd_ct[which(pd_ct$cell_types_cl_all %in% to_plot_ct), ]
tsne_short_ct <- Rtsne(t(mat_short_ct), perplexity = 30)
colnames(tsne_short_ct$Y) <- c("col1", "col2")
tsne_short_ct$Y <- as.data.frame(tsne_short_ct$Y)
tsne_short_ct$Y$cell_types_cl_all <- pd_short_ct$cell_types_cl_all
tsne_short_ct$Y$cell_types_markers <- pd_short_ct$cell_types_markers
tsne_short_ct$Y$patient <- pd_short_ct$patient
#畫圖fig2c
ggplot(tsne_short_ct$Y, aes(x = col1, y = col2, color = factor(patient, levels = names(anno_colors$patient)), 
                            shape = cell_types_cl_all)) + 
  geom_point(size = 4) + 
  scale_color_manual(values = anno_colors$patient) +
  labs(col = "patient", x = "tSNE dimension 1", y = "tSNE dimension 2", shape = "cell type")
fig2c

Fig2c:上皮細(xì)胞群通常被分成多個特異性簇(特別在PT039和PT081中更明顯)邦蜜,6個樣本均有上皮細(xì)胞簇依鸥。

#對非上皮細(xì)胞群進(jìn)行tsne
to_plot_ct <- c("Bcell", "macrophage", "Tcell", "stroma", "endothelial")
mat_short_ct <- mat_ct[, which(pd_ct$cell_types_cl_all %in% to_plot_ct)]
pd_short_ct <- pd_ct[which(pd_ct$cell_types_cl_all %in% to_plot_ct), ]
tsne_short_ct <- Rtsne(t(mat_short_ct), perplexity = 30)
colnames(tsne_short_ct$Y) <- c("col1", "col2")
tsne_short_ct$Y <- as.data.frame(tsne_short_ct$Y)
tsne_short_ct$Y$cell_types_cl_all <- pd_short_ct$cell_types_cl_all
tsne_short_ct$Y$cell_types_markers <- pd_short_ct$cell_types_markers
tsne_short_ct$Y$patient <- pd_short_ct$patient

#畫圖fig2c
ggplot(tsne_short_ct$Y, aes(x = col1, y = col2, color = factor(cell_types_cl_all, levels = names(anno_colors$tsne)), 
                            shape = patient)) + 
  geom_point(size = 4) + 
  labs(col = "cell type", x = "tSNE dimension 1", y = "tSNE dimension 2") + 
  scale_color_manual(values = anno_colors$tsne)

fig2c
先前對黑色素瘤和膠質(zhì)母細(xì)胞瘤的單細(xì)胞分選研究顯示,惡性細(xì)胞主要按患者樣本進(jìn)行聚集悼沈,這也說明了不同病人腫瘤之的異質(zhì)性很顯著贱迟。但是有其他乳腺癌的單細(xì)胞分析結(jié)果卻與之相反,每個患者特有的和共有的惡性細(xì)胞可以存在一個簇中絮供,暗示了乳腺癌中既有特定的瘤內(nèi)異質(zhì)性亞群細(xì)胞存在衣吠,也“共享”著一部分細(xì)胞亞群,而后者與作者的結(jié)果是符合的壤靶。
接著使用Monocle包對上皮細(xì)胞群進(jìn)行clustering缚俏,并且對患者效應(yīng)進(jìn)行regressing out 。monocle_unsup_clust_plots是已經(jīng)包裝好的函數(shù)贮乳,這個函數(shù)采用了 2014Science上的?篇《Clustering by fast search and find of density peaks》文章的算法忧换,這篇文獻(xiàn)提供?種基于密度(density-based )的聚類方法,關(guān)于單細(xì)胞聚類法方法的選擇大家可以參考2017年發(fā)表在Molecular Aspects of Medicine上的文章《Identifying cell populations with scRNASeq 》向拆,生信技能樹已經(jīng)有對這篇文獻(xiàn)進(jìn)行過解讀亚茬。https://mp.weixin.qq.com/s/fWdeGfLPXlK8PsUmvOgw5g

## clustering of epithelial cells
HSMM_allepith_clustering <- monocle_unsup_clust_plots(sceset_obj = sceset_ct[,which(colData(sceset_ct)$cell_types_cl_all == "epithelial")], 
                                                      mat_to_cluster = mat_ct[,which(colData(sceset_ct)$cell_types_cl_all == "epithelial")], 
                                                      anno_colors = anno_colors, name_in_phenodata = "cluster_allepith_regr_disp", 
                                                      disp_extra = 1, save_plots = 0, path_plots = NULL, 
                                                      type_pats = "allpats", regress_pat = 1, use_known_colors = 1, use_only_known_celltypes = 1)
table(HSMM_allepith_clustering$Cluster)
> table(HSMM_allepith_clustering$Cluster)

  1   2   3   4 
 69 292 169 338 

結(jié)果是將868個上皮細(xì)胞分成了4個clustering,與原文不一樣浓恳。

作者是這么解釋的刹缝,由于Monocle包的函數(shù)reduceDimension and clusterCells有所改變,因此要想重現(xiàn)圖片颈将,接下來作者建議使用他們已經(jīng)整理好的原始數(shù)據(jù)梢夯。

# due to changes in Monocle's functions (reduceDimension and clusterCells), 
#the resulting clustering of epithelial cells is slightly different from the original clustering from the paper. for reproducibility 
#we read in the original clustering of epithelial cells
original_clustering_epithelial <- readRDS(file= "data/original_clustering_epithelial.RDS")
table(original_clustering_epithelial)

HSMM_allepith_clustering$Cluster <- original_clustering_epithelial
clustering_allepith <- HSMM_allepith_clustering$Cluster
#畫圖fig3a
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "Cluster", cell_size = 2) + 
  scale_color_manual(values = c("1" = "#ee204d", "2" = "#17806d", "3" = "#b2ec5d", "4" = "#cda4de", "5" = "#1974d2"))
#給6個樣本的868個上皮細(xì)胞標(biāo)記上對應(yīng)的clusters編號
clusterings_sep_allepith <- list()
for (i in patients_now) {
  clusterings_sep_allepith[[i]] <- clustering_allepith[which(HSMM_allepith_clustering$patient == i)]
  names(clusterings_sep_allepith[[i]]) <- colnames(HSMM_allepith_clustering)[which(HSMM_allepith_clustering$patient == i)]
}

#看看每個cluster的的cycling 和non-cycling細(xì)胞比例
tbl_pd_cluster <- tbl_df(pData(HSMM_allepith_clustering))
tbl_pd_cluster <- tbl_pd_cluster %>%
  group_by(Cluster) %>%
  mutate(cycling_mel = factor(cycling_mel, levels = c("cycling", "non-cycling"))) %>%
  arrange(cycling_mel)

#畫圖figS6
ggplot() +
  geom_bar(data = tbl_pd_cluster, aes(x = Cluster, fill = factor(cycling_mel)), position = position_fill(reverse = TRUE)) +
  scale_fill_manual(values = anno_colors$cycling) +
  labs(fill = "cycling status", y = "fraction of cells")
figS6
## 將每一個cluster與全部的cluster進(jìn)行差異分析
HSMM_for_DE <- HSMM_allepith_clustering
diff_test_res <- list()
#先進(jìn)行cluster1跟全部的cluster差異分析
HSMM_for_DE$allvs1 <- clustering_allepith
HSMM_for_DE$allvs1 <- as.numeric(HSMM_for_DE$allvs1)
HSMM_for_DE$allvs1[which(HSMM_for_DE$allvs1 != 1)] <- 2
#差異分析采用monocle的differentialGeneTest函數(shù)
diff_test_res$allvs1 <- differentialGeneTest(HSMM_for_DE, fullModelFormulaStr = "~allvs1", cores = 3) 
diff_test_res$allvs1 <- diff_test_res$allvs1[order(diff_test_res$allvs1$qval),]#qval排序
diff_test_res$allvs1 <- diff_test_res$allvs1[which(diff_test_res$allvs1$qval <= 0.1),] #挑選qval <= 0.1的基因
head(diff_test_res$allvs1[,1:5], n = 10)
> head(diff_test_res$allvs1[,1:5], n = 10)
        status           family         pval         qval ensembl_gene_id
HP          OK negbinomial.size 5.727834e-52 4.811381e-48 ENSG00000257017
PRG4        OK negbinomial.size 4.208455e-26 1.767551e-22 ENSG00000116690
PLA2G2A     OK negbinomial.size 1.305859e-24 3.656405e-21 ENSG00000188257
CFH         OK negbinomial.size 1.307494e-23 2.745737e-20 ENSG00000000971
EGFL6       OK negbinomial.size 1.660370e-21 2.789421e-18 ENSG00000198759
CCL2        OK negbinomial.size 1.830853e-20 2.197023e-17 ENSG00000108691
ENPP2       OK negbinomial.size 1.715218e-20 2.197023e-17 ENSG00000136960
VTN         OK negbinomial.size 3.867117e-19 4.060473e-16 ENSG00000109072
CLDN1       OK negbinomial.size 9.901388e-15 9.241296e-12 ENSG00000163347
BCHE        OK negbinomial.size 3.048404e-13 2.560659e-10 ENSG00000114200
#接著就是cluster2了
HSMM_for_DE$allvs2 <- clustering_allepith
HSMM_for_DE$allvs2 <- as.numeric(HSMM_for_DE$allvs2)
HSMM_for_DE$allvs2[which(HSMM_for_DE$allvs2 != 2)] <- 3
diff_test_res$allvs2 <- differentialGeneTest(HSMM_for_DE, fullModelFormulaStr = "~allvs2", cores = 3)
diff_test_res$allvs2 <- diff_test_res$allvs2[order(diff_test_res$allvs2$qval),]
diff_test_res$allvs2 <- diff_test_res$allvs2[which(diff_test_res$allvs2$qval <= 0.1),]
head(diff_test_res$allvs2[,1:5], n = 10)

接著就是cluster3,4和5,代碼我不放上來了晴圾,下一節(jié)我們繼續(xù)學(xué)習(xí)颂砸。

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個濱河市疑务,隨后出現(xiàn)的幾起案子沾凄,更是在濱河造成了極大的恐慌,老刑警劉巖知允,帶你破解...
    沈念sama閱讀 206,013評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件撒蟀,死亡現(xiàn)場離奇詭異求妹,居然都是意外死亡村砂,警方通過查閱死者的電腦和手機,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評論 2 382
  • 文/潘曉璐 我一進(jìn)店門贤重,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人姑尺,你說我怎么就攤上這事竟终。” “怎么了切蟋?”我有些...
    開封第一講書人閱讀 152,370評論 0 342
  • 文/不壞的土叔 我叫張陵统捶,是天一觀的道長。 經(jīng)常有香客問我柄粹,道長喘鸟,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 55,168評論 1 278
  • 正文 為了忘掉前任驻右,我火速辦了婚禮什黑,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘堪夭。我一直安慰自己愕把,他們只是感情好,可當(dāng)我...
    茶點故事閱讀 64,153評論 5 371
  • 文/花漫 我一把揭開白布森爽。 她就那樣靜靜地躺著恨豁,像睡著了一般。 火紅的嫁衣襯著肌膚如雪爬迟。 梳的紋絲不亂的頭發(fā)上圣絮,一...
    開封第一講書人閱讀 48,954評論 1 283
  • 那天,我揣著相機與錄音雕旨,去河邊找鬼。 笑死捧请,一個胖子當(dāng)著我的面吹牛凡涩,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播疹蛉,決...
    沈念sama閱讀 38,271評論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼活箕,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了可款?” 一聲冷哼從身側(cè)響起育韩,我...
    開封第一講書人閱讀 36,916評論 0 259
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎闺鲸,沒想到半個月后筋讨,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,382評論 1 300
  • 正文 獨居荒郊野嶺守林人離奇死亡摸恍,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 35,877評論 2 323
  • 正文 我和宋清朗相戀三年悉罕,在試婚紗的時候發(fā)現(xiàn)自己被綠了赤屋。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 37,989評論 1 333
  • 序言:一個原本活蹦亂跳的男人離奇死亡壁袄,死狀恐怖类早,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情嗜逻,我是刑警寧澤涩僻,帶...
    沈念sama閱讀 33,624評論 4 322
  • 正文 年R本政府宣布,位于F島的核電站栈顷,受9級特大地震影響逆日,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜妨蛹,卻給世界環(huán)境...
    茶點故事閱讀 39,209評論 3 307
  • 文/蒙蒙 一屏富、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧蛙卤,春花似錦狠半、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至行嗤,卻和暖如春已日,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背栅屏。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評論 1 260
  • 我被黑心中介騙來泰國打工飘千, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人栈雳。 一個月前我還...
    沈念sama閱讀 45,401評論 2 352
  • 正文 我出身青樓护奈,卻偏偏與公主長得像,于是被迫代替她去往敵國和親哥纫。 傳聞我的和親對象是個殘疾皇子霉旗,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 42,700評論 2 345