流程更新----空間細(xì)胞聚類及配受體共現(xiàn)分析(針對visium退疫、bin模式的Stereo-seq渠缕、以及HD)

作者,Evil Genius

今天我們分享一個(gè)簡答的內(nèi)容褒繁,空間細(xì)胞聚類與配受體共現(xiàn)亦鳞。

今年的空間課程給了大家一個(gè)方法,當(dāng)然了棒坏,都可以用燕差,也都有高分文章引用,我們今天更新一個(gè)方法俊抵,結(jié)果如下:

老粉應(yīng)該有印象分享的是哪篇文章谁不。

針對bin模式的Stereo-seq或者標(biāo)準(zhǔn)模式HD分析,不做圖像分割的情況下徽诲, 合并后的superspot都跟visium分析差不多刹帕,需要和單細(xì)胞數(shù)據(jù)一起進(jìn)行解卷積。當(dāng)然了谎替,這就會(huì)有課程上講到的分析偷溺,分子聚類、細(xì)胞聚類钱贯。

解卷積的方法么挫掏,一般都是cell2location、RCTD居多秩命,當(dāng)然了尉共,像CellTrek褒傅、CellScope等方法也都有人引用,分析完拿到空間細(xì)胞矩陣袄友,針對這個(gè)矩陣殿托,也會(huì)有很多的個(gè)性化分析。

我們更新一下這個(gè)空間細(xì)胞聚類的方法剧蚣。分析細(xì)胞類型的空間共現(xiàn)支竹。


簡單的例子

出處:https://github.com/ati-lz/ISCHIA

代碼示例

# Loading required packages
library(ISCHIA)
library(robustbase)
library(data.table)
library(ggplot2)
library(Seurat)
library(dplyr)
library(factoextra)
library(cluster)
library(showtext)
library(gridExtra)
library(pdftools)

# Set random seed for reproducibility
set.seed(123)

# Load data
pdac <- readRDS("/path/to/pdac_mets_rctd.rds")
assay_matrix <- pdac[["rctd_tier1"]]@data
norm_weights <- as.data.frame(t(assay_matrix))

# Elbow Method
k.values <- 1:20
wss_values <- sapply(k.values, function(k) kmeans(norm_weights, k, nstart = 10)$tot.withinss)

pdf("1_elbow_plot.pdf")
plot(k.values, wss_values, type = "b", pch = 19, frame = FALSE, 
     xlab = "Number of clusters K", ylab = "Total within-cluster sum of squares",
     main = "Elbow Method for Optimal K")
dev.off()

# Gap Statistic
gap_stat <- function(k) {
  km.res <- kmeans(norm_weights, k, nstart = 10)
  if (k == 1) return(NA)
  obs_disp <- sum(km.res$withinss)
  reference_disp <- mean(replicate(10, {
    km.null <- kmeans(matrix(rnorm(nrow(norm_weights) * ncol(norm_weights)), 
                             ncol = ncol(norm_weights)), k, nstart = 10)
    sum(km.null$withinss)
  }))
  log(reference_disp) - log(obs_disp)
}

gap_stat_values <- sapply(k.values, gap_stat)

pdf("2_gap_statistic_plot.pdf")
plot(k.values, gap_stat_values, type = "b", pch = 19, frame = FALSE, 
     xlab = "Number of Clusters (K)", ylab = "Gap Statistic",
     main = "Gap Statistic: Determining Optimal K")
dev.off()

# Calinski-Harabasz Index
calinski_harabasz_index <- function(data, labels) {
  num_clusters <- length(unique(labels))
  num_points <- nrow(data)
  centroids <- tapply(data, labels, FUN = colMeans)
  between_disp <- sum(sapply(1:num_clusters, function(i) {
    cluster_points <- data[labels == i, ]
    nrow(cluster_points) * sum((colMeans(cluster_points) - centroids[i, ]) ^ 2)
  }))
  within_disp <- sum(sapply(1:num_clusters, function(i) {
    cluster_points <- data[labels == i, ]
    sum((cluster_points - centroids[i, ]) ^ 2)
  }))
  (between_disp / (num_clusters - 1)) / (within_disp / (num_points - num_clusters))
}

ch_values <- sapply(k.values, function(k) {
  km.res <- kmeans(norm_weights, k, nstart = 10)
  calinski_harabasz_index(norm_weights, km.res$cluster)
})

pdf("3_calinski_harabasz_plot.pdf")
plot(k.values, ch_values, type = "b", pch = 19, frame = FALSE,
     xlab = "Number of Clusters (K)", ylab = "Calinski-Harabasz Index",
     main = "Calinski-Harabasz Index: Determining Optimal K")
dev.off()

# ISCHIA Analysis
pdf("4_composition_cluster_k_plot.pdf")
Composition.cluster.k(norm_weights, 20)
dev.off()

pdac <- Composition.cluster(pdac, norm_weights, 12)
pdac$cc_12 <- pdac$CompositionCluster_CC

# Spatial Dimension Plot
image_names <- c("IU_PDA_T1", "IU_PDA_T2", "IU_PDA_HM2", "IU_PDA_HM2_2", "IU_PDA_NP2", 
                 "IU_PDA_T3", "IU_PDA_HM3", "IU_PDA_T4", "IU_PDA_HM4", "IU_PDA_HM5", 
                 "IU_PDA_T6", "IU_PDA_HM6", "IU_PDA_LNM6", "IU_PDA_LNM7", "IU_PDA_T8", 
                 "IU_PDA_HM8", "IU_PDA_LNM8", "IU_PDA_T9", "IU_PDA_HM9", "IU_PDA_T10", 
                 "IU_PDA_HM10", "IU_PDA_LNM10", "IU_PDA_NP10", "IU_PDA_T11", "IU_PDA_HM11", 
                 "IU_PDA_NP11", "IU_PDA_T12", "IU_PDA_HM12", "IU_PDA_LNM12", "IU_PDA_HM13")

paletteMartin <- c('#e6194b', '#3cb44b', '#ffe119', '#4363d8', '#f58231', '#911eb4', 
                   '#46f0f0', '#f032e6', '#bcf60c', '#fabebe', '#008080', '#e6beff')

all_ccs <- unique(pdac$CompositionCluster_CC)
color_mapping <- setNames(paletteMartin[1:length(all_ccs)], all_ccs)

pdf("5_spatial_plots_K12.pdf", width = 10, height = 7)
for (image_name in image_names) {
  plot <- SpatialDimPlot(pdac, group.by = "CompositionCluster_CC", images = image_name) +
    scale_fill_manual(values = color_mapping) +
    theme_minimal() +
    ggtitle(image_name)
  print(plot)
}
dev.off()

# Enriched Cell Types
save_cc_plot <- function(cc) {
  plot <- Composition_cluster_enrichedCelltypes(pdac, cc, as.matrix(norm_weights))
  pdf_name <- paste0(cc, ".pdf")
  pdf(file = pdf_name)
  print(plot)
  dev.off()
}

ccs <- paste0("CC", 1:12)
for (cc in ccs) {
  save_cc_plot(cc)
}

pdf_files <- paste0("CC", 1:12, ".pdf")
pdf_combine(pdf_files, output = "6_enrichedCelltypes_CC_12.pdf")

# UMAP
pdac.umap <- Composition_cluster_umap(pdac, norm_weights)
pdf("7_umap_pie_chart.pdf")
print(pdac.umap$umap.deconv.gg)
dev.off()

# Add UMAP to Seurat object
emb.umap <- pdac.umap$umap.table
emb.umap$CompositionCluster_CC <- NULL
emb.umap$Slide <- NULL
emb.umap <- as.matrix(emb.umap)
colnames(emb.umap) <- c("UMAP1", "UMAP2")

pdac[['umap.ischia12']] <- CreateDimReducObject(embeddings = emb.umap, key = 'umap.ischia12_', assay = 'rctd_tier1')

pdf("8_seurat_ischia_umap_12.pdf")
DimPlot(pdac, reduction = "umap.ischia12", label = FALSE, group.by="cc_12")
dev.off()

# Bar plots
pdf("9_barplot_SampVsorig_12.pdf", height=12, width=20)
dittoBarPlot(pdac, "orig.ident", group.by = "cc_12")
dev.off()

pdf("10_barplot_origVsSamp_12.pdf", height=10, width=20)
dittoBarPlot(pdac, "cc_12", group.by = "orig.ident")
dev.off()

# Cell type co-occurrence
CC4.celltype.cooccur <- spatial.celltype.cooccurence(spatial.object=pdac, deconv.prob.mat=norm_weights, 
                                                     COI="CC4", prob.th= 0.05, 
                                                     Condition=unique(pdac$orig.ident))
pdf("11_celltype_cooccurrence_CC4.pdf")
plot.celltype.cooccurence(CC4.celltype.cooccur)
dev.off()


最后畫一畫這個(gè)圖

### This script performs the wilcoxon rank sum test and hierarchical clustering on the RCTD tier data to identify the significant abundant cell types between the clusters.
  
#### Load necessary packages
library(Seurat)
library(compositions)
library(tidyverse)
library(clustree)
library(patchwork)
library(uwot)
library(scran)
library(cluster)
library(ggrastr)
library(cowplot)
# library(conflicted) # to be loaded in case of a conflict arises.
 
config <- config::get()
 
# source(here::here("pdac_nac", "visualization", "eda.R"))
 
### Load the seurat object and get the proportions data
so <- readRDS(here::here(config$data_processed, "06-pdac_CC10_msig.rds"))
 
# Join with metadata if needed
metadata <- so@meta.data %>%
  select(orig.ident, patient_id, neoadjuvant_chemo, CompositionCluster_CC) %>%
  rownames_to_column("row_id")
 
 
# Get the proportions data
rctd_tier2 <- t(so@assays$rctd_tier2@data)
 
# Ensure the data is in the right format
rownames(rctd_tier2) <- make.unique(rownames(rctd_tier2))
 
# Log transformation of rctd_tier1
log_comps <- log10(rctd_tier2)
## Perform the summary statistics
We perform the summary statistics for the RCTD tier data. We perform hierarchical clustering and do the wilcoxon rank sum test to identify the differentially abundant cell types between the clusters. 
#### Prepare the data for the summary statistics
# Prepare data for summary statistics
cluster_summary_pat <- rctd_tier2 %>%
  as.data.frame() %>%
  rownames_to_column("row_id") %>%
  left_join(metadata, by = "row_id") %>%  # Join with meta_data using row_id as the key
  pivot_longer(-c(row_id, orig.ident, patient_id, neoadjuvant_chemo, CompositionCluster_CC), values_to = "ct_prop", names_to = "cell_type") %>%
  group_by(orig.ident, patient_id, neoadjuvant_chemo, CompositionCluster_CC, cell_type) %>%
  summarize(median_ct_prop = median(ct_prop, na.rm = TRUE))

# Aggregate data for median ct prop
cluster_summary <- cluster_summary_pat %>%
  ungroup() %>%
  group_by(CompositionCluster_CC, cell_type) %>%
  summarize(patient_median_ct_prop = median(median_ct_prop, na.rm = TRUE))
 
# Prepare matrix for hierarchical clustering
cluster_summary_mat <- cluster_summary %>%
  pivot_wider(values_from = patient_median_ct_prop, names_from = cell_type, values_fill = list(patient_median_ct_prop = 0)) %>%
  column_to_rownames("CompositionCluster_CC") %>%
  as.matrix()
 
# conflicts_prefer(stats::"dist") # resolve conflicts between %*% functions

# Perform hierarchical clustering
cluster_order <- hclust(dist(cluster_summary_mat))$labels[hclust(dist(cluster_summary_mat))$order] # use this if you want to order the clusters based on the hierarchical clustering
ct_order <- hclust(dist(t(cluster_summary_mat)))$labels[hclust(dist(t(cluster_summary_mat)))$order]
 
# Order Clusters in ascending order
# cluster_order1 <- c("CC1", "CC2", "CC3", "CC4", "CC5", "CC6", "CC7", "CC8", "CC9", "CC10")
 
# Wilcoxon test for characteristic cell types
run_wilcox_up <- function(prop_data) {
  prop_data_group <- prop_data[["CompositionCluster_CC"]] %>% unique() %>% set_names()
  map(prop_data_group, function(g) {
    test_data <- prop_data %>%
      mutate(test_group = ifelse(CompositionCluster_CC == g, "target", "rest")) %>%
      mutate(test_group = factor(test_group, levels = c("target", "rest")))
    wilcox.test(median_ct_prop ~ test_group, data = test_data, alternative = "greater") %>%
      broom::tidy()
  }) %>% enframe("CompositionCluster_CC") %>% unnest()
}

 
wilcoxon_res <- cluster_summary_pat %>%
  ungroup() %>%
  group_by(cell_type) %>%
  nest() %>%
  mutate(wres = map(data, run_wilcox_up)) %>%
  dplyr::select(wres) %>%
  unnest() %>%
  ungroup() %>%
  mutate(p_corr = p.adjust(p.value)) %>%
  mutate(significant = ifelse(p_corr <= 0.15, "*", ""))

 
#### Save the summary statistics

# give the path to save the summary statistics
file_path_cluster_summ <- here::here(config$data_interim,  "summary_of_clusters.txt")
file_path_wilcox_res <- here::here(config$data_interim, ? "wilcoxon_res_cells_clusters.txt")
 
# Save the summary statistics
write.table(cluster_summary_pat, file = file_path_cluster_summ, col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")
write.table(wilcoxon_res, file = file_path_wilcox_res, col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")

 
#### Plot the summary statistics
 
# Plotting mean ct prop and barplots
mean_ct_prop_plt <- cluster_summary %>%
  left_join(wilcoxon_res, by = c("CompositionCluster_CC", "cell_type")) %>%
  mutate(cell_type = factor(cell_type, levels = ct_order), CompositionCluster_CC = factor(CompositionCluster_CC, levels = cluster_order)) %>%
  ungroup() %>%
  group_by(cell_type) %>%
  mutate(scaled_pat_median = (patient_median_ct_prop - mean(patient_median_ct_prop)) / sd(patient_median_ct_prop)) %>%
  ungroup() %>%
  ggplot(aes(x = cell_type, y = CompositionCluster_CC, fill = scaled_pat_median)) +
  geom_tile(color = "black") +
  geom_text(aes(label = significant)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 12), legend.position = "bottom", plot.margin = unit(c(0, 0, 0, 0), "cm"), axis.text.y = element_text(size = 12)) +
  scale_fill_gradient2()
 
cluster_counts <- cluster_info %>%
  dplyr::select_at(c("row_id", "CompositionCluster_CC")) %>%
  group_by(CompositionCluster_CC) %>%
  summarize(nspots = length(CompositionCluster_CC)) %>%
  mutate(prop_spots = nspots / sum(nspots))
 
file_path_cluster_prop_summ <- here::here(config$data_interim, "cluster_prop_summary.csv")
 
write_csv(cluster_counts, file_path_cluster_prop_summ)
 
#barplots for cluster counts
barplts <- cluster_counts %>%
  mutate(CompositionCluster_CC = factor(CompositionCluster_CC, levels = cluster_order)) %>%
  ggplot(aes(y = CompositionCluster_CC, x = prop_spots)) +
  geom_bar(stat = "identity") +
  theme_classic() + ylab("") +
  theme(axis.text.y = element_blank(), plot.margin = unit(c(0, 0, 0, 0), "cm"), axis.text.x = element_text(size = 12))
 
cluster_summary_plt <- cowplot::plot_grid(mean_ct_prop_plt, barplts, align = "hv", axis = "tb") # use if barplots are needed to show the spot counts otherwise directly use mean_ct_prop_plt for the plot

 
#### plot the summary clusters
 
pdf_path_summ_clust <- here::here(config$plots, "wilcox_summary_clusters.pdf")
 
pdf(pdf_path_summ_clust, width = 20, height = 10)
plot(cluster_summary_plt)
dev.off()
 
#box plot for median ct prop
pdf_path_boxplot <- here::here(config$plots,  "wilcox_bboxplot_median_ct_prop.pdf")
 
plt <- cluster_summary_pat %>%
  ggplot(aes(x = CompositionCluster_CC, y = median_ct_prop)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  facet_wrap(. ~ cell_type, ncol = 3, scales = "free_y")
 
pdf(pdf_path_boxplot, width = 20, height = 10)
plot(plt)
dev.off()

生活很好,有你更好

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末鸠按,一起剝皮案震驚了整個(gè)濱河市礼搁,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌目尖,老刑警劉巖馒吴,帶你破解...
    沈念sama閱讀 206,013評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異卑雁,居然都是意外死亡募书,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評論 2 382
  • 文/潘曉璐 我一進(jìn)店門测蹲,熙熙樓的掌柜王于貴愁眉苦臉地迎上來莹捡,“玉大人,你說我怎么就攤上這事扣甲±河” “怎么了?”我有些...
    開封第一講書人閱讀 152,370評論 0 342
  • 文/不壞的土叔 我叫張陵琉挖,是天一觀的道長启泣。 經(jīng)常有香客問我,道長示辈,這世上最難降的妖魔是什么寥茫? 我笑而不...
    開封第一講書人閱讀 55,168評論 1 278
  • 正文 為了忘掉前任,我火速辦了婚禮矾麻,結(jié)果婚禮上纱耻,老公的妹妹穿的比我還像新娘。我一直安慰自己险耀,他們只是感情好弄喘,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,153評論 5 371
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著甩牺,像睡著了一般蘑志。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 48,954評論 1 283
  • 那天急但,我揣著相機(jī)與錄音澎媒,去河邊找鬼。 笑死羊始,一個(gè)胖子當(dāng)著我的面吹牛旱幼,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播突委,決...
    沈念sama閱讀 38,271評論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼冬三!你這毒婦竟也來了匀油?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 36,916評論 0 259
  • 序言:老撾萬榮一對情侶失蹤勾笆,失蹤者是張志新(化名)和其女友劉穎敌蚜,沒想到半個(gè)月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體窝爪,經(jīng)...
    沈念sama閱讀 43,382評論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡弛车,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 35,877評論 2 323
  • 正文 我和宋清朗相戀三年,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了蒲每。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片纷跛。...
    茶點(diǎn)故事閱讀 37,989評論 1 333
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡,死狀恐怖邀杏,靈堂內(nèi)的尸體忽然破棺而出贫奠,到底是詐尸還是另有隱情,我是刑警寧澤望蜡,帶...
    沈念sama閱讀 33,624評論 4 322
  • 正文 年R本政府宣布唤崭,位于F島的核電站,受9級特大地震影響脖律,放射性物質(zhì)發(fā)生泄漏谢肾。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,209評論 3 307
  • 文/蒙蒙 一小泉、第九天 我趴在偏房一處隱蔽的房頂上張望芦疏。 院中可真熱鬧,春花似錦膏孟、人聲如沸眯分。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽弊决。三九已至,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間飘诗,已是汗流浹背与倡。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評論 1 260
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留昆稿,地道東北人纺座。 一個(gè)月前我還...
    沈念sama閱讀 45,401評論 2 352
  • 正文 我出身青樓,卻偏偏與公主長得像溉潭,于是被迫代替她去往敵國和親净响。 傳聞我的和親對象是個(gè)殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,700評論 2 345

推薦閱讀更多精彩內(nèi)容