開工了灾前,到了我們上班的時(shí)間了,剛開始孟辑,我們來分享一些新出現(xiàn)的方法哎甲,看看方法上有哪些更新蔫敲,我們雖然做不了潮流的引領(lǐng)者,但是要跟隨潮流炭玫,保證自己不被時(shí)代拋棄奈嘿,今天分享的文獻(xiàn)在Comparative analysis of cell-cell communication at single-cell resolution,里面對(duì)當(dāng)前細(xì)胞通訊的方法做了一定的總結(jié),我們借鑒一下吞加,并看看作者的方法有哪些可取之處裙犹。
目前的單細(xì)胞通訊的分析特點(diǎn)
- at the level of the cell type or cluster, discarding single-cell level information.(這個(gè)大家應(yīng)該都非常清楚了,通訊強(qiáng)度就是基因表達(dá)的平均值相乘)
然而真實(shí)的細(xì)胞通訊
- 細(xì)胞通訊 does not operate at the level of the group; rather, such interactions take place between individual cells(真正的細(xì)胞通訊是細(xì)胞之間榴鼎,而不是細(xì)胞類型群體之間伯诬,這也是空間轉(zhuǎn)錄組為什么分析空間臨近通訊的原因)。
如果我們要真正的研究細(xì)胞之間的通訊巫财,那么
- analyze interactions at the level of the single cell
- leverage the full information content contained within scRNA-seq data by looking at up- and down-stream cellular activity
- enable comparative analysis between conditions;
- robust to multiple experimental designs
不知道大家對(duì)單細(xì)胞做細(xì)胞通訊的方式有什么更好的建議呢盗似??平项?赫舒,我們來看看作者開發(fā)的方法,Scriabin闽瓢,an adaptable and computationally-efficient method for CCC analysis.Scriabin 通過結(jié)合收集的配體-受體相互作用數(shù)據(jù)庫(kù)接癌、下游細(xì)胞內(nèi)信號(hào)傳導(dǎo)模型、基于anchors的數(shù)據(jù)集集成和基因網(wǎng)絡(luò)分析扣讼,以單細(xì)胞分辨率剖析復(fù)雜的交流途徑缺猛,以在單細(xì)胞分辨率下恢復(fù)具有生物學(xué)意義的 CCC edges。
Scriabin的分析框架
- the cell-cell interaction matrix workflow, optimal for smaller datasets, analyzes communication methods used for each cell-cell pair in the dataset(細(xì)胞對(duì)分析通訊)椭符,CCC 的基本單位是表達(dá)配體的發(fā)送細(xì)胞 Ni荔燎,這些配體被接收細(xì)胞 Nj 表達(dá)的同源受體接收。 Scriabin 通過計(jì)算數(shù)據(jù)集中每對(duì)細(xì)胞對(duì)每個(gè)配體-受體對(duì)表達(dá)的幾何平均值销钝,在細(xì)胞-細(xì)胞相互作用矩陣 M 中編碼此信息.由于配體-受體相互作用是定向的有咨,Scriabin將每個(gè)細(xì)胞分別視為“發(fā)送者”(配體表達(dá))和“接收者”(受體表達(dá)),從而保持 CCC 網(wǎng)絡(luò)的定向性質(zhì)蒸健。 M 可以類似于基因表達(dá)矩陣進(jìn)行處理座享,并用于降維、聚類和差異分析
- the summarized interaction graph workflow, designed for large comparative analyses, identifies cell-cell pairs with different total communicative potential between samples(識(shí)別有意義的配受體對(duì))
- the interaction program discovery workflow, suitable for any dataset size, finds modules of co-expressed ligand-receptor pairs(第三點(diǎn)尤為重要)似忧。
具體的方法
Generation of cell-cell interaction matrix
將一對(duì)細(xì)胞之間的細(xì)胞-細(xì)胞相互作用向量定義為每個(gè)同源配體-受體對(duì)的表達(dá)值的幾何平均值渣叛。 形式上,sender cell Ni 和receiver cell Nj 之間的交互向量 V 由下式給出:
where ????,???? represent a cognate ligand-receptor pair.選擇將配體和受體表達(dá)值相乘橡娄,以便配體或受體表達(dá)的零值將導(dǎo)致相互作用向量的相應(yīng)索引為零诗箍。 此外,選擇取配體-受體表達(dá)值乘積的平方根挽唉,這樣高表達(dá)的配體-受體對(duì)不會(huì)不成比例地驅(qū)動(dòng)下游分析滤祖。This definition is equivalent to the geometric mean。細(xì)胞-細(xì)胞相互作用矩陣 M 是通過連接細(xì)胞-細(xì)胞相互作用向量來構(gòu)建的瓶籽。 線性回歸用于校正 M 因測(cè)序深度引起的變化匠童,其中 Ni、Nj 的總測(cè)序深度定義為 Ni 和 Ni 中唯一分子標(biāo)識(shí)符 (UMI) 的總和塑顺。 M 用作低維嵌入的輸入以進(jìn)行可視化汤求,并用作最近鄰圖的輸入以進(jìn)行基于圖的聚類。
矩陣的下游分析(就是Seurat的基本操作)
通訊programs的分析严拒,Interaction program discovery workflow
Iterative approximation of a ligand-receptor pair topological overlap matrix (TOM)
Identification and significance testing of interaction programs(層次聚類,WGCNA)
看一看示例代碼
安裝
devtools::install_github("BlishLab/scriabin", ref = "main")
single-dataset(單一數(shù)據(jù)樣本)
library(Seurat)
library(SeuratData)
library(scriabin)
library(tidyverse)
#####示例數(shù)據(jù)
install.packages("https://seurat.nygenome.org/src/contrib/panc8.SeuratData_3.0.2.tar.gz", repos = NULL, type = "source")
library(panc8.SeuratData)
panc8 <- LoadData("panc8")
####Analyze the indrop dataset for the interaction programs tutorial
panc_fl <- subset(panc8, cells = colnames(panc8)[panc8$tech=="fluidigmc1"])
panc_fl <- SCTransform(panc_fl, verbose = F) %>%
RunPCA(verbose = F) %>%
RunUMAP(dims = 1:30, verbose = F)
DimPlot(panc_fl, group.by = "celltype", label = T, repel = T) + NoLegend()
First we generate a cell-cell interaction matrix object on the alpha and beta cells in the dataset. Then we scale, find variable features, run PCA and UMAP, find neighbor graphs and clusters, treating this matrix just like we would a gene expression matrix
panc_fl_ccim <- GenerateCCIM(subset(panc_fl, cells = colnames(panc_fl)[panc_fl$celltype %in% c("alpha","beta")]))
panc_fl_ccim <- ScaleData(panc_fl_ccim) %>%
FindVariableFeatures() %>%
RunPCA() %>%
RunUMAP(dims = 1:10) %>%
FindNeighbors(dims = 1:10) %>%
FindClusters(resolution = 0.2)
p1 <- DimPlot(panc_fl_ccim, group.by = "sender_celltype")
p2 <- DimPlot(panc_fl_ccim, group.by = "receiver_celltype")
p1+p2
DimPlot(panc_fl_ccim, label = T, repel = T) + NoLegend()
cluster4_5.edges <- FindMarkers(panc_fl_ccim, ident.1 = "4", ident.2 = "5")
cluster4_5.edges %>% top_n(20,wt = abs(avg_log2FC))
CCIMFeaturePlot(panc_fl_ccim, seu = panc_fl,
features = c("EFNA1","NPNT"), type_plot = "sender")
multi-dataset
library(Seurat)
library(SeuratData)
library(scriabin)
scriabin::load_nichenet_database()
library(SeuratData)
InstallData("ifnb")
ifnb <- UpdateSeuratObject(LoadData("ifnb"))
ifnb <- PercentageFeatureSet(ifnb, pattern = "MT-", col.name = "percent.mt") %>%
SCTransform(ifnb, vars.to.regress = "percent.mt", verbose = F) %>%
RunPCA(ifnb, verbose = F) %>%
FindNeighbors(ifnb, dims = 1:30, verbose = F) %>%
FindClusters(ifnb, verbose = F) %>%
RunUMAP(ifnb, dims = 1:30, verbose = F)
DimPlot(ifnb, label = T, repel = T) + NoLegend()
DimPlot(ifnb, label = T, repel = T, group.by = "seurat_annotations") + NoLegend()
DimPlot(ifnb, group.by = "stim")
This dataset is composed of two sub-datasets: PBMCs stimulated with IFNB (STIM) or control (CTRL).
If we assume there is no batch effect between these two datasets, we can observe from these dimensionality reduction projections the intense transcriptional perturbation caused by this stimulation. This is one situation in which it is easy to see why clustering/subclustering for high resolution CCC analysis can be a problematic task: the degree of perturbation is so high that the monocytes from these two datasets will never cluster together. We need a way to align these subspaces together. Given a monocyte in CTRL, what is its molecular counterpart in STIM?
We take recent progress in dataset integration methodology to develop a high-resolution alignment process we call "binning", where we assign each cell a bin identity so that we maximize the similarity of the cells within a bin, and maximize the representation of all the samples we wish to compare in each bin. After bins are identified, they are tested for the significance of their connectivity.
Dataset binning
There are two workflow options for binning in terms of bin identification:
- Bin by coarse cell identities (recommended). If the user specifies coarse cell identities (eg. in a PBMC dataset, T/NK cells, B cells, myeloid cells), this can prevent anchors from being identified between cells that the user believes to be in non-overlapping cell categories. This can result in cleaner bins, but there must be enough cells in each coarse identity from each dataset to proceed properly. A downside is that there may be some small groups of cells that aren't related to major cell populations (eg. a small population of platelets in a PBMC dataset).
- Bin all cells. Without specifying coarse cell identities for binning, potentially spurious associations may form between cells (especially in lower quality samples).
Significance testing proceeds by a permutation test: comparing connectivity of the identified bin against randomly generated bins. There are two workflow options for binning in terms of generating random bins:
- Pick cells from the same cell type in each random bin (recommended). If the user supplies granular cell type identities, a random bin will be constructed with cells from the same cell type identity. The more granular the cell type calls supplied, the more rigorous the significance testing.
- Without supplying granular cell type identities, the dataset will be clustered and the cluster identities used for significance testing. Generating random bins across the entire dataset would result in very few non-significant bins identified.
Here we will define a coarse cell identity to use for bin identification, and then use the Seurat-provided annotations for significance testing.
Additionally, Scriabin allows different reductions to be used for neighbor graph construction. For example, the results from a reference-based sPCA can be used for binning if the cell type relationships in the reference are considered more informative.
ifnb$coarse_ident <- mapvalues(ifnb$seurat_annotations, from = unique(ifnb$seurat_annotations),
to = c("myeloid","myeloid","T/misc","T/misc",
"T/misc","T/misc","T/misc","B",
"B","myeloid","myeloid","T/misc","T/misc"))
ifnb <- BinDatasets(ifnb, split.by = "stim", dims = 1:30,
coarse_cell_types = "coarse_ident", sigtest_cell_types = "seurat_annotations")
ifnb_split <- SplitObject(ifnb, split.by = "stim")
sum_ig <- AssembleInteractionGraphs(ifnb, by = "prior", split.by = "stim")
ifnb_split <- pblapply(ifnb_split, function(x) {BuildPriorInteraction(x, correct.depth = T)})
ogig <- lapply(ifnb_split, function(x) {
as.matrix(x@graphs$prior_interaction)
})
interaction-programs
library(Seurat)
library(SeuratData)
library(scriabin)
library(tidyverse)
library(ComplexHeatmap)
library(cowplot)
install.packages("https://seurat.nygenome.org/src/contrib/panc8.SeuratData_3.0.2.tar.gz", repos = NULL, type = "source")
library(panc8.SeuratData)
panc8 <- LoadData("panc8")
panc_id <- subset(panc8, cells = colnames(panc8)[panc8$tech=="indrop"])
panc_id <- SCTransform(panc_id, verbose = F) %>%
RunPCA(verbose = F) %>%
RunUMAP(dims = 1:30, verbose = F)
DimPlot(panc_id, group.by = "celltype", label = T, repel = T) + NoLegend()
panc_ip <- InteractionPrograms(panc_id, iterate.threshold = 300)
#test for interaction program significance
panc_ip_sig <- InteractionProgramSignificance(panc_ip, n.replicate = 500)
#score cells by expression of interaction program
panc_id <- ScoreInteractionPrograms(panc_id, panc_ip_sig)
panc_id_ip_lig <- as.matrix(panc_id@meta.data %>%
select("celltype",
starts_with("ligands")) %>%
group_by(celltype) %>%
summarise_if(is.numeric, mean) %>% column_to_rownames("celltype"))
Heatmap(panc_id_ip_lig, show_column_names = F)
panc_id_ip_rec <- as.matrix(panc_id@meta.data %>%
select("celltype",
starts_with("receptors")) %>%
group_by(celltype) %>%
summarise_if(is.numeric, mean) %>% column_to_rownames("celltype"))
Heatmap(panc_id_ip_rec, show_column_names = F)
act_stellate_ip <- panc_id_ip_lig["activated_stellate",]
poi <- gsub("ligands_","",names(which(act_stellate_ip==max(act_stellate_ip))))
#Seurat's FeaturePlot has a nice option to blend expression of two features together on the same plot
p <- FeaturePlot(panc_id,
features = c(paste0("ligands_",poi),
paste0("receptors_",poi)),
blend = T, combine = F,
cols = c("grey90","purple","yellowgreen"), order = T)
p[[3]] + NoLegend()
DimPlot(panc_id, group.by = "celltype", label = T, repel = T) + NoLegend()
moi <- reshape2::melt(mod_df %>% dplyr::filter(name==poi) %>%
select("lr_pair",contains("connectivity"))) %>% arrange(-value)
moi$lr_pair <- factor(moi$lr_pair, levels = unique(moi$lr_pair))
ggplot(moi, aes(x = lr_pair, y = value, color = variable)) +
geom_point() + theme_cowplot() + ggpubr::rotate_x_text() + labs(x = NULL, y = "Intramodular\nconnectivity")
最后問一句扬绪,大家覺得這個(gè)方法改進(jìn)之處合理么?
生活很好裤唠,有你更好