常用的細(xì)胞通訊軟件:
- CellphoneDB:是公開的人工校正的礼搁,儲存受體击喂、配體以及兩種相互作用的數(shù)據(jù)庫黍氮。此外藕坯,還考慮了結(jié)構(gòu)組成夕凝,能夠描述異構(gòu)復(fù)合物柠偶。(配體-受體+多聚體)
- iTALK:通過平均表達(dá)量方式,篩選高表達(dá)的胚體和受體胎署,根據(jù)結(jié)果作圈圖吆录。(配體-受體)
- CellChat:CellChat將細(xì)胞的基因表達(dá)數(shù)據(jù)作為輸入,并結(jié)合配體受體及其輔助因子的相互作用來模擬細(xì)胞間通訊琼牧。(配體-受體+多聚體+輔因子)
- NicheNet // NicheNet多樣本分析 // 目標(biāo)基因的配體和靶基因活性預(yù)測:通過將相互作用細(xì)胞的表達(dá)數(shù)據(jù)與信號和基因調(diào)控網(wǎng)絡(luò)的先驗(yàn)知識相結(jié)合來預(yù)測相互作用細(xì)胞之間的配體-靶標(biāo)聯(lián)系的方法恢筝。( 配體-受體+信號通路)
附:NicheNet使用的常見問題匯總其它細(xì)胞互作軟件還包括
Celltalker
,SingleCellSignalR
障陶,scTensor
和SoptSC
(這幾個也是基于配體-受體相互作用)
NicheNet 在其分析單細(xì)胞轉(zhuǎn)錄組數(shù)據(jù)時滋恬,首先構(gòu)建了細(xì)胞內(nèi)配體聊训、受體抱究、信號蛋白、轉(zhuǎn)錄調(diào)控因子和靶基因互作信號網(wǎng)絡(luò)先驗(yàn)知識數(shù)據(jù)庫带斑,然后基于該數(shù)據(jù)庫鼓寺,通過基因表達(dá)模式的強(qiáng)相關(guān)性,最終確定了細(xì)胞類型之間的潛在通訊勋磕,并且能夠有效地預(yù)測配體活性妈候。
原理簡介
NicheNet是一種模擬細(xì)胞間相互作用如何影響細(xì)胞基因表達(dá)的計(jì)算方法。真實(shí)的細(xì)胞通訊過程是配體-受體-信號蛋白-轉(zhuǎn)錄調(diào)控因子-靶基因挂滓, 因此僅僅分析配受體之間的相互作用存在一定的局限性苦银。該方法的創(chuàng)新之處在于通過結(jié)合細(xì)胞表達(dá)數(shù)據(jù)、信號赶站、基因調(diào)控網(wǎng)絡(luò)的先驗(yàn)知識幔虏,來預(yù)測相互作用的細(xì)胞之間的配體-靶基因的連接情況。目前NicheNet可以處理人或者小鼠數(shù)據(jù)贝椿,以人類或小鼠的基因表達(dá)數(shù)據(jù)作為輸入想括,并將其與通過整合配體到目標(biāo)信號路徑構(gòu)建的先驗(yàn)?zāi)P瓦M(jìn)行結(jié)合。
NicheNet分析中依賴的先驗(yàn)?zāi)P屠硬饕菫榱吮碚鳜F(xiàn)有知識對配體可以調(diào)節(jié)目標(biāo)基因表達(dá)的支持程度瑟蜈,為了計(jì)算這種配體-靶標(biāo)調(diào)節(jié)潛力, NicheNet的先驗(yàn)?zāi)P椭胁粌H包括配體-受體的相互作用信息渣窜,還包括了細(xì)胞內(nèi)信號傳導(dǎo)信息铺根。因此,NicheNet除了預(yù)測哪些配體影響另一個細(xì)胞的基因表達(dá)外乔宿,還可以預(yù)測哪些靶基因受到這些配體的影響位迂,以及哪些信號介質(zhì)可能參與其中。
首先,NicheNet收集涵蓋配體-受體(例如囤官,KEGG數(shù)據(jù)庫)冬阳、細(xì)胞信號轉(zhuǎn)導(dǎo)(例如,蛋白質(zhì)-蛋白質(zhì)相互作用和激酶-底物相互作用)党饮、基因調(diào)控相互作用(例如肝陪,基于ChIP數(shù)據(jù)和motif的推斷)的公共數(shù)據(jù)信息;然后刑顺,將這些獨(dú)立的數(shù)據(jù)信息整合到兩個加權(quán)網(wǎng)絡(luò)中:(1)配體信號網(wǎng)絡(luò)氯窍,其中包含蛋白質(zhì) - 蛋白質(zhì)相互作用,涵蓋從配體到下游轉(zhuǎn)錄調(diào)節(jié)因子的信號通路蹲堂;(2) 基因調(diào)控網(wǎng)絡(luò)狼讨,其中包含轉(zhuǎn)錄調(diào)控因子和靶基因之間的基因調(diào)控相互作用。
為了讓信息數(shù)據(jù)源對最終模型做出更多貢獻(xiàn)柒竞,該軟件在集成過程中對每個數(shù)據(jù)源進(jìn)行了加權(quán)政供,這些數(shù)據(jù)源權(quán)重是通過基于模型的參數(shù)優(yōu)化自動確定的,以提高配體-目標(biāo)預(yù)測的準(zhǔn)確性朽基。
最后布隔,NicheNet結(jié)合配體信號傳導(dǎo)和基因調(diào)控網(wǎng)絡(luò)來計(jì)算所有配體和靶基因?qū)χg的調(diào)控潛力評分:如果靶基因的調(diào)節(jié)因子位于配體信號網(wǎng)絡(luò)的下游,則配體-靶標(biāo)對具有高調(diào)節(jié)潛力稼虎。為了計(jì)算這一評分衅檀,該方法在整合的網(wǎng)絡(luò)上使用網(wǎng)絡(luò)傳播方法來傳播來自配體的信號,從配體開始霎俩,流經(jīng)受體哀军、信號蛋白、轉(zhuǎn)錄調(diào)節(jié)因子打却,最終在目標(biāo)基因處結(jié)束杉适。
之前寫過NicheNet的標(biāo)準(zhǔn)分析pipeline,實(shí)際上做細(xì)胞互作分析的時候我們更多的還是在做樣本間的互作差異比較学密。平常我用CellChat比較多淘衙,但其實(shí)NicheNet也可以做多樣本互作比較,而且效果更好腻暮。
0. 讀入expression data of interest
, NicheNet ligand-receptor network
和 ligand-target matrix
加載所需要的包
library(nichenetr)
library(RColorBrewer)
library(tidyverse)
library(Seurat) #
讀入演示數(shù)據(jù)
seurat_obj = readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
p1=DimPlot(seurat_obj, group.by = "celltype") # user adaptation required on own dataset
p2=DimPlot(seurat_obj, group.by = "pEMT") # user adaptation required on own dataset
p1|p2
table(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT)
# High Low
# CAF 396 104
# Endothelial 105 53
# Malignant 1093 549
# Myeloid 92 7
# myofibroblast 382 61
# T.cell 689 3
這個演示里面比較的是pEMT-high-niche和pEMT-low-niche彤守,換成不同組都一樣的。
seurat_obj@meta.data$celltype_aggregate = paste(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT,sep = "_") # user adaptation required on own dataset
DimPlot(seurat_obj, group.by = "celltype_aggregate")
seurat_obj@meta.data$celltype_aggregate %>% table() %>% sort(decreasing = TRUE)
## .
## Malignant_High T.cell_High Malignant_Low CAF_High myofibroblast_High Endothelial_High
## 1093 689 549 396 382 105
## CAF_Low Myeloid_High myofibroblast_Low Endothelial_Low Myeloid_Low T.cell_Low
## 104 92 61 53 7 3
celltype_id = "celltype_aggregate" # metadata column name of the cell type of interest
seurat_obj = SetIdent(seurat_obj, value = seurat_obj[[celltype_id]])
讀入NicheNet受體靶基因矩陣
(里面的數(shù)值可以理解為親和力)和受體配體網(wǎng)絡(luò)
(受體配體是否結(jié)合)
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
## CXCL1 CXCL2 CXCL3 CXCL5 PPBP
## A1BG 3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF 5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M 6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1 8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
lr_network = lr_network %>% mutate(bonafide = ! database %in% c("ppi_prediction","ppi_prediction_go"))
lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor, bonafide)
head(lr_network)
## # A tibble: 6 x 3
## ligand receptor bonafide
## <chr> <chr> <lgl>
## 1 CXCL1 CXCR2 TRUE
## 2 CXCL2 CXCR2 TRUE
## 3 CXCL3 CXCR2 TRUE
## 4 CXCL5 CXCR2 TRUE
## 5 PPBP CXCR2 TRUE
## 6 CXCL6 CXCR2 TRUE
table(lr_network$bonafide)
# FALSE TRUE
# 10629 1390
###?為什么這么多是false哭靖?
如果分析的是小鼠的數(shù)據(jù)具垫,需要先做一下基因的同源轉(zhuǎn)換
organism = "human" # user adaptation required on own dataset
if(organism == "mouse"){
lr_network = lr_network %>% mutate(ligand = convert_human_to_mouse_symbols(ligand), receptor = convert_human_to_mouse_symbols(receptor)) %>% drop_na()
colnames(ligand_target_matrix) = ligand_target_matrix %>% colnames() %>% convert_human_to_mouse_symbols()
rownames(ligand_target_matrix) = ligand_target_matrix %>% rownames() %>% convert_human_to_mouse_symbols()
ligand_target_matrix = ligand_target_matrix %>% .[!is.na(rownames(ligand_target_matrix)), !is.na(colnames(ligand_target_matrix))]
}
1. Define the niches/microenvironments of interest
每個niche應(yīng)該至少有一個“sender/niche”細(xì)胞群和一個“receiver/target”細(xì)胞群。
在這個演示數(shù)據(jù)集中试幽,我們想要去查看pEMT high和pEMT low的腫瘤組織中免疫細(xì)胞對腫瘤細(xì)胞的作用差異筝蚕。因此“Malignant_High”和“Malignant_Low”被定義為“receiver/target”細(xì)胞群,其它細(xì)胞被定義為“sender/niche”細(xì)胞群。注意:T.Cell和Myeloid細(xì)胞只有在pEMT-High樣本中才被定義為sender起宽,因?yàn)閜EMT-low樣本中這兩類細(xì)胞數(shù)目太少了洲胖。
??也就是說,NicheNet在做組間比較的時候坯沪,可以把condition-specific的細(xì)胞群考慮在內(nèi)绿映。(比較的是所有sender細(xì)胞的組間差異,而不是細(xì)胞特異性組間差異)
! Important: your receiver cell type should consist of 1 cluster!
(可以理解為:解析組織微環(huán)境如何決定某種細(xì)胞的行為)
niches = list(
"pEMT_High_niche" = list(
"sender" = c("myofibroblast_High", "Endothelial_High", "CAF_High", "T.cell_High", "Myeloid_High"),
"receiver" = c("Malignant_High")),
"pEMT_Low_niche" = list(
"sender" = c("myofibroblast_Low", "Endothelial_Low", "CAF_Low"),
"receiver" = c("Malignant_Low"))
) # user adaptation required on own dataset
2. Calculate differential expression between the niches (得到的是差異性受體配體矩陣
)
In this step, we will determine DE between the different niches for both senders and receivers to define the DE of L-R pairs.
2.1 計(jì)算DE
calculate_niche_de
Calculate differential expression of cell types in one niche versus all other niches of interest. This is possible for sender cell types and receiver cell types.
計(jì)算差異基因的方法默認(rèn)是Seurat Wilcoxon test(也可以使用其它方法)腐晾。
assay_oi = "SCT" # other possibilities: RNA,...
DE_sender = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), niches = niches, type = "sender", assay_oi = assay_oi) # only ligands important for sender cell types
## [1] "Calculate Sender DE between: myofibroblast_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: myofibroblast_High and Endothelial_Low"
## [3] "Calculate Sender DE between: myofibroblast_High and CAF_Low"
## [1] "Calculate Sender DE between: Endothelial_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Endothelial_High and Endothelial_Low"
## [3] "Calculate Sender DE between: Endothelial_High and CAF_Low"
## [1] "Calculate Sender DE between: CAF_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: CAF_High and Endothelial_Low"
## [3] "Calculate Sender DE between: CAF_High and CAF_Low"
## [1] "Calculate Sender DE between: T.cell_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: T.cell_High and Endothelial_Low"
## [3] "Calculate Sender DE between: T.cell_High and CAF_Low"
## [1] "Calculate Sender DE between: Myeloid_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Myeloid_High and Endothelial_Low"
## [3] "Calculate Sender DE between: Myeloid_High and CAF_Low"
## [1] "Calculate Sender DE between: myofibroblast_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: myofibroblast_Low and Endothelial_High"
## [3] "Calculate Sender DE between: myofibroblast_Low and CAF_High"
## [4] "Calculate Sender DE between: myofibroblast_Low and T.cell_High"
## [5] "Calculate Sender DE between: myofibroblast_Low and Myeloid_High"
## [1] "Calculate Sender DE between: Endothelial_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: Endothelial_Low and Endothelial_High"
## [3] "Calculate Sender DE between: Endothelial_Low and CAF_High"
## [4] "Calculate Sender DE between: Endothelial_Low and T.cell_High"
## [5] "Calculate Sender DE between: Endothelial_Low and Myeloid_High"
## [1] "Calculate Sender DE between: CAF_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: CAF_Low and Endothelial_High"
## [3] "Calculate Sender DE between: CAF_Low and CAF_High"
## [4] "Calculate Sender DE between: CAF_Low and T.cell_High"
## [5] "Calculate Sender DE between: CAF_Low and Myeloid_High"
DE_receiver = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), niches = niches, type = "receiver", assay_oi = assay_oi) # only receptors now, later on: DE analysis to find targets
## # A tibble: 1 x 2
## receiver receiver_other_niche
## <chr> <chr>
## 1 Malignant_High Malignant_Low
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
可以看到叉弦,它是先計(jì)算了Sender:high的5種sender細(xì)胞分別和low的3中sender細(xì)胞的Sender DE,又反過來計(jì)算了low的3中sender細(xì)胞分別和high的5種sender細(xì)胞的DE藻糖。
然后計(jì)算了Receiver:腫瘤細(xì)胞high-low的差異基因和low-high的差異基因淹冰。
這樣把細(xì)胞類型分開挨個計(jì)算而不是把所有sender和receiver細(xì)胞合并計(jì)算的意義是避免差異分析的結(jié)果主要被豐度高的細(xì)胞驅(qū)動。
正著計(jì)算一遍巨柒,反著計(jì)算一遍樱拴,a-->b上調(diào)的/下調(diào)的基因和b-->a下調(diào)的/上調(diào)的難道不是一樣的嗎?
隨后我去驗(yàn)證了一下潘拱,果然如此疹鳄。以receiver中的IL6為例
DE_sender = DE_sender %>% mutate(avg_log2FC = ifelse(avg_log2FC == Inf, max(avg_log2FC[is.finite(avg_log2FC)]), ifelse(avg_log2FC == -Inf, min(avg_log2FC[is.finite(avg_log2FC)]), avg_log2FC)))
DE_receiver = DE_receiver %>% mutate(avg_log2FC = ifelse(avg_log2FC == Inf, max(avg_log2FC[is.finite(avg_log2FC)]), ifelse(avg_log2FC == -Inf, min(avg_log2FC[is.finite(avg_log2FC)]), avg_log2FC)))
2.2 處理DE結(jié)果
expression_pct = 0.10
DE_sender_processed = process_niche_de(DE_table = DE_sender, niches = niches, expression_pct = expression_pct, type = "sender")
DE_receiver_processed = process_niche_de(DE_table = DE_receiver, niches = niches, expression_pct = expression_pct, type = "receiver")
以receiver中的IL6為例
process_niche_de
是對calculate_niche_de的結(jié)果做了上上圖中所標(biāo)的整合(紅色的整合為了五種pEMT_High_niche禽最,下面三種顏色分別整合為三種pEMT_Low_niche)
2.3 Combine sender-receiver DE based on L-R pairs:
如前所述,來自一種sender細(xì)胞的差異表達(dá)的配體是通過計(jì)算該樣品這種sender細(xì)胞和另一樣品中所有sender細(xì)胞得到的袱饭。因此我們有多種方法總結(jié)得到細(xì)胞類型的差異表達(dá)配體川无。我們可以使用average LFC,也可以使用minimum LFC虑乖。但是更推薦使用minimum LFC
懦趋。因?yàn)樗窃u估配體表達(dá)的最強(qiáng)的特異性指標(biāo),因?yàn)楦叩膍in LFC意味著和niche 2中的所有細(xì)胞類型相比疹味,這個配體在niche 1的這個細(xì)胞類型中表達(dá)更強(qiáng)(如果使用average LFC仅叫,則不能排除niche 2中一種或多種細(xì)胞也很強(qiáng)的表達(dá)這個配體)。
specificity_score_LR_pairs = "min_lfc"
DE_sender_receiver = combine_sender_receiver_de(DE_sender_processed, DE_receiver_processed, lr_network, specificity_score = specificity_score_LR_pairs)
table(DE_sender_receiver$sender)
# CAF_High CAF_Low Endothelial_High Endothelial_Low
# 8171 8171 8171 8171
# Myeloid_High myofibroblast_High myofibroblast_Low T.cell_High
# 8171 8171 8171 8171
table(DE_sender_receiver$receiver)
# Malignant_High Malignant_Low
# 40855 24513
View(DE_sender_receiver)
這一步主要得到了DE_sender_receiver
這個對象糙捺,也就是不同niche中的差異基因诫咱。
3. 計(jì)算空間互作差異(可選)
如果有空間轉(zhuǎn)錄組數(shù)據(jù),把下面兩行代碼設(shè)置成TRUE洪灯,沒有的話直接運(yùn)行下面的代碼即可坎缭。
include_spatial_info_sender = FALSE # if not spatial info to include: put this to false # user adaptation required on own dataset
include_spatial_info_receiver = FALSE # if spatial info to include: put this to true # user adaptation required on own dataset
spatial_info = tibble(celltype_region_oi = "CAF_High", celltype_other_region = "myofibroblast_High", niche = "pEMT_High_niche", celltype_type = "sender") # user adaptation required on own dataset
specificity_score_spatial = "lfc"
# this is how this should be defined if you don't have spatial info
# mock spatial info
if(include_spatial_info_sender == FALSE & include_spatial_info_receiver == FALSE){
spatial_info = tibble(celltype_region_oi = NA, celltype_other_region = NA) %>% mutate(niche = niches %>% names() %>% head(1), celltype_type = "sender")
}
if(include_spatial_info_sender == TRUE){
sender_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "sender"))
sender_spatial_DE_processed = process_spatial_de(DE_table = sender_spatial_DE, type = "sender", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)
# add a neutral spatial score for sender celltypes in which the spatial is not known / not of importance
sender_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% bind_rows(sender_spatial_DE_others)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))
} else {
# # add a neutral spatial score for all sender celltypes (for none of them, spatial is relevant in this case)
sender_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))
}
## [1] "Calculate Spatial DE between: CAF_High and myofibroblast_High"
if(include_spatial_info_receiver == TRUE){
receiver_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "receiver"))
receiver_spatial_DE_processed = process_spatial_de(DE_table = receiver_spatial_DE, type = "receiver", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)
# add a neutral spatial score for receiver celltypes in which the spatial is not known / not of importance
receiver_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% bind_rows(receiver_spatial_DE_others)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
} else {
# # add a neutral spatial score for all receiver celltypes (for none of them, spatial is relevant in this case)
receiver_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
}
4. 計(jì)算配體活性,推斷active ligand-target links
在這一步中,我們將要預(yù)測不同niches中receiver細(xì)胞類型的每個配體的活性掏呼。(和常規(guī)NicheNet的配體活性分析類似)坏快。
??第2步是
受體 -- 配體
,這一步是配體--靶基因
憎夷。
為了計(jì)算配體活性假消,我們首先需要在每個niche中分別定義一個感興趣的基因集。在這個示例中岭接,pEMT-high的基因集是和pEMT-low腫瘤相比富拗,pEMT-high中的上調(diào)基因。pEMT-low的基因集則相反鸣戴。
lfc_cutoff = 0.15 # recommended for 10x as min_lfc cutoff.
specificity_score_targets = "min_lfc"
DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi)
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
View(DE_receiver_processed_targets)
background = DE_receiver_processed_targets %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))
length(geneset_niche1) #niche1中受體的target
## [1] 1668
length(geneset_niche2) #niche2中受體的target
## [1] 2889
在做配體活性分析之前啃沪,最好還是做一下基因集中的基因數(shù)的檢測。一般認(rèn)為對配體活性分析來說窄锅,感興趣的基因集中有20-1000基因是比較合適的创千。如果得到的DE基因數(shù)過多,推薦使用更高的lfc_cutoff
閾值入偷。在有>2的receiver細(xì)胞/niches時或追驴,我們建議使用0.15的cutoff值。如果只有2組receiver細(xì)胞/niches時疏之,我們建議使用更高的閾值(比如0.25)殿雪。如果是測序深度比較深的數(shù)據(jù)比如Smart-seq2,同樣建議使用更高的閾值锋爪。
在這個演示數(shù)據(jù)中丙曙,我們使用的是Smart-seq2的數(shù)據(jù),而且只有比較了2個niches其骄,所以我們使用高LFC閾值以得到更少的DE基因(更高的閾值得到的DE基因 更少亏镰,可信度更高)。
lfc_cutoff = 0.75
specificity_score_targets = "min_lfc"
DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi)
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
background = DE_receiver_processed_targets %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
## [1] "ANXA8L2" "PRKCDBP" "IL8" "PTRF" "SEPP1" "C1orf186"
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))
## [1] "LOC344887" "AGPAT9" "C1orf110" "KIAA1467" "LOC100292680" "EPT1" "CT45A4"
length(geneset_niche1)
## [1] 169
length(geneset_niche2)
## [1] 136
top_n_target = 250
niche_geneset_list = list(
"pEMT_High_niche" = list(
"receiver" = niches[[1]]$receiver,
"geneset" = geneset_niche1,
"background" = background),
"pEMT_Low_niche" = list(
"receiver" = niches[[2]]$receiver,
"geneset" = geneset_niche2 ,
"background" = background)
)
ligand_activities_targets = get_ligand_activities_targets(niche_geneset_list = niche_geneset_list, ligand_target_matrix = ligand_target_matrix, top_n_target = top_n_target)
## [1] "Calculate Ligand activities for: Malignant_High"
## [1] "Calculate Ligand activities for: Malignant_Low"
上面這一步經(jīng)常出現(xiàn)這個warning
5. 計(jì)算不同感興趣細(xì)胞群中的受體拯爽、配體和靶基因的(scaled)表達(dá) (log expression values and expression fractions)
在這一步中索抓,我們將會計(jì)算受體、配體和靶基因在不同細(xì)胞群中的平均(scaled)表達(dá)和表達(dá)fraction毯炮。這里是使用DotPlot展示的逼肯,也可以用其他方式展示。
features_oi = union(lr_network$ligand, lr_network$receptor) %>% union(ligand_activities_targets$target) %>% setdiff(NA)
# 上一步得到1597個feature
dotplot = suppressWarnings(Seurat::DotPlot(seurat_obj %>% subset(idents = niches %>% unlist() %>% unique()), features = features_oi, assay = assay_oi))
exprs_tbl = dotplot$data %>% as_tibble()
exprs_tbl = exprs_tbl %>% rename(celltype = id, gene = features.plot, expression = avg.exp, expression_scaled = avg.exp.scaled, fraction = pct.exp) %>%
mutate(fraction = fraction/100) %>% as_tibble() %>% select(celltype, gene, expression, expression_scaled, fraction) %>% distinct() %>% arrange(gene) %>% mutate(gene = as.character(gene))
exprs_tbl_ligand = exprs_tbl %>% filter(gene %in% lr_network$ligand) %>% rename(sender = celltype, ligand = gene, ligand_expression = expression, ligand_expression_scaled = expression_scaled, ligand_fraction = fraction)
exprs_tbl_receptor = exprs_tbl %>% filter(gene %in% lr_network$receptor) %>% rename(receiver = celltype, receptor = gene, receptor_expression = expression, receptor_expression_scaled = expression_scaled, receptor_fraction = fraction)
exprs_tbl_target = exprs_tbl %>% filter(gene %in% ligand_activities_targets$target) %>% rename(receiver = celltype, target = gene, target_expression = expression, target_expression_scaled = expression_scaled, target_fraction = fraction)
dotplot
exprs_tbl_ligand = exprs_tbl_ligand %>% mutate(scaled_ligand_expression_scaled = scale_quantile_adapted(ligand_expression_scaled)) %>% mutate(ligand_fraction_adapted = ligand_fraction) %>% mutate_cond(ligand_fraction >= expression_pct, ligand_fraction_adapted = expression_pct) %>% mutate(scaled_ligand_fraction_adapted = scale_quantile_adapted(ligand_fraction_adapted))
exprs_tbl_receptor = exprs_tbl_receptor %>% mutate(scaled_receptor_expression_scaled = scale_quantile_adapted(receptor_expression_scaled)) %>% mutate(receptor_fraction_adapted = receptor_fraction) %>% mutate_cond(receptor_fraction >= expression_pct, receptor_fraction_adapted = expression_pct) %>% mutate(scaled_receptor_fraction_adapted = scale_quantile_adapted(receptor_fraction_adapted))
這一步得到了ligand, receptor和target的表達(dá)表
汉矿。以exprs_tbl_ligand為例,每個表中都有l(wèi)igand/ receptor/ target的細(xì)胞類型备禀,表達(dá)量和在細(xì)胞中的表達(dá)百分比洲拇。
6. Expression fraction and receptor
在這一步中奈揍,我們將會基于受體表達(dá)強(qiáng)度計(jì)算配體-受體互作,基于此赋续,我們將對各細(xì)胞類型里每個配體的受體進(jìn)行打分男翰,表達(dá)最強(qiáng)的受體將被給予最高的評分。這不會影響隨后對單個配體的排序纽乱,但是將會幫助我們對每個配體最重要的受體進(jìn)行排序蛾绎。(next to other factors regarding the receptor - see later).
exprs_sender_receiver = lr_network %>%
inner_join(exprs_tbl_ligand, by = c("ligand")) %>%
inner_join(exprs_tbl_receptor, by = c("receptor")) %>% inner_join(DE_sender_receiver %>% distinct(niche, sender, receiver))
ligand_scaled_receptor_expression_fraction_df = exprs_sender_receiver %>% group_by(ligand, receiver) %>% mutate(rank_receptor_expression = dense_rank(receptor_expression), rank_receptor_fraction = dense_rank(receptor_fraction)) %>% mutate(ligand_scaled_receptor_expression_fraction = 0.5*( (rank_receptor_fraction / max(rank_receptor_fraction)) + ((rank_receptor_expression / max(rank_receptor_expression))) ) ) %>% distinct(ligand, receptor, receiver, ligand_scaled_receptor_expression_fraction, bonafide) %>% distinct() %>% ungroup()
View(ligand_scaled_receptor_expression_fraction_df)
7. Prioritization of ligand-receptor and ligand-target links
在這一步中,我們將會結(jié)合上面所有的計(jì)算結(jié)果來對ligand-receptor-target之間的links進(jìn)行排序鸦列。
We scale every property of interest between 0 and 1, and the final prioritization score is a weighted sum of the scaled scores of all the properties of interest.
prioritizing_weights
由以下代碼中的12個參數(shù)綜合決定租冠。
prioritizing_weights = c("scaled_ligand_score" = 5,
"scaled_ligand_expression_scaled" = 1,
"ligand_fraction" = 1,
"scaled_ligand_score_spatial" = 2,
"scaled_receptor_score" = 0.5,
"scaled_receptor_expression_scaled" = 0.5,
"receptor_fraction" = 1,
"ligand_scaled_receptor_expression_fraction" = 1,
"scaled_receptor_score_spatial" = 0,
"scaled_activity" = 0,
"scaled_activity_normalized" = 1,
"bona_fide" = 1)
上面的參數(shù)是可變的,需要根據(jù)不同的需求設(shè)置薯嗤。(每個參數(shù)的含義見:Differential NicheNet analysis between conditions of interest)
output = list(DE_sender_receiver = DE_sender_receiver, ligand_scaled_receptor_expression_fraction_df = ligand_scaled_receptor_expression_fraction_df, sender_spatial_DE_processed = sender_spatial_DE_processed, receiver_spatial_DE_processed = receiver_spatial_DE_processed,
ligand_activities_targets = ligand_activities_targets, DE_receiver_processed_targets = DE_receiver_processed_targets, exprs_tbl_ligand = exprs_tbl_ligand, exprs_tbl_receptor = exprs_tbl_receptor, exprs_tbl_target = exprs_tbl_target)
prioritization_tables = get_prioritization_tables(output, prioritizing_weights)
##查看list基本結(jié)構(gòu)
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[1]]$receiver) %>% head(10)
prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[1]]$receiver) %>% head(10)
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[2]]$receiver) %>% head(10)
prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[2]]$receiver) %>% head(10)
8. 可視化
8.1 Differential expression of ligand and expression
在可視化之前顽爹,我們需要先定義每個niche中最重要的配體受體對。We will do this by first determining for which niche the highest score is found for each ligand/ligand-receptor pair. 然后我們就可以得到每個niche中的top 50 ligands骆姐。
top_ligand_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
top_ligand_receptor_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand, receptor) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
ligand_prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, prioritization_score) %>% group_by(ligand, niche) %>% top_n(1, prioritization_score) %>% ungroup() %>% distinct() %>% inner_join(top_ligand_niche_df) %>% filter(niche == top_niche) %>% group_by(niche) %>% top_n(50, prioritization_score) %>% ungroup() # get the top50 ligands per niche
table(ligand_prioritized_tbl_oi$receiver)
# Malignant_High Malignant_Low
# 50 50
View(ligand_prioritized_tbl_oi)
然后就是可視化啦镜粤!首先我們展示top 受體-配體對,這里展示了每個配體的評分最高的top 2受體玻褪。(可以通過修改下面top_n后的數(shù)字修改)
receiver_oi = "Malignant_High"
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
Visualization: minimum LFC compared to other niches
lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot
Show the spatialDE as additional information(適用于空間數(shù)據(jù))
lfc_plot_spatial = make_ligand_receptor_lfc_spatial_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, ligand_spatial = include_spatial_info_sender, receptor_spatial = include_spatial_info_receiver, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot_spatial
8.2 配體表達(dá)庸诱,活性和靶基因
visualization of ligand activity and ligand-target links
exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand, output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot
基于這個plot,我們可以推斷出許多假說假說昧识,比如:“Interestingly, IL1 family ligands seem to have activity in inducing the DE genes between high pEMT and low pEMT malignant cells; and they are mainly expressed by myeloid cells, a cell type unique for pEMT-high tumors.”
important: ligand-receptor pairs with both high differential expression (or condition-specificity) and ligand activity (=target gene enrichment) are very interesting predictions as key regulators of your intercellular communication process of interest !
這個圖默認(rèn)展示top50的配體钠四。如果覺得展示的信息太多,不容易從其中找到有價值的線索跪楞,也可以只展示top20缀去。
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(20, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand, output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot
8.3 Circos plot of prioritized ligand-receptor pairs
Because a top50 is too much to visualize in a circos plot, we will only visualize the top 15.
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(15, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
colors_sender = brewer.pal(n = prioritized_tbl_oi$sender %>% unique() %>% sort() %>% length(), name = 'Spectral') %>% magrittr::set_names(prioritized_tbl_oi$sender %>% unique() %>% sort())
colors_receiver = c("lavender") %>% magrittr::set_names(prioritized_tbl_oi$receiver %>% unique() %>% sort())
circos_output = make_circos_lr(prioritized_tbl_oi, colors_sender, colors_receiver)
8.4 Interpretation of these results
多數(shù)排名靠前的差異性L-R對似乎來自僅存在于pEMT high腫瘤中的細(xì)胞類型。這可能部分是由于生物學(xué)的因素(在某種情況下某種獨(dú)特的細(xì)胞類型可能非常重要)甸祭,但也可能是由于優(yōu)先排序的方式以及這些獨(dú)特的細(xì)胞類型在其他niche中并沒有 “counterpart”缕碎。
由于腫瘤微環(huán)境中,髓系細(xì)胞和T細(xì)胞與其他細(xì)胞有很大不同池户,因此它們的配體會顯示出很強(qiáng)的差異表達(dá)咏雌。與來自相同細(xì)胞類型但不同niche/條件的細(xì)胞之間的差異表達(dá)(pEMT-high中的CAF與pEMT-low中的CAF相比)相比凡怎,同一niche/條件中(如pEMT-low腫瘤中)髓系/T 細(xì)胞vs肌成纖維細(xì)胞/CAFs/內(nèi)皮細(xì)胞的差異表達(dá)可能更明顯(這時候需要做同一niche不同細(xì)胞類型的互作差異)。
8.5 Visualization for the other condition: pEMT-low
前面可視化的是在pEMT-high niche中上調(diào)的受體配體對赊抖,我們也可以可視化在pEMT-low niche中上調(diào)的受體配體對统倒。
receiver_oi = "Malignant_Low"
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(50, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot
9. Notes, limitations, and comparison to default NicheNet.
在原始的NicheNet pipeline中,配體 - 受體對表達(dá)的排序僅僅是根據(jù)其配體活性得出的氛雪。在這個差異性NicheNet pipeline中房匆,我們也是根據(jù)與其他niches(或者其他空間位置信息[空轉(zhuǎn)數(shù)據(jù)])相比,L-R 對的差異表達(dá)來得到信息报亩。
因?yàn)槲覀冊谶@里關(guān)注配體- 受體對的差異表達(dá)浴鸿,并且通過默認(rèn)將DE而不是activity賦予更高的優(yōu)先級權(quán)重,我們傾向于找到許多與默認(rèn)NicheNet pipeline不同的hits弦追。在Differential NicheNet中赚楚,我們傾向于找到更多的high-DE, low-activity hits,而使用default NicheNet骗卜,我們找到更多的low-DE, high-activity hits宠页。
應(yīng)該注意的是,一些high-DE, low-activity hits可能非常重要寇仓,因?yàn)樗鼈兛赡苁怯捎贜icheNet活性預(yù)測的限制而具有低NicheNet活性(例如NicheNet中關(guān)于該配體的不正確/不完全的先驗(yàn)知識)举户,但其中一些也可能在DE中很高,但activity不高遍烦,因?yàn)樗鼈儧]有強(qiáng)烈的信號效應(yīng)(例如俭嘁,僅參與細(xì)胞粘附的配體)。
相反的服猪,對于在Diffifer NicheNet中沒有被強(qiáng)烈優(yōu)先考慮的low-DE, high-activity的受體配體對供填,應(yīng)考慮以下因素:1)一些配體受到轉(zhuǎn)錄后調(diào)控,高預(yù)測活性可能仍然反映的是真實(shí)的信號傳導(dǎo); 2)高預(yù)測活性值可能是由于NicheNet的局限性(不準(zhǔn)確的先驗(yàn)知識)罢猪,這些低DE配體在感興趣的生物學(xué)過程中并不重要(但該配體的高DE家族成員可能是重要的近她。因?yàn)榧易宄蓡T之間的信號傳導(dǎo)往往非常相似); 3)一種情況下的高活性可能是由于另一種情況下的下調(diào),導(dǎo)致高activity和DE低膳帕。目前粘捎,配體活性是在每個條件下根據(jù)上調(diào)基因計(jì)算的,但下調(diào)基因也可能是配體活性的標(biāo)志危彩。
當(dāng)配體 - 受體對同時具有高DE和高activity時攒磨,我們可以認(rèn)為它們是調(diào)節(jié)感興趣過程的非常好的候選者,我們建議測試這些候選物以進(jìn)行進(jìn)一步的實(shí)驗(yàn)驗(yàn)證汤徽。
應(yīng)用文獻(xiàn)
References
Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods (2019) doi:10.1038/s41592-019-0667-5
Guilliams et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell (2022) doi:10.1016/j.cell.2021.12.018
Puram, Sidharth V., Itay Tirosh, Anuraag S. Parikh, Anoop P. Patel, Keren Yizhak, Shawn Gillespie, Christopher Rodman, et al. 2017. “Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer.” Cell 171 (7): 1611–1624.e24. https://doi.org/10.1016/j.cell.2017.10.044.