常用的細(xì)胞通訊軟件:
- CellphoneDB:是公開(kāi)的人工校正的清女,儲(chǔ)存受體蚕甥、配體以及兩種相互作用的數(shù)據(jù)庫(kù)躏惋。此外发框,還考慮了結(jié)構(gòu)組成泣懊,能夠描述異構(gòu)復(fù)合物伸辟。(配體-受體+多聚體)
- iTALK:通過(guò)平均表達(dá)量方式,篩選高表達(dá)的胚體和受體馍刮,根據(jù)結(jié)果作圈圖信夫。(配體-受體)
- CellChat:CellChat將細(xì)胞的基因表達(dá)數(shù)據(jù)作為輸入,并結(jié)合配體受體及其輔助因子的相互作用來(lái)模擬細(xì)胞間通訊卡啰。(配體-受體+多聚體+輔因子)
- NicheNet // NicheNet多樣本分析 // 目標(biāo)基因的配體和靶基因活性預(yù)測(cè):通過(guò)將相互作用細(xì)胞的表達(dá)數(shù)據(jù)與信號(hào)和基因調(diào)控網(wǎng)絡(luò)的先驗(yàn)知識(shí)相結(jié)合來(lái)預(yù)測(cè)相互作用細(xì)胞之間的配體-靶標(biāo)聯(lián)系的方法静稻。( 配體-受體+信號(hào)通路)
附:NicheNet使用的常見(jiàn)問(wèn)題匯總其它細(xì)胞互作軟件還包括
Celltalker
,SingleCellSignalR
匈辱,scTensor
和SoptSC
(這幾個(gè)也是基于配體-受體相互作用)
Nichenet可以預(yù)測(cè):
- which ligands from one cell population (“sender/niche”) are most likely to affect target gene expression in an interacting cell population (“receiver/target”);
- which specific target genes are affected by which of these predicted ligands.
1. 演示
1.0 加載R包和數(shù)據(jù)集
# R包
library(nichenetr)
library(tidyverse)
# 配體靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
## CXCL1 CXCL2 CXCL3 CXCL5 PPBP
## A1BG 3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF 5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M 6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1 8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05
##表達(dá)矩陣和metadata
hnscc_expression = readRDS(url("https://zenodo.org/record/3260758/files/hnscc_expression.rds"))
expression = hnscc_expression$expression
sample_info = hnscc_expression$sample_info # contains meta-information about the cells
View(hnscc_expression)
1.1 Define expressed genes in sender and receiver cell populations
我們的目標(biāo)是探究CAFs表達(dá)的什么配體引起了周?chē)[瘤細(xì)胞的p-EMT振湾,所以CAFs是sender細(xì)胞,腫瘤細(xì)胞是receiver細(xì)胞亡脸。(sender 和 receiver 也可以是同一種細(xì)胞類(lèi)型押搪,也就是自分泌)
因?yàn)槲覀兿胍芯康氖莌igh quality primary tumors, 因此less quality的和lymph node metastases的腫瘤樣本將被剔除浅碾。
在這個(gè)數(shù)據(jù)集中大州,為了定義expressed genes,我們采用了Ea, the aggregate expression of each gene i across the k cells, calculated as Ea(i) = log2(average(TPM(i)1…k)+1), should be >= 4. 而10x的數(shù)據(jù)集及穗,我們更推薦genes to be expressed in a cell type when they have non-zero values in at least 10% of the cells from that cell type.
tumors_remove = c("HN10","HN","HN12", "HN13", "HN24", "HN7", "HN8","HN23")
CAF_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `non-cancer cell type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `classified as cancer cell` == 1) %>% pull(cell)
expressed_genes_sender = expression[CAF_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()
expressed_genes_receiver = expression[malignant_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()
# Check the number of expressed genes: should be a 'reasonable' number of total expressed genes in a cell type, e.g. between 5000-10000 (and not 500 or 20000)
length(expressed_genes_sender)
## [1] 6706
length(expressed_genes_receiver)
## [1] 6351
1.2 Define the gene set of interest and a background of genes
我們使用定義好的p-EMT基因集作為interest gene set摧茴,用腫瘤細(xì)胞表達(dá)的基因作為background.
geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI" "MMP10" "LAMC2" "P4HA2" "PDPN"
background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)
## [1] "RPS11" "ELMO2" "PNMA1" "MMP2" "TMEM216" "ERCC5"
1.3 Define a set of potential ligands
作為潛在的活性配體,我們將使用 1) 由 CAF 表達(dá)和 2) 可以結(jié)合惡性細(xì)胞表達(dá)的(putative)受體的配體埂陆。 假定的配體-受體links是從 NicheNet 的ligand-receptor data sources收集的苛白。
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")
ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)
receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)
lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors)
head(lr_network_expressed)
## # A tibble: 6 x 4
## from to source database
## <chr> <chr> <chr> <chr>
## 1 HGF MET kegg_cytokines kegg
## 2 TNFSF10 TNFRSF10A kegg_cytokines kegg
## 3 TNFSF10 TNFRSF10B kegg_cytokines kegg
## 4 TGFB2 TGFBR1 kegg_cytokines kegg
## 5 TGFB3 TGFBR1 kegg_cytokines kegg
## 6 INHBA ACVR2A kegg_cytokines kegg
配體-受體網(wǎng)絡(luò)包含表達(dá)的配體-受體相互作用娃豹。 作為 NicheNet 分析的潛在活性配體,我們將考慮來(lái)自該網(wǎng)絡(luò)的配體购裙。
potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
## [1] "HGF" "TNFSF10" "TGFB2" "TGFB3" "INHBA" "CD99"
1.4 Perform NicheNet’s ligand activity analysis on the gene set of interest
現(xiàn)在進(jìn)行配體活性分析:在此分析中懂版,我們將計(jì)算每個(gè)配體的配體活性,或者換句話說(shuō)躏率,我們將評(píng)估每個(gè) CAF 配體 (和背景基因相比) 預(yù)測(cè) p-EMT 基因的能力 (預(yù)測(cè)一個(gè)基因是否屬于 p-EMT program)躯畴。
ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
現(xiàn)在,我們要根據(jù)配體活性對(duì)配體進(jìn)行排名薇芝。 在我們的validation study中蓬抄,我們發(fā)現(xiàn)配體的靶基因預(yù)測(cè)與觀察到的轉(zhuǎn)錄反應(yīng)之間的pearson 相關(guān)系數(shù) (PCC) 是定義配體活性的最有用的測(cè)量方法。 因此夯到,我們將根據(jù)配體的PCC對(duì)配體進(jìn)行排名嚷缭。 This allows us to prioritize p-EMT-regulating ligands.
ligand_activities %>% arrange(-pearson)
## # A tibble: 131 x 4
## test_ligand auroc aupr pearson
## <chr> <dbl> <dbl> <dbl>
## 1 PTHLH 0.667 0.0720 0.128
## 2 CXCL12 0.680 0.0507 0.123
## 3 AGT 0.676 0.0581 0.120
## 4 TGFB3 0.689 0.0454 0.117
## 5 IL6 0.693 0.0510 0.115
## 6 INHBA 0.695 0.0502 0.113
## 7 ADAM17 0.672 0.0526 0.113
## 8 TNC 0.700 0.0444 0.109
## 9 CTGF 0.680 0.0473 0.108
## 10 FN1 0.679 0.0505 0.108
## # ... with 121 more rows
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
## [1] "PTHLH" "CXCL12" "AGT" "TGFB3" "IL6" "INHBA"
我們?cè)谶@里看到,performance metrics 表明 20 個(gè)排名靠前的配體可以合理地預(yù)測(cè) p-EMT 基因耍贾,這意味著配體的排名應(yīng)該是準(zhǔn)確的阅爽。 然而,對(duì)于某些基因集荐开,排名靠前的配體的目標(biāo)基因預(yù)測(cè)性能可能不會(huì)比隨機(jī)預(yù)測(cè)好多少付翁。 在這種情況下,配體的優(yōu)先級(jí)將不太可信晃听。
Additional note:我們?cè)谶@里查看了前 20 個(gè)配體百侧,并將通過(guò)推斷這 20 個(gè)配體的 p-EMT 靶基因來(lái)繼續(xù)分析。 然而能扒,選擇僅查看排名靠前的 20 個(gè)配體以進(jìn)行進(jìn)一步的生物學(xué)解釋是基于生物學(xué)直覺(jué)并且是相當(dāng)隨意的移层。 因此,用戶(hù)可以決定使用不同數(shù)量的配體繼續(xù)分析赫粥。 我們建議通過(guò)查看配體活性值的分布來(lái)檢查選定的截止值。 在這里予借,我們顯示了配體活性直方圖(第 20 個(gè)配體的分?jǐn)?shù)通過(guò)虛線表示)越平。
# show histogram of ligand activity scores
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) +
geom_histogram(color="black", fill="darkorange") +
# geom_density(alpha=.1, fill="orange") +
geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) +
labs(x="ligand activity (PCC)", y = "# ligands") +
theme_classic()
p_hist_lig_activity
1.5 Infer target genes of top-ranked ligands and visualize in a heatmap
現(xiàn)在我們將展示如何查看配體和感興趣的靶基因之間的調(diào)節(jié)潛力評(píng)分。 在這種情況下灵迫,我們將研究排名靠前的 p-EMT 調(diào)節(jié)配體和 p-EMT 基因之間的聯(lián)系秦叛。 在配體-靶標(biāo)熱圖中,我們展示了 20 個(gè)排名靠前的配體與以下靶基因之間相互作用的調(diào)節(jié)潛力評(píng)分:屬于感興趣基因組的基因和 20 個(gè)排名靠前的配體中至少一個(gè)的 250 個(gè)最強(qiáng)烈預(yù)測(cè)的靶標(biāo)(根據(jù)一般先驗(yàn)?zāi)P偷那?250 個(gè)靶標(biāo)瀑粥,因此不是該數(shù)據(jù)集的前 250 個(gè)靶標(biāo))挣跋。 因此,基因集中不是優(yōu)先配體之一的top靶基因的基因?qū)⒉粫?huì)顯示在熱圖上狞换。
active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()
nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
## ligand target weight
## <chr> <chr> <dbl>
## 1 PTHLH COL1A1 0.00399
## 2 PTHLH MMP1 0.00425
## 3 PTHLH MMP2 0.00210
## 4 PTHLH MYH9 0.00116
## 5 PTHLH P4HA2 0.00190
## 6 PTHLH PLAU 0.00401
出于可視化目的避咆,我們按照如下方法調(diào)整了配體-靶標(biāo)regulatory potential matrix舟肉。 如果它們的分?jǐn)?shù)低于預(yù)定義的閾值,則將調(diào)節(jié)潛力評(píng)分設(shè)置為 0查库,在這里使用的是 20 個(gè)排名靠前的配體與其各自的頂級(jí)目標(biāo)之間的相互作用分?jǐn)?shù)的 0.25 分位數(shù)(see the ligand-target network defined in the data frame)路媚。
active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)
nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
## ligand target weight
## <chr> <chr> <dbl>
## 1 PTHLH COL1A1 0.00399
## 2 PTHLH MMP1 0.00425
## 3 PTHLH MMP2 0.00210
## 4 PTHLH MYH9 0.00116
## 5 PTHLH P4HA2 0.00190
## 6 PTHLH PLAU 0.00401
我們使用熱圖來(lái)對(duì)putatively活性配體-目標(biāo)links進(jìn)行可視化。 配體的順序與根據(jù)配體活性預(yù)測(cè)的排序一致樊销。
order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()
p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke", high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))
p_ligand_target_network
請(qǐng)注意整慎,這些可視化cutoffs的選擇是比較主觀的。 我們建議多測(cè)試幾個(gè)cutoffs围苫。
如果根據(jù)先驗(yàn)信息考慮超過(guò)前 250 個(gè)targets裤园,將推斷出更多但不太confident的ligand-target links; 通過(guò)考慮少于 250 個(gè)targets剂府,結(jié)果會(huì)更加stringent拧揽。
如果您將用于將分?jǐn)?shù)設(shè)置為 0(出于可視化目的)的分位數(shù)截止值更改,則降低此截止值將導(dǎo)致更密集的熱圖周循,而提高此截止值將導(dǎo)致更稀疏的熱圖强法。
2. Follow-up analysis
2.1 Ligand-receptor network inference for top-ranked ligands
一種類(lèi)型的后續(xù)分析是觀察receiver細(xì)胞群(此處:腫瘤細(xì)胞)的哪些受體可能與來(lái)自sender細(xì)胞群(此處:CAF)的優(yōu)先配體結(jié)合。
因此湾笛,我們現(xiàn)在將推斷出排名靠前的配體的預(yù)測(cè)配體-受體相互作用饮怯,并在熱圖中將它們可視化。
# get the ligand-receptor network of the top-ranked ligands
lr_network_top = lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
best_upstream_receptors = lr_network_top %>% pull(to) %>% unique()
# get the weights of the ligand-receptor interactions as used in the NicheNet model
weighted_networks = readRDS(url("https://zenodo.org/record/3260758/files/weighted_networks.rds"))
lr_network_top_df = weighted_networks$lr_sig %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)
# convert to a matrix
lr_network_top_df = lr_network_top_df %>% spread("from","weight",fill = 0)
lr_network_top_matrix = lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)
# perform hierarchical clustering to order the ligands and receptors
dist_receptors = dist(lr_network_top_matrix, method = "binary")
hclust_receptors = hclust(dist_receptors, method = "ward.D2")
order_receptors = hclust_receptors$labels[hclust_receptors$order]
dist_ligands = dist(lr_network_top_matrix %>% t(), method = "binary")
hclust_ligands = hclust(dist_ligands, method = "ward.D2")
order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]
Show a heatmap of the ligand-receptor interactions
vis_ligand_receptor_network = lr_network_top_matrix[order_receptors, order_ligands_receptor]
p_ligand_receptor_network = vis_ligand_receptor_network %>% t() %>% make_heatmap_ggplot("Prioritized CAF-ligands","Receptors expressed by malignant cells", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential")
p_ligand_receptor_network
2.2 Visualize expression of top-predicted ligands and their target genes in a combined heatmap
NicheNet 只考慮sender細(xì)胞的表達(dá)配體嚎研,但不考慮它們的表達(dá)來(lái)對(duì)配體進(jìn)行排序蓖墅。 該排名純粹基于在給定先驗(yàn)知識(shí)的情況下,配體可能調(diào)節(jié)感興趣的基因集的潛力临扮。 因?yàn)檫M(jìn)一步研究配體及其靶基因的表達(dá)也很有用论矾,我們?cè)诖搜菔救绾沃谱黠@示配體活性、配體表達(dá)杆勇、靶基因表達(dá)和配體-靶調(diào)節(jié)潛力的組合圖贪壳。
Load additional packages required for the visualization:
library(RColorBrewer)
library(cowplot)
library(ggpubr)
Prepare the ligand activity matrix
ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)
vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
p_ligand_pearson
Prepare expression of ligands in fibroblast per tumor
因?yàn)閱渭?xì)胞數(shù)據(jù)是從多個(gè)腫瘤中收集的,我們將在這里顯示每個(gè)腫瘤配體的平均表達(dá)蚜退。
expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by = "cell")
aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble()
aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)
order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]
library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
p_ligand_tumor_expression
Prepare expression of target genes in malignant cells per tumor
expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by = "cell")
aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble()
aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)
vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]
p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
p_target_tumor_scaled_expression
都畫(huà)在一起
figures_without_legend = plot_grid(
p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""),
NULL,
NULL,
p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""),
align = "hv",
nrow = 2,
rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3))
legends = plot_grid(
as_ggplot(get_legend(p_ligand_pearson)),
as_ggplot(get_legend(p_ligand_tumor_expression)),
as_ggplot(get_legend(p_ligand_target_network)),
as_ggplot(get_legend(p_target_tumor_scaled_expression)),
nrow = 2,
align = "h")
plot_grid(figures_without_legend,
legends,
rel_heights = c(10,2), nrow = 2, align = "hv")
2.3 Other follow-up analyses:
As another follow-up analysis, you can infer possible signaling paths between ligands and targets of interest. You can read how to do this in the following vignette Inferring ligand-to-target signaling paths:
vignette("ligand_target_signaling_path", package="nichenetr")
.Another follow-up analysis is getting a “tangible” measure of how well top-ranked ligands predict the gene set of interest and assess which genes of the gene set can be predicted well. You can read how to do this in the following vignette Assess how well top-ranked ligands can predict a gene set of interest:
vignette("target_prediction_evaluation_geneset", package="nichenetr")
.In case you want to visualize ligand-target links between multiple interacting cells, you can make an appealing circos plot as shown in vignette Circos plot visualization to show active ligand-target links between interacting cells:
vignette("circos", package="nichenetr")
.
3. 從seurat對(duì)象做分析
3.0 數(shù)據(jù)集準(zhǔn)備
library(nichenetr)
library(tidyverse)
# 配體靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
#讀入seurat對(duì)象闰靴,用的是nichenet多組比較的那個(gè)數(shù)據(jù)集
seuratObj <- readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
#得到表達(dá)矩陣和metadata
expression=t(as.matrix(seuratObj@assays$SCT@data))
sample_info=seuratObj@meta.data
3.1 定義sender 和 receiver細(xì)胞群中的 expressed genes
#篩選樣本
CAF_ids = sample_info %>% filter(`non.cancer.cell.type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`classified..as.cancer.cell` == 1) %>% pull(cell)
#10x推薦至少pct = 0.10
receiver = "Malignant"
expressed_genes_receiver = get_expressed_genes(receiver, seuratObj, pct = 0.10)
length(expressed_genes_receiver)
# [1] 9994
sender = "CAF"
expressed_genes_sender = get_expressed_genes(sender, seuratObj, pct = 0.10)
length(expressed_genes_sender)
# [1] 8117
# 多種sender細(xì)胞的情況
# sender_celltypes = c("CD4 T","Treg", "Mono", "NK", "B", "DC")
# list_expressed_genes_sender = sender_celltypes %>% unique() %>% lapply(get_expressed_genes, seuratObj, 0.10) # lapply to get the expressed genes of every sender cell type separately here
# expressed_genes_sender = list_expressed_genes_sender %>% unlist() %>% unique()
3.2 定義interest 和 background基因集
geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI" "MMP10" "LAMC2" "P4HA2" "PDPN"
background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)
3.3 Define a set of potential ligands
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")
ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)
receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)
lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors)
head(lr_network_expressed)
## A tibble: 6 × 4
# from to source database
# <chr> <chr> <chr> <chr>
# 1 IL6 IL6ST kegg_cytokines kegg
# 2 IL6 IL6R kegg_cytokines kegg
# 3 IL11 IL6ST kegg_cytokines kegg
# 4 CLCF1 IL6ST kegg_cytokines kegg
# 5 HGF MET kegg_cytokines kegg
# 6 IL10 IL10RB kegg_cytokines kegg
potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
# [1] "IL6" "IL11" "CLCF1" "HGF" "IL10" "TNFSF10"
3.4 Perform NicheNet’s ligand activity analysis on the gene set of interest
ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
ligand_activities %>% arrange(-pearson)
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) +
geom_histogram(color="black", fill="darkorange") +
# geom_density(alpha=.1, fill="orange") +
geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) +
labs(x="ligand activity (PCC)", y = "# ligands") +
theme_classic()
p_hist_lig_activity
3.5 Infer target genes of top-ranked ligands and visualize in a heatmap
active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()
nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)
active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)
nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)
order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()
p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke", high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))
p_ligand_target_network
3.6 合并繪圖
## Load additional packages required for the visualization
library(RColorBrewer)
library(cowplot)
library(ggpubr)
## Prepare the ligand activity matrix
ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)
vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
# p_ligand_pearson
## Prepare expression of ligands in fibroblast per tumor
expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by = "cell")
aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble()
aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)
order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]
library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
# p_ligand_tumor_expression
## Prepare expression of target genes in malignant cells per tumor
expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by = "cell")
aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble()
aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)
vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]
p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
# p_target_tumor_scaled_expression
##都畫(huà)在一起
figures_without_legend = plot_grid(
p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""),
NULL,
NULL,
p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""),
align = "hv",
nrow = 2,
rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3))
legends = plot_grid(
as_ggplot(get_legend(p_ligand_pearson)),
as_ggplot(get_legend(p_ligand_tumor_expression)),
as_ggplot(get_legend(p_ligand_target_network)),
as_ggplot(get_legend(p_target_tumor_scaled_expression)),
nrow = 2,
align = "h")
plot_grid(figures_without_legend,
legends,
rel_heights = c(10,2), nrow = 2, align = "hv")