單細(xì)胞分析之細(xì)胞交互-7：目標(biāo)基因集的配體和靶基因活性預(yù)測(cè)分析

常用的細(xì)胞通訊軟件：

CellphoneDB：是公開(kāi)的人工校正的清女，儲(chǔ)存受體蚕甥、配體以及兩種相互作用的數(shù)據(jù)庫(kù)躏惋。此外发框，還考慮了結(jié)構(gòu)組成泣懊，能夠描述異構(gòu)復(fù)合物伸辟。（配體-受體+多聚體）

iTALK：通過(guò)平均表達(dá)量方式，篩選高表達(dá)的胚體和受體馍刮，根據(jù)結(jié)果作圈圖信夫。（配體-受體）

CellChat：CellChat將細(xì)胞的基因表達(dá)數(shù)據(jù)作為輸入，并結(jié)合配體受體及其輔助因子的相互作用來(lái)模擬細(xì)胞間通訊卡啰。（配體-受體+多聚體+輔因子）

NicheNet // NicheNet多樣本分析 // 目標(biāo)基因的配體和靶基因活性預(yù)測(cè)：通過(guò)將相互作用細(xì)胞的表達(dá)數(shù)據(jù)與信號(hào)和基因調(diào)控網(wǎng)絡(luò)的先驗(yàn)知識(shí)相結(jié)合來(lái)預(yù)測(cè)相互作用細(xì)胞之間的配體-靶標(biāo)聯(lián)系的方法静稻。( 配體-受體+信號(hào)通路）
附：NicheNet使用的常見(jiàn)問(wèn)題匯總

其它細(xì)胞互作軟件還包括Celltalker，SingleCellSignalR匈辱，scTensor和SoptSC（這幾個(gè)也是基于配體-受體相互作用）

Nichenet可以預(yù)測(cè)：

which ligands from one cell population (“sender/niche”) are most likely to affect target gene expression in an interacting cell population (“receiver/target”);
which specific target genes are affected by which of these predicted ligands.

1. 演示

1.0 加載R包和數(shù)據(jù)集

# R包
library(nichenetr)
library(tidyverse)

# 配體靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
##                 CXCL1        CXCL2        CXCL3        CXCL5         PPBP
## A1BG     3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF     5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M      6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1  8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05

##表達(dá)矩陣和metadata
hnscc_expression = readRDS(url("https://zenodo.org/record/3260758/files/hnscc_expression.rds"))
expression = hnscc_expression$expression
sample_info = hnscc_expression$sample_info # contains meta-information about the cells
View(hnscc_expression)

1.1 Define expressed genes in sender and receiver cell populations

我們的目標(biāo)是探究CAFs表達(dá)的什么配體引起了周?chē)[瘤細(xì)胞的p-EMT振湾，所以CAFs是sender細(xì)胞，腫瘤細(xì)胞是receiver細(xì)胞亡脸。（sender 和 receiver 也可以是同一種細(xì)胞類(lèi)型押搪，也就是自分泌）
因?yàn)槲覀兿胍芯康氖莌igh quality primary tumors，因此less quality的和lymph node metastases的腫瘤樣本將被剔除浅碾。
在這個(gè)數(shù)據(jù)集中大州，為了定義expressed genes，我們采用了Ea, the aggregate expression of each gene i across the k cells, calculated as Ea(i) = log2(average(TPM(i)1…k)+1), should be >= 4. 而10x的數(shù)據(jù)集及穗，我們更推薦genes to be expressed in a cell type when they have non-zero values in at least 10% of the cells from that cell type.

tumors_remove = c("HN10","HN","HN12", "HN13", "HN24", "HN7", "HN8","HN23")

CAF_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `non-cancer cell type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `classified  as cancer cell` == 1) %>% pull(cell)

expressed_genes_sender = expression[CAF_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()
expressed_genes_receiver = expression[malignant_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()

# Check the number of expressed genes: should be a 'reasonable' number of total expressed genes in a cell type, e.g. between 5000-10000 (and not 500 or 20000)
length(expressed_genes_sender)
## [1] 6706
length(expressed_genes_receiver)
## [1] 6351

1.2 Define the gene set of interest and a background of genes

我們使用定義好的p-EMT基因集作為interest gene set摧茴，用腫瘤細(xì)胞表達(dá)的基因作為background.

geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI"    "MMP10"    "LAMC2"    "P4HA2"    "PDPN"

background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)
## [1] "RPS11"   "ELMO2"   "PNMA1"   "MMP2"    "TMEM216" "ERCC5"

1.3 Define a set of potential ligands

作為潛在的活性配體，我們將使用 1) 由 CAF 表達(dá)和 2) 可以結(jié)合惡性細(xì)胞表達(dá)的（putative）受體的配體埂陆。假定的配體-受體links是從 NicheNet 的ligand-receptor data sources收集的苛白。

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))

# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")

ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)

receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)

lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) 
head(lr_network_expressed)
## # A tibble: 6 x 4
##   from    to        source         database
##   <chr>   <chr>     <chr>          <chr>   
## 1 HGF     MET       kegg_cytokines kegg    
## 2 TNFSF10 TNFRSF10A kegg_cytokines kegg    
## 3 TNFSF10 TNFRSF10B kegg_cytokines kegg    
## 4 TGFB2   TGFBR1    kegg_cytokines kegg    
## 5 TGFB3   TGFBR1    kegg_cytokines kegg    
## 6 INHBA   ACVR2A    kegg_cytokines kegg

配體-受體網(wǎng)絡(luò)包含表達(dá)的配體-受體相互作用娃豹。作為 NicheNet 分析的潛在活性配體，我們將考慮來(lái)自該網(wǎng)絡(luò)的配體购裙。

potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
## [1] "HGF"     "TNFSF10" "TGFB2"   "TGFB3"   "INHBA"   "CD99"

1.4 Perform NicheNet’s ligand activity analysis on the gene set of interest

現(xiàn)在進(jìn)行配體活性分析：在此分析中懂版，我們將計(jì)算每個(gè)配體的配體活性，或者換句話說(shuō)躏率，我們將評(píng)估每個(gè) CAF 配體 (和背景基因相比) 預(yù)測(cè) p-EMT 基因的能力（預(yù)測(cè)一個(gè)基因是否屬于 p-EMT program）躯畴。

ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)

現(xiàn)在，我們要根據(jù)配體活性對(duì)配體進(jìn)行排名薇芝。在我們的validation study中蓬抄，我們發(fā)現(xiàn)配體的靶基因預(yù)測(cè)與觀察到的轉(zhuǎn)錄反應(yīng)之間的pearson 相關(guān)系數(shù) (PCC) 是定義配體活性的最有用的測(cè)量方法。因此夯到，我們將根據(jù)配體的PCC對(duì)配體進(jìn)行排名嚷缭。 This allows us to prioritize p-EMT-regulating ligands.

ligand_activities %>% arrange(-pearson) 
## # A tibble: 131 x 4
##    test_ligand auroc   aupr pearson
##    <chr>       <dbl>  <dbl>   <dbl>
##  1 PTHLH       0.667 0.0720   0.128
##  2 CXCL12      0.680 0.0507   0.123
##  3 AGT         0.676 0.0581   0.120
##  4 TGFB3       0.689 0.0454   0.117
##  5 IL6         0.693 0.0510   0.115
##  6 INHBA       0.695 0.0502   0.113
##  7 ADAM17      0.672 0.0526   0.113
##  8 TNC         0.700 0.0444   0.109
##  9 CTGF        0.680 0.0473   0.108
## 10 FN1         0.679 0.0505   0.108
## # ... with 121 more rows
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
## [1] "PTHLH"  "CXCL12" "AGT"    "TGFB3"  "IL6"    "INHBA"

我們?cè)谶@里看到，performance metrics 表明 20 個(gè)排名靠前的配體可以合理地預(yù)測(cè) p-EMT 基因耍贾，這意味著配體的排名應(yīng)該是準(zhǔn)確的阅爽。然而，對(duì)于某些基因集荐开，排名靠前的配體的目標(biāo)基因預(yù)測(cè)性能可能不會(huì)比隨機(jī)預(yù)測(cè)好多少付翁。在這種情況下，配體的優(yōu)先級(jí)將不太可信晃听。

Additional note：我們?cè)谶@里查看了前 20 個(gè)配體百侧，并將通過(guò)推斷這 20 個(gè)配體的 p-EMT 靶基因來(lái)繼續(xù)分析。然而能扒，選擇僅查看排名靠前的 20 個(gè)配體以進(jìn)行進(jìn)一步的生物學(xué)解釋是基于生物學(xué)直覺(jué)并且是相當(dāng)隨意的移层。因此，用戶(hù)可以決定使用不同數(shù)量的配體繼續(xù)分析赫粥。我們建議通過(guò)查看配體活性值的分布來(lái)檢查選定的截止值。在這里予借，我們顯示了配體活性直方圖（第 20 個(gè)配體的分?jǐn)?shù)通過(guò)虛線表示）越平。

# show histogram of ligand activity scores
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) + 
  geom_histogram(color="black", fill="darkorange")  + 
  # geom_density(alpha=.1, fill="orange") +
  geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) + 
  labs(x="ligand activity (PCC)", y = "# ligands") +
  theme_classic()
p_hist_lig_activity

1.5 Infer target genes of top-ranked ligands and visualize in a heatmap

現(xiàn)在我們將展示如何查看配體和感興趣的靶基因之間的調(diào)節(jié)潛力評(píng)分。在這種情況下灵迫，我們將研究排名靠前的 p-EMT 調(diào)節(jié)配體和 p-EMT 基因之間的聯(lián)系秦叛。在配體-靶標(biāo)熱圖中，我們展示了 20 個(gè)排名靠前的配體與以下靶基因之間相互作用的調(diào)節(jié)潛力評(píng)分：屬于感興趣基因組的基因和 20 個(gè)排名靠前的配體中至少一個(gè)的 250 個(gè)最強(qiáng)烈預(yù)測(cè)的靶標(biāo)（根據(jù)一般先驗(yàn)?zāi)Ｐ偷那?250 個(gè)靶標(biāo)瀑粥，因此不是該數(shù)據(jù)集的前 250 個(gè)靶標(biāo)）挣跋。因此，基因集中不是優(yōu)先配體之一的top靶基因的基因?qū)⒉粫?huì)顯示在熱圖上狞换。

active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()

nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
##   ligand target  weight
##   <chr>  <chr>    <dbl>
## 1 PTHLH  COL1A1 0.00399
## 2 PTHLH  MMP1   0.00425
## 3 PTHLH  MMP2   0.00210
## 4 PTHLH  MYH9   0.00116
## 5 PTHLH  P4HA2  0.00190
## 6 PTHLH  PLAU   0.00401

出于可視化目的避咆，我們按照如下方法調(diào)整了配體-靶標(biāo)regulatory potential matrix舟肉。如果它們的分?jǐn)?shù)低于預(yù)定義的閾值，則將調(diào)節(jié)潛力評(píng)分設(shè)置為 0查库，在這里使用的是 20 個(gè)排名靠前的配體與其各自的頂級(jí)目標(biāo)之間的相互作用分?jǐn)?shù)的 0.25 分位數(shù)（see the ligand-target network defined in the data frame）路媚。

active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)

nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
##   ligand target  weight
##   <chr>  <chr>    <dbl>
## 1 PTHLH  COL1A1 0.00399
## 2 PTHLH  MMP1   0.00425
## 3 PTHLH  MMP2   0.00210
## 4 PTHLH  MYH9   0.00116
## 5 PTHLH  P4HA2  0.00190
## 6 PTHLH  PLAU   0.00401

我們使用熱圖來(lái)對(duì)putatively活性配體-目標(biāo)links進(jìn)行可視化。配體的順序與根據(jù)配體活性預(yù)測(cè)的排序一致樊销。

order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()

p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))

p_ligand_target_network

請(qǐng)注意整慎，這些可視化cutoffs的選擇是比較主觀的。我們建議多測(cè)試幾個(gè)cutoffs围苫。

如果根據(jù)先驗(yàn)信息考慮超過(guò)前 250 個(gè)targets裤园，將推斷出更多但不太confident的ligand-target links；通過(guò)考慮少于 250 個(gè)targets剂府，結(jié)果會(huì)更加stringent拧揽。

如果您將用于將分?jǐn)?shù)設(shè)置為 0（出于可視化目的）的分位數(shù)截止值更改，則降低此截止值將導(dǎo)致更密集的熱圖周循，而提高此截止值將導(dǎo)致更稀疏的熱圖强法。

2. Follow-up analysis

2.1 Ligand-receptor network inference for top-ranked ligands

一種類(lèi)型的后續(xù)分析是觀察receiver細(xì)胞群（此處：腫瘤細(xì)胞）的哪些受體可能與來(lái)自sender細(xì)胞群（此處：CAF）的優(yōu)先配體結(jié)合。

因此湾笛，我們現(xiàn)在將推斷出排名靠前的配體的預(yù)測(cè)配體-受體相互作用饮怯，并在熱圖中將它們可視化。

# get the ligand-receptor network of the top-ranked ligands
lr_network_top = lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
best_upstream_receptors = lr_network_top %>% pull(to) %>% unique()

# get the weights of the ligand-receptor interactions as used in the NicheNet model
weighted_networks = readRDS(url("https://zenodo.org/record/3260758/files/weighted_networks.rds"))
lr_network_top_df = weighted_networks$lr_sig %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)

# convert to a matrix
lr_network_top_df = lr_network_top_df %>% spread("from","weight",fill = 0)
lr_network_top_matrix = lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)

# perform hierarchical clustering to order the ligands and receptors
dist_receptors = dist(lr_network_top_matrix, method = "binary")
hclust_receptors = hclust(dist_receptors, method = "ward.D2")
order_receptors = hclust_receptors$labels[hclust_receptors$order]

dist_ligands = dist(lr_network_top_matrix %>% t(), method = "binary")
hclust_ligands = hclust(dist_ligands, method = "ward.D2")
order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]

Show a heatmap of the ligand-receptor interactions

vis_ligand_receptor_network = lr_network_top_matrix[order_receptors, order_ligands_receptor]
p_ligand_receptor_network = vis_ligand_receptor_network %>% t() %>% make_heatmap_ggplot("Prioritized CAF-ligands","Receptors expressed by malignant cells", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential")
p_ligand_receptor_network

2.2 Visualize expression of top-predicted ligands and their target genes in a combined heatmap

NicheNet 只考慮sender細(xì)胞的表達(dá)配體嚎研，但不考慮它們的表達(dá)來(lái)對(duì)配體進(jìn)行排序蓖墅。該排名純粹基于在給定先驗(yàn)知識(shí)的情況下，配體可能調(diào)節(jié)感興趣的基因集的潛力临扮。因?yàn)檫M(jìn)一步研究配體及其靶基因的表達(dá)也很有用论矾，我們?cè)诖搜菔救绾沃谱黠@示配體活性、配體表達(dá)杆勇、靶基因表達(dá)和配體-靶調(diào)節(jié)潛力的組合圖贪壳。

Load additional packages required for the visualization:

library(RColorBrewer)
library(cowplot)
library(ggpubr)

Prepare the ligand activity matrix

ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)

vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")

p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
p_ligand_pearson

Prepare expression of ligands in fibroblast per tumor

因?yàn)閱渭?xì)胞數(shù)據(jù)是從多個(gè)腫瘤中收集的，我們將在這里顯示每個(gè)腫瘤配體的平均表達(dá)蚜退。

expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell")

aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)

aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble() 

aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)

order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]

library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
p_ligand_tumor_expression

Prepare expression of target genes in malignant cells per tumor

expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell") 

aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)

aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble() 

aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)

vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]

p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled  %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
p_target_tumor_scaled_expression

都畫(huà)在一起

figures_without_legend = plot_grid(
  p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
  p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
  p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""), 
  NULL,
  NULL,
  p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""), 
  align = "hv",
  nrow = 2,
  rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
  rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3)) 

legends = plot_grid(
  as_ggplot(get_legend(p_ligand_pearson)),
  as_ggplot(get_legend(p_ligand_tumor_expression)),
  as_ggplot(get_legend(p_ligand_target_network)),
  as_ggplot(get_legend(p_target_tumor_scaled_expression)),
  nrow = 2,
  align = "h")

plot_grid(figures_without_legend, 
          legends, 
          rel_heights = c(10,2), nrow = 2, align = "hv")

2.3 Other follow-up analyses:

As another follow-up analysis, you can infer possible signaling paths between ligands and targets of interest. You can read how to do this in the following vignette Inferring ligand-to-target signaling paths:vignette("ligand_target_signaling_path", package="nichenetr").
Another follow-up analysis is getting a “tangible” measure of how well top-ranked ligands predict the gene set of interest and assess which genes of the gene set can be predicted well. You can read how to do this in the following vignette Assess how well top-ranked ligands can predict a gene set of interest:vignette("target_prediction_evaluation_geneset", package="nichenetr").
In case you want to visualize ligand-target links between multiple interacting cells, you can make an appealing circos plot as shown in vignette Circos plot visualization to show active ligand-target links between interacting cells:vignette("circos", package="nichenetr").

3. 從seurat對(duì)象做分析

3.0 數(shù)據(jù)集準(zhǔn)備

library(nichenetr)
library(tidyverse)

# 配體靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
#讀入seurat對(duì)象闰靴，用的是nichenet多組比較的那個(gè)數(shù)據(jù)集
seuratObj <- readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
#得到表達(dá)矩陣和metadata
expression=t(as.matrix(seuratObj@assays$SCT@data))
sample_info=seuratObj@meta.data

3.1 定義sender 和 receiver細(xì)胞群中的 expressed genes

#篩選樣本
CAF_ids = sample_info %>% filter(`non.cancer.cell.type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`classified..as.cancer.cell` == 1) %>% pull(cell)

#10x推薦至少pct = 0.10
receiver = "Malignant"
expressed_genes_receiver = get_expressed_genes(receiver, seuratObj, pct = 0.10)
length(expressed_genes_receiver)
# [1] 9994
sender = "CAF"
expressed_genes_sender = get_expressed_genes(sender, seuratObj, pct = 0.10)
length(expressed_genes_sender)
# [1] 8117

# 多種sender細(xì)胞的情況
# sender_celltypes = c("CD4 T","Treg", "Mono", "NK", "B", "DC")
# list_expressed_genes_sender = sender_celltypes %>% unique() %>% lapply(get_expressed_genes, seuratObj, 0.10) # lapply to get the expressed genes of every sender cell type separately here
# expressed_genes_sender = list_expressed_genes_sender %>% unlist() %>% unique()

3.2 定義interest 和 background基因集

geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI"    "MMP10"    "LAMC2"    "P4HA2"    "PDPN"

background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)

3.3 Define a set of potential ligands

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))

# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")

ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)

receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)

lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) 
head(lr_network_expressed)
## A tibble: 6 × 4
#  from  to     source         database
#  <chr> <chr>  <chr>          <chr>   
# 1 IL6   IL6ST  kegg_cytokines kegg    
# 2 IL6   IL6R   kegg_cytokines kegg    
# 3 IL11  IL6ST  kegg_cytokines kegg    
# 4 CLCF1 IL6ST  kegg_cytokines kegg    
# 5 HGF   MET    kegg_cytokines kegg    
# 6 IL10  IL10RB kegg_cytokines kegg

potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
# [1] "IL6"     "IL11"    "CLCF1"   "HGF"     "IL10"    "TNFSF10"

3.4 Perform NicheNet’s ligand activity analysis on the gene set of interest

ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
ligand_activities %>% arrange(-pearson) 
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) + 
  geom_histogram(color="black", fill="darkorange")  + 
  # geom_density(alpha=.1, fill="orange") +
  geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) + 
  labs(x="ligand activity (PCC)", y = "# ligands") +
  theme_classic()
p_hist_lig_activity

3.5 Infer target genes of top-ranked ligands and visualize in a heatmap

active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()

nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)

active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)

nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)

order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()

p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))

p_ligand_target_network

3.6 合并繪圖

## Load additional packages required for the visualization
library(RColorBrewer)
library(cowplot)
library(ggpubr)

## Prepare the ligand activity matrix
ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)
vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
# p_ligand_pearson

## Prepare expression of ligands in fibroblast per tumor
expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell")
aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble() 
aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)
order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]
library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
# p_ligand_tumor_expression

## Prepare expression of target genes in malignant cells per tumor
expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell") 
aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble() 
aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)
vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]
p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled  %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
# p_target_tumor_scaled_expression

##都畫(huà)在一起
figures_without_legend = plot_grid(
  p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
  p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
  p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""), 
  NULL,
  NULL,
  p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""), 
  align = "hv",
  nrow = 2,
  rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
  rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3)) 

legends = plot_grid(
  as_ggplot(get_legend(p_ligand_pearson)),
  as_ggplot(get_legend(p_ligand_tumor_expression)),
  as_ggplot(get_legend(p_ligand_target_network)),
  as_ggplot(get_legend(p_target_tumor_scaled_expression)),
  nrow = 2,
  align = "h")

plot_grid(figures_without_legend, 
          legends, 
          rel_heights = c(10,2), nrow = 2, align = "hv")

最后編輯于：2022.11.02 12:34:45

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個(gè)濱河市钻注，隨后出現(xiàn)的幾起案子蚂且，更是在濱河造成了極大的恐慌，老刑警劉巖幅恋，帶你破解...
沈念sama閱讀 206,013評(píng)論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件杏死，死亡現(xiàn)場(chǎng)離奇詭異，居然都是意外死亡，警方通過(guò)查閱死者的電腦和手機(jī)淑翼，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,205評(píng)論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門(mén)腐巢，熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái)，“玉大人窒舟，你說(shuō)我怎么就攤上這事系忙。” “怎么了惠豺？”我有些...
開(kāi)封第一講書(shū)人閱讀 152,370評(píng)論 0贊 342
道士緝兇錄：失蹤的賣(mài)姜人
文/不壞的土叔我叫張陵银还，是天一觀的道長(zhǎng)。經(jīng)常有香客問(wèn)我洁墙，道長(zhǎng)蛹疯，這世上最難降的妖魔是什么？我笑而不...
開(kāi)封第一講書(shū)人閱讀 55,168評(píng)論 1贊 278
?港島之戀（遺憾婚禮）
正文為了忘掉前任热监，我火速辦了婚禮捺弦，結(jié)果婚禮上，老公的妹妹穿的比我還像新娘孝扛。我一直安慰自己列吼，他們只是感情好，可當(dāng)我...
茶點(diǎn)故事閱讀 64,153評(píng)論 5贊 371
惡毒庶女頂嫁案：這布局不是一般人想出來(lái)的
文/花漫我一把揭開(kāi)白布苦始。她就那樣靜靜地躺著寞钥，像睡著了一般。火紅的嫁衣襯著肌膚如雪陌选。梳的紋絲不亂的頭發(fā)上理郑，一...
開(kāi)封第一講書(shū)人閱讀 48,954評(píng)論 1贊 283
城市分裂傳說(shuō)
那天，我揣著相機(jī)與錄音咨油，去河邊找鬼您炉。笑死，一個(gè)胖子當(dāng)著我的面吹牛役电，可吹牛的內(nèi)容都是我干的赚爵。我是一名探鬼主播，決...
沈念sama閱讀 38,271評(píng)論 3贊 399
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開(kāi)眼法瑟，長(zhǎng)吁一口氣：“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼囱晴！你這毒婦竟也來(lái)了？” 一聲冷哼從身側(cè)響起瓢谢，我...
開(kāi)封第一講書(shū)人閱讀 36,916評(píng)論 0贊 259
萬(wàn)榮殺人案實(shí)錄
序言：老撾萬(wàn)榮一對(duì)情侶失蹤，失蹤者是張志新（化名）和其女友劉穎驮瞧，沒(méi)想到半個(gè)月后氓扛，有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 43,382評(píng)論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡，尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 35,877評(píng)論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年采郎，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了千所。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點(diǎn)故事閱讀 37,989評(píng)論 1贊 333
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡蒜埋，死狀恐怖淫痰，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情整份，我是刑警寧澤待错，帶...
沈念sama閱讀 33,624評(píng)論 4贊 322
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站烈评，受9級(jí)特大地震影響火俄，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜讲冠，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 39,209評(píng)論 3贊 307
男人毒藥：我在死后第九天來(lái)索命
文/蒙蒙一瓜客、第九天我趴在偏房一處隱蔽的房頂上張望。院中可真熱鬧竿开，春花似錦谱仪、人聲如沸。這莊子的主人今日做“春日...
開(kāi)封第一講書(shū)人閱讀 30,199評(píng)論 0贊 19
一樁弒父案疯攒，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽(yáng)。三九已至胳搞，卻和暖如春卸例，著一層夾襖步出監(jiān)牢的瞬間，已是汗流浹背肌毅。一陣腳步聲響...
開(kāi)封第一講書(shū)人閱讀 31,418評(píng)論 1贊 260
情欲美人皮
我被黑心中介騙來(lái)泰國(guó)打工筷转，沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留，地道東北人悬而。一個(gè)月前我還...
沈念sama閱讀 45,401評(píng)論 2贊 352
代替公主和親
正文我出身青樓呜舒，卻偏偏與公主長(zhǎng)得像，于是被迫代替她去往敵國(guó)和親笨奠。傳聞我的和親對(duì)象是個(gè)殘疾皇子袭蝗，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 42,700評(píng)論 2贊 345