單細(xì)胞分析之細(xì)胞交互-7:目標(biāo)基因集的配體和靶基因活性預(yù)測(cè)分析


常用的細(xì)胞通訊軟件:

  • CellphoneDB:是公開(kāi)的人工校正的清女,儲(chǔ)存受體蚕甥、配體以及兩種相互作用的數(shù)據(jù)庫(kù)躏惋。此外发框,還考慮了結(jié)構(gòu)組成泣懊,能夠描述異構(gòu)復(fù)合物伸辟。(配體-受體+多聚體)
  • iTALK:通過(guò)平均表達(dá)量方式,篩選高表達(dá)的胚體和受體馍刮,根據(jù)結(jié)果作圈圖信夫。(配體-受體)
  • CellChat:CellChat將細(xì)胞的基因表達(dá)數(shù)據(jù)作為輸入,并結(jié)合配體受體及其輔助因子的相互作用來(lái)模擬細(xì)胞間通訊卡啰。(配體-受體+多聚體+輔因子)
  • NicheNet // NicheNet多樣本分析 // 目標(biāo)基因的配體和靶基因活性預(yù)測(cè):通過(guò)將相互作用細(xì)胞的表達(dá)數(shù)據(jù)與信號(hào)和基因調(diào)控網(wǎng)絡(luò)的先驗(yàn)知識(shí)相結(jié)合來(lái)預(yù)測(cè)相互作用細(xì)胞之間的配體-靶標(biāo)聯(lián)系的方法静稻。( 配體-受體+信號(hào)通路)
    附:NicheNet使用的常見(jiàn)問(wèn)題匯總

其它細(xì)胞互作軟件還包括CelltalkerSingleCellSignalR匈辱,scTensorSoptSC(這幾個(gè)也是基于配體-受體相互作用)


Nichenet可以預(yù)測(cè):

  1. which ligands from one cell population (“sender/niche”) are most likely to affect target gene expression in an interacting cell population (“receiver/target”);
  2. which specific target genes are affected by which of these predicted ligands.

1. 演示

1.0 加載R包和數(shù)據(jù)集
# R包
library(nichenetr)
library(tidyverse)

# 配體靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
##                 CXCL1        CXCL2        CXCL3        CXCL5         PPBP
## A1BG     3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF     5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M      6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1  8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05

##表達(dá)矩陣和metadata
hnscc_expression = readRDS(url("https://zenodo.org/record/3260758/files/hnscc_expression.rds"))
expression = hnscc_expression$expression
sample_info = hnscc_expression$sample_info # contains meta-information about the cells
View(hnscc_expression)
1.1 Define expressed genes in sender and receiver cell populations

我們的目標(biāo)是探究CAFs表達(dá)的什么配體引起了周?chē)[瘤細(xì)胞的p-EMT振湾,所以CAFs是sender細(xì)胞,腫瘤細(xì)胞是receiver細(xì)胞亡脸。(sender 和 receiver 也可以是同一種細(xì)胞類(lèi)型押搪,也就是自分泌)
因?yàn)槲覀兿胍芯康氖莌igh quality primary tumors, 因此less quality的和lymph node metastases的腫瘤樣本將被剔除浅碾。
在這個(gè)數(shù)據(jù)集中大州,為了定義expressed genes,我們采用了Ea, the aggregate expression of each gene i across the k cells, calculated as Ea(i) = log2(average(TPM(i)1…k)+1), should be >= 4. 而10x的數(shù)據(jù)集及穗,我們更推薦genes to be expressed in a cell type when they have non-zero values in at least 10% of the cells from that cell type.

tumors_remove = c("HN10","HN","HN12", "HN13", "HN24", "HN7", "HN8","HN23")

CAF_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `non-cancer cell type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `classified  as cancer cell` == 1) %>% pull(cell)

expressed_genes_sender = expression[CAF_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()
expressed_genes_receiver = expression[malignant_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()

# Check the number of expressed genes: should be a 'reasonable' number of total expressed genes in a cell type, e.g. between 5000-10000 (and not 500 or 20000)
length(expressed_genes_sender)
## [1] 6706
length(expressed_genes_receiver)
## [1] 6351
1.2 Define the gene set of interest and a background of genes

我們使用定義好的p-EMT基因集作為interest gene set摧茴,用腫瘤細(xì)胞表達(dá)的基因作為background.

geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI"    "MMP10"    "LAMC2"    "P4HA2"    "PDPN"

background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)
## [1] "RPS11"   "ELMO2"   "PNMA1"   "MMP2"    "TMEM216" "ERCC5"
1.3 Define a set of potential ligands

作為潛在的活性配體,我們將使用 1) 由 CAF 表達(dá)和 2) 可以結(jié)合惡性細(xì)胞表達(dá)的(putative)受體的配體埂陆。 假定的配體-受體links是從 NicheNet 的ligand-receptor data sources收集的苛白。

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))

# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")

ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)

receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)

lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) 
head(lr_network_expressed)
## # A tibble: 6 x 4
##   from    to        source         database
##   <chr>   <chr>     <chr>          <chr>   
## 1 HGF     MET       kegg_cytokines kegg    
## 2 TNFSF10 TNFRSF10A kegg_cytokines kegg    
## 3 TNFSF10 TNFRSF10B kegg_cytokines kegg    
## 4 TGFB2   TGFBR1    kegg_cytokines kegg    
## 5 TGFB3   TGFBR1    kegg_cytokines kegg    
## 6 INHBA   ACVR2A    kegg_cytokines kegg

配體-受體網(wǎng)絡(luò)包含表達(dá)的配體-受體相互作用娃豹。 作為 NicheNet 分析的潛在活性配體,我們將考慮來(lái)自該網(wǎng)絡(luò)的配體购裙。

potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
## [1] "HGF"     "TNFSF10" "TGFB2"   "TGFB3"   "INHBA"   "CD99"
1.4 Perform NicheNet’s ligand activity analysis on the gene set of interest

現(xiàn)在進(jìn)行配體活性分析:在此分析中懂版,我們將計(jì)算每個(gè)配體的配體活性,或者換句話說(shuō)躏率,我們將評(píng)估每個(gè) CAF 配體 (和背景基因相比) 預(yù)測(cè) p-EMT 基因的能力 (預(yù)測(cè)一個(gè)基因是否屬于 p-EMT program)躯畴。

ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)

現(xiàn)在,我們要根據(jù)配體活性對(duì)配體進(jìn)行排名薇芝。 在我們的validation study中蓬抄,我們發(fā)現(xiàn)配體的靶基因預(yù)測(cè)與觀察到的轉(zhuǎn)錄反應(yīng)之間的pearson 相關(guān)系數(shù) (PCC) 是定義配體活性的最有用的測(cè)量方法。 因此夯到,我們將根據(jù)配體的PCC對(duì)配體進(jìn)行排名嚷缭。 This allows us to prioritize p-EMT-regulating ligands.

ligand_activities %>% arrange(-pearson) 
## # A tibble: 131 x 4
##    test_ligand auroc   aupr pearson
##    <chr>       <dbl>  <dbl>   <dbl>
##  1 PTHLH       0.667 0.0720   0.128
##  2 CXCL12      0.680 0.0507   0.123
##  3 AGT         0.676 0.0581   0.120
##  4 TGFB3       0.689 0.0454   0.117
##  5 IL6         0.693 0.0510   0.115
##  6 INHBA       0.695 0.0502   0.113
##  7 ADAM17      0.672 0.0526   0.113
##  8 TNC         0.700 0.0444   0.109
##  9 CTGF        0.680 0.0473   0.108
## 10 FN1         0.679 0.0505   0.108
## # ... with 121 more rows
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
## [1] "PTHLH"  "CXCL12" "AGT"    "TGFB3"  "IL6"    "INHBA"

我們?cè)谶@里看到,performance metrics 表明 20 個(gè)排名靠前的配體可以合理地預(yù)測(cè) p-EMT 基因耍贾,這意味著配體的排名應(yīng)該是準(zhǔn)確的阅爽。 然而,對(duì)于某些基因集荐开,排名靠前的配體的目標(biāo)基因預(yù)測(cè)性能可能不會(huì)比隨機(jī)預(yù)測(cè)好多少付翁。 在這種情況下,配體的優(yōu)先級(jí)將不太可信晃听。

Additional note:我們?cè)谶@里查看了前 20 個(gè)配體百侧,并將通過(guò)推斷這 20 個(gè)配體的 p-EMT 靶基因來(lái)繼續(xù)分析。 然而能扒,選擇僅查看排名靠前的 20 個(gè)配體以進(jìn)行進(jìn)一步的生物學(xué)解釋是基于生物學(xué)直覺(jué)并且是相當(dāng)隨意的移层。 因此,用戶(hù)可以決定使用不同數(shù)量的配體繼續(xù)分析赫粥。 我們建議通過(guò)查看配體活性值的分布來(lái)檢查選定的截止值。 在這里予借,我們顯示了配體活性直方圖(第 20 個(gè)配體的分?jǐn)?shù)通過(guò)虛線表示)越平。

# show histogram of ligand activity scores
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) + 
  geom_histogram(color="black", fill="darkorange")  + 
  # geom_density(alpha=.1, fill="orange") +
  geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) + 
  labs(x="ligand activity (PCC)", y = "# ligands") +
  theme_classic()
p_hist_lig_activity
1.5 Infer target genes of top-ranked ligands and visualize in a heatmap

現(xiàn)在我們將展示如何查看配體和感興趣的靶基因之間的調(diào)節(jié)潛力評(píng)分。 在這種情況下灵迫,我們將研究排名靠前的 p-EMT 調(diào)節(jié)配體和 p-EMT 基因之間的聯(lián)系秦叛。 在配體-靶標(biāo)熱圖中,我們展示了 20 個(gè)排名靠前的配體與以下靶基因之間相互作用的調(diào)節(jié)潛力評(píng)分:屬于感興趣基因組的基因和 20 個(gè)排名靠前的配體中至少一個(gè)的 250 個(gè)最強(qiáng)烈預(yù)測(cè)的靶標(biāo)(根據(jù)一般先驗(yàn)?zāi)P偷那?250 個(gè)靶標(biāo)瀑粥,因此不是該數(shù)據(jù)集的前 250 個(gè)靶標(biāo))挣跋。 因此,基因集中不是優(yōu)先配體之一的top靶基因的基因?qū)⒉粫?huì)顯示在熱圖上狞换。

active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()

nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
##   ligand target  weight
##   <chr>  <chr>    <dbl>
## 1 PTHLH  COL1A1 0.00399
## 2 PTHLH  MMP1   0.00425
## 3 PTHLH  MMP2   0.00210
## 4 PTHLH  MYH9   0.00116
## 5 PTHLH  P4HA2  0.00190
## 6 PTHLH  PLAU   0.00401

出于可視化目的避咆,我們按照如下方法調(diào)整了配體-靶標(biāo)regulatory potential matrix舟肉。 如果它們的分?jǐn)?shù)低于預(yù)定義的閾值,則將調(diào)節(jié)潛力評(píng)分設(shè)置為 0查库,在這里使用的是 20 個(gè)排名靠前的配體與其各自的頂級(jí)目標(biāo)之間的相互作用分?jǐn)?shù)的 0.25 分位數(shù)(see the ligand-target network defined in the data frame)路媚。

active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)

nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
##   ligand target  weight
##   <chr>  <chr>    <dbl>
## 1 PTHLH  COL1A1 0.00399
## 2 PTHLH  MMP1   0.00425
## 3 PTHLH  MMP2   0.00210
## 4 PTHLH  MYH9   0.00116
## 5 PTHLH  P4HA2  0.00190
## 6 PTHLH  PLAU   0.00401

我們使用熱圖來(lái)對(duì)putatively活性配體-目標(biāo)links進(jìn)行可視化。 配體的順序與根據(jù)配體活性預(yù)測(cè)的排序一致樊销。

order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()

p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))

p_ligand_target_network

請(qǐng)注意整慎,這些可視化cutoffs的選擇是比較主觀的。 我們建議多測(cè)試幾個(gè)cutoffs围苫。

如果根據(jù)先驗(yàn)信息考慮超過(guò)前 250 個(gè)targets裤园,將推斷出更多但不太confident的ligand-target links; 通過(guò)考慮少于 250 個(gè)targets剂府,結(jié)果會(huì)更加stringent拧揽。

如果您將用于將分?jǐn)?shù)設(shè)置為 0(出于可視化目的)的分位數(shù)截止值更改,則降低此截止值將導(dǎo)致更密集的熱圖周循,而提高此截止值將導(dǎo)致更稀疏的熱圖强法。

2. Follow-up analysis

2.1 Ligand-receptor network inference for top-ranked ligands

一種類(lèi)型的后續(xù)分析是觀察receiver細(xì)胞群(此處:腫瘤細(xì)胞)的哪些受體可能與來(lái)自sender細(xì)胞群(此處:CAF)的優(yōu)先配體結(jié)合。

因此湾笛,我們現(xiàn)在將推斷出排名靠前的配體的預(yù)測(cè)配體-受體相互作用饮怯,并在熱圖中將它們可視化。

# get the ligand-receptor network of the top-ranked ligands
lr_network_top = lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
best_upstream_receptors = lr_network_top %>% pull(to) %>% unique()

# get the weights of the ligand-receptor interactions as used in the NicheNet model
weighted_networks = readRDS(url("https://zenodo.org/record/3260758/files/weighted_networks.rds"))
lr_network_top_df = weighted_networks$lr_sig %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)

# convert to a matrix
lr_network_top_df = lr_network_top_df %>% spread("from","weight",fill = 0)
lr_network_top_matrix = lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)

# perform hierarchical clustering to order the ligands and receptors
dist_receptors = dist(lr_network_top_matrix, method = "binary")
hclust_receptors = hclust(dist_receptors, method = "ward.D2")
order_receptors = hclust_receptors$labels[hclust_receptors$order]

dist_ligands = dist(lr_network_top_matrix %>% t(), method = "binary")
hclust_ligands = hclust(dist_ligands, method = "ward.D2")
order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]

Show a heatmap of the ligand-receptor interactions

vis_ligand_receptor_network = lr_network_top_matrix[order_receptors, order_ligands_receptor]
p_ligand_receptor_network = vis_ligand_receptor_network %>% t() %>% make_heatmap_ggplot("Prioritized CAF-ligands","Receptors expressed by malignant cells", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential")
p_ligand_receptor_network
2.2 Visualize expression of top-predicted ligands and their target genes in a combined heatmap

NicheNet 只考慮sender細(xì)胞的表達(dá)配體嚎研,但不考慮它們的表達(dá)來(lái)對(duì)配體進(jìn)行排序蓖墅。 該排名純粹基于在給定先驗(yàn)知識(shí)的情況下,配體可能調(diào)節(jié)感興趣的基因集的潛力临扮。 因?yàn)檫M(jìn)一步研究配體及其靶基因的表達(dá)也很有用论矾,我們?cè)诖搜菔救绾沃谱黠@示配體活性、配體表達(dá)杆勇、靶基因表達(dá)和配體-靶調(diào)節(jié)潛力的組合圖贪壳。

Load additional packages required for the visualization:
library(RColorBrewer)
library(cowplot)
library(ggpubr)
Prepare the ligand activity matrix
ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)

vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
p_ligand_pearson
Prepare expression of ligands in fibroblast per tumor

因?yàn)閱渭?xì)胞數(shù)據(jù)是從多個(gè)腫瘤中收集的,我們將在這里顯示每個(gè)腫瘤配體的平均表達(dá)蚜退。

expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell")

aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)

aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble() 

aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)

order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]
library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
p_ligand_tumor_expression
Prepare expression of target genes in malignant cells per tumor
expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell") 

aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)

aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble() 

aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)

vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]
p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled  %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
p_target_tumor_scaled_expression
都畫(huà)在一起
figures_without_legend = plot_grid(
  p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
  p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
  p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""), 
  NULL,
  NULL,
  p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""), 
  align = "hv",
  nrow = 2,
  rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
  rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3)) 

legends = plot_grid(
  as_ggplot(get_legend(p_ligand_pearson)),
  as_ggplot(get_legend(p_ligand_tumor_expression)),
  as_ggplot(get_legend(p_ligand_target_network)),
  as_ggplot(get_legend(p_target_tumor_scaled_expression)),
  nrow = 2,
  align = "h")

plot_grid(figures_without_legend, 
          legends, 
          rel_heights = c(10,2), nrow = 2, align = "hv")
2.3 Other follow-up analyses:

3. 從seurat對(duì)象做分析

3.0 數(shù)據(jù)集準(zhǔn)備
library(nichenetr)
library(tidyverse)

# 配體靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
#讀入seurat對(duì)象闰靴,用的是nichenet多組比較的那個(gè)數(shù)據(jù)集
seuratObj <- readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
#得到表達(dá)矩陣和metadata
expression=t(as.matrix(seuratObj@assays$SCT@data))
sample_info=seuratObj@meta.data
3.1 定義sender 和 receiver細(xì)胞群中的 expressed genes
#篩選樣本
CAF_ids = sample_info %>% filter(`non.cancer.cell.type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`classified..as.cancer.cell` == 1) %>% pull(cell)

#10x推薦至少pct = 0.10
receiver = "Malignant"
expressed_genes_receiver = get_expressed_genes(receiver, seuratObj, pct = 0.10)
length(expressed_genes_receiver)
# [1] 9994
sender = "CAF"
expressed_genes_sender = get_expressed_genes(sender, seuratObj, pct = 0.10)
length(expressed_genes_sender)
# [1] 8117

# 多種sender細(xì)胞的情況
# sender_celltypes = c("CD4 T","Treg", "Mono", "NK", "B", "DC")
# list_expressed_genes_sender = sender_celltypes %>% unique() %>% lapply(get_expressed_genes, seuratObj, 0.10) # lapply to get the expressed genes of every sender cell type separately here
# expressed_genes_sender = list_expressed_genes_sender %>% unlist() %>% unique()
3.2 定義interest 和 background基因集
geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI"    "MMP10"    "LAMC2"    "P4HA2"    "PDPN"

background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)
3.3 Define a set of potential ligands
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))

# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")

ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)

receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)

lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) 
head(lr_network_expressed)
## A tibble: 6 × 4
#  from  to     source         database
#  <chr> <chr>  <chr>          <chr>   
# 1 IL6   IL6ST  kegg_cytokines kegg    
# 2 IL6   IL6R   kegg_cytokines kegg    
# 3 IL11  IL6ST  kegg_cytokines kegg    
# 4 CLCF1 IL6ST  kegg_cytokines kegg    
# 5 HGF   MET    kegg_cytokines kegg    
# 6 IL10  IL10RB kegg_cytokines kegg 
potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
# [1] "IL6"     "IL11"    "CLCF1"   "HGF"     "IL10"    "TNFSF10"
3.4 Perform NicheNet’s ligand activity analysis on the gene set of interest
ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
ligand_activities %>% arrange(-pearson) 
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) + 
  geom_histogram(color="black", fill="darkorange")  + 
  # geom_density(alpha=.1, fill="orange") +
  geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) + 
  labs(x="ligand activity (PCC)", y = "# ligands") +
  theme_classic()
p_hist_lig_activity
3.5 Infer target genes of top-ranked ligands and visualize in a heatmap
active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()

nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)

active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)

nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)

order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()

p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))

p_ligand_target_network
3.6 合并繪圖
## Load additional packages required for the visualization
library(RColorBrewer)
library(cowplot)
library(ggpubr)

## Prepare the ligand activity matrix
ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)
vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
# p_ligand_pearson

## Prepare expression of ligands in fibroblast per tumor
expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell")
aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble() 
aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)
order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]
library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
# p_ligand_tumor_expression

## Prepare expression of target genes in malignant cells per tumor
expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell") 
aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble() 
aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)
vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]
p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled  %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
# p_target_tumor_scaled_expression

##都畫(huà)在一起
figures_without_legend = plot_grid(
  p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
  p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
  p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""), 
  NULL,
  NULL,
  p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""), 
  align = "hv",
  nrow = 2,
  rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
  rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3)) 

legends = plot_grid(
  as_ggplot(get_legend(p_ligand_pearson)),
  as_ggplot(get_legend(p_ligand_tumor_expression)),
  as_ggplot(get_legend(p_ligand_target_network)),
  as_ggplot(get_legend(p_target_tumor_scaled_expression)),
  nrow = 2,
  align = "h")

plot_grid(figures_without_legend, 
          legends, 
          rel_heights = c(10,2), nrow = 2, align = "hv")
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
  • 序言:七十年代末,一起剝皮案震驚了整個(gè)濱河市钻注,隨后出現(xiàn)的幾起案子蚂且,更是在濱河造成了極大的恐慌,老刑警劉巖幅恋,帶你破解...
    沈念sama閱讀 206,013評(píng)論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件杏死,死亡現(xiàn)場(chǎng)離奇詭異,居然都是意外死亡,警方通過(guò)查閱死者的電腦和手機(jī)淑翼,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評(píng)論 2 382
  • 文/潘曉璐 我一進(jìn)店門(mén)腐巢,熙熙樓的掌柜王于貴愁眉苦臉地迎上來(lái),“玉大人窒舟,你說(shuō)我怎么就攤上這事系忙。” “怎么了惠豺?”我有些...
    開(kāi)封第一講書(shū)人閱讀 152,370評(píng)論 0 342
  • 文/不壞的土叔 我叫張陵银还,是天一觀的道長(zhǎng)。 經(jīng)常有香客問(wèn)我洁墙,道長(zhǎng)蛹疯,這世上最難降的妖魔是什么? 我笑而不...
    開(kāi)封第一講書(shū)人閱讀 55,168評(píng)論 1 278
  • 正文 為了忘掉前任热监,我火速辦了婚禮捺弦,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘孝扛。我一直安慰自己列吼,他們只是感情好,可當(dāng)我...
    茶點(diǎn)故事閱讀 64,153評(píng)論 5 371
  • 文/花漫 我一把揭開(kāi)白布苦始。 她就那樣靜靜地躺著寞钥,像睡著了一般。 火紅的嫁衣襯著肌膚如雪陌选。 梳的紋絲不亂的頭發(fā)上理郑,一...
    開(kāi)封第一講書(shū)人閱讀 48,954評(píng)論 1 283
  • 那天,我揣著相機(jī)與錄音咨油,去河邊找鬼您炉。 笑死,一個(gè)胖子當(dāng)著我的面吹牛役电,可吹牛的內(nèi)容都是我干的赚爵。 我是一名探鬼主播,決...
    沈念sama閱讀 38,271評(píng)論 3 399
  • 文/蒼蘭香墨 我猛地睜開(kāi)眼法瑟,長(zhǎng)吁一口氣:“原來(lái)是場(chǎng)噩夢(mèng)啊……” “哼囱晴!你這毒婦竟也來(lái)了?” 一聲冷哼從身側(cè)響起瓢谢,我...
    開(kāi)封第一講書(shū)人閱讀 36,916評(píng)論 0 259
  • 序言:老撾萬(wàn)榮一對(duì)情侶失蹤,失蹤者是張志新(化名)和其女友劉穎驮瞧,沒(méi)想到半個(gè)月后氓扛,有當(dāng)?shù)厝嗽跇?shù)林里發(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,382評(píng)論 1 300
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡,尸身上長(zhǎng)有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 35,877評(píng)論 2 323
  • 正文 我和宋清朗相戀三年采郎,在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了千所。 大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點(diǎn)故事閱讀 37,989評(píng)論 1 333
  • 序言:一個(gè)原本活蹦亂跳的男人離奇死亡蒜埋,死狀恐怖淫痰,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情整份,我是刑警寧澤待错,帶...
    沈念sama閱讀 33,624評(píng)論 4 322
  • 正文 年R本政府宣布,位于F島的核電站烈评,受9級(jí)特大地震影響火俄,放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜讲冠,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 39,209評(píng)論 3 307
  • 文/蒙蒙 一瓜客、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧竿开,春花似錦谱仪、人聲如沸。這莊子的主人今日做“春日...
    開(kāi)封第一講書(shū)人閱讀 30,199評(píng)論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽(yáng)。三九已至胳搞,卻和暖如春卸例,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背肌毅。 一陣腳步聲響...
    開(kāi)封第一講書(shū)人閱讀 31,418評(píng)論 1 260
  • 我被黑心中介騙來(lái)泰國(guó)打工筷转, 沒(méi)想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留,地道東北人悬而。 一個(gè)月前我還...
    沈念sama閱讀 45,401評(píng)論 2 352
  • 正文 我出身青樓呜舒,卻偏偏與公主長(zhǎng)得像,于是被迫代替她去往敵國(guó)和親笨奠。 傳聞我的和親對(duì)象是個(gè)殘疾皇子袭蝗,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 42,700評(píng)論 2 345

推薦閱讀更多精彩內(nèi)容