hello粹断,今天給大家分享一個(gè)很好的可視化軟件构回,使用基于樹(shù)的相關(guān)屏幕來(lái)分析和可視化表型和轉(zhuǎn)錄組特征以及多個(gè)細(xì)胞類(lèi)型分辨率級(jí)別的細(xì)胞類(lèi)型之間的關(guān)聯(lián)晤愧。文章在Tree-based Correlation Screen and Visualization for Exploring Phenotype-Cell Type Association in Multiple Sample Single-Cell RNA-Sequencing Experiments绍昂,可視化的部分非常nice羔飞,下面展示幾張效果圖。
好了挎袜,開(kāi)始我們的分享
Overview
多樣本的單細(xì)胞 RNA-seq 實(shí)驗(yàn)越來(lái)越多地用于發(fā)現(xiàn)可能影響樣本表型(例如疾餐缒簟)的細(xì)胞類(lèi)型及其分子特征肥惭。 然而,分析和可視化復(fù)雜的細(xì)胞類(lèi)型-表型關(guān)聯(lián)仍然很重要紊搪。 TreeCorTreat 是一個(gè)開(kāi)源 R 包蜜葱,它通過(guò)使用基于樹(shù)的相關(guān)屏幕來(lái)分析和可視化表型和轉(zhuǎn)錄組特征以及多個(gè)細(xì)胞類(lèi)型分辨率級(jí)別的細(xì)胞類(lèi)型之間的關(guān)聯(lián)來(lái)解決這個(gè)問(wèn)題。 使用 TreeCorTreat耀石,可以方便地探索和比較不同的特征類(lèi)型牵囤、表型特征、分析協(xié)議和數(shù)據(jù)集娶牌,并評(píng)估潛在混雜因素的影響奔浅。
TreeCorTreat 將基因表達(dá)矩陣(原始計(jì)數(shù))馆纳、細(xì)胞meta數(shù)據(jù)和樣本meta數(shù)據(jù)作為輸入诗良。 它提供了一個(gè)完整的pipeline來(lái)整合樣本之間的數(shù)據(jù),識(shí)別細(xì)胞cluster及其層次結(jié)構(gòu)鲁驶,在細(xì)胞類(lèi)型比例和基因表達(dá)方面評(píng)估不同分辨率級(jí)別的樣本表型和細(xì)胞類(lèi)型之間的關(guān)聯(lián)鉴裹,并總結(jié)和可視化結(jié)果樹(shù)結(jié)構(gòu)的TreeCorTreat 圖。 該pipeline由六個(gè)功能模塊組成:
- Module 1: Data integration
- Module 2: Define cell types at multiple resolutions
- Module 3: Identify association between cell type proportion and sample phenotype
- Module 4: Identify association between global gene expression and sample phenotype
- Module 5: Identify differentially expressed genes
- Module 6: Visualization via TreeCorTreat plot
模塊化結(jié)構(gòu)使用可以靈活地跳過(guò)某些分析步驟钥弯,并用自己的數(shù)據(jù)或分析功能代替它們径荔。
Input data and data preparation
TreeCorTreat 的輸入由三個(gè)部分組成:基因表達(dá)矩陣 E(G×C,行代表 G 基因脆霎,列代表 C 細(xì)胞)总处、細(xì)胞級(jí)meta數(shù)據(jù) M(C×K,其中 K ≥ 2)和樣本級(jí)meta數(shù)據(jù) L(S×J睛蛛,其中 J ≥ 2)鹦马。為了避免冗余并節(jié)省存儲(chǔ)空間,我們將meta數(shù)據(jù)拆分為細(xì)胞級(jí)meta數(shù)據(jù)和樣本級(jí)meta數(shù)據(jù)忆肾。細(xì)胞級(jí)meta數(shù)據(jù)主要注釋細(xì)胞級(jí)信息荸频,每個(gè)細(xì)胞必須至少包含 2 列:細(xì)胞條形碼和相應(yīng)的樣本標(biāo)識(shí)符 (ID)】透裕可以添加可選的第三列以提供單元格類(lèi)型注釋旭从。細(xì)胞條形碼用于耦合細(xì)胞級(jí)meta數(shù)據(jù) M 和基因表達(dá)矩陣 E。樣本級(jí)meta數(shù)據(jù)記錄樣本的感興趣表型(例如臨床結(jié)果)和其他相關(guān)協(xié)變量(例如年齡和性別)场仲。第一列包含唯一的樣本 ID和悦,其余列包含表型和協(xié)變量。細(xì)胞級(jí)meta數(shù)據(jù)和樣本級(jí)meta數(shù)據(jù)可以通過(guò)唯一的樣本 ID 鏈接渠缕。
# load in TreeCorTreat
options(warn=-1)
suppressMessages(library(TreeCorTreat))
suppressMessages(library(ggplot2))
suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
data(raw_data)
str(raw_data)
Module 1: Data integration
Harmony 算法用于整合來(lái)自不同樣本的細(xì)胞并將它們嵌入到一個(gè)共同的低維空間中摹闽。 重新編寫(xiě)了 Seurat v3 中的 RunHarmony 和 BuildClusterTree 函數(shù),并將以下步驟包裝到一個(gè)具有可調(diào)參數(shù)的獨(dú)立 treecor_harmony 函數(shù)中:庫(kù)大小歸一化褐健、集成特征選擇付鹿、主成分分析 (PCA)澜汤、Harmony 集成、無(wú)監(jiān)督 louvain 聚類(lèi)舵匾、統(tǒng)一流形近似和 投影 (UMAP)俊抵、louvain 簇的層次聚類(lèi)和差異表達(dá)分析,以識(shí)別細(xì)胞類(lèi)型標(biāo)記基因以促進(jìn)注釋坐梯。
具體而言徽诲,對(duì)于文庫(kù)大小標(biāo)準(zhǔn)化,給定細(xì)胞的基因計(jì)數(shù)除以該細(xì)胞的總計(jì)數(shù)吵血,乘以比例因子 (104) 并應(yīng)用自然對(duì)數(shù)轉(zhuǎn)換谎替。 整合樣本的特征是通過(guò)基于每個(gè)樣本中的方差-均值關(guān)系選擇高度可變基因 (HVG) 來(lái)獲得的(使用 SelectIntegrationFeatures 函數(shù)),并根據(jù)被識(shí)別為 HVG 的樣本數(shù)量對(duì)特征進(jìn)行排序蹋辅。 默認(rèn)情況下钱贯,選擇前 2000 個(gè) HVG 并將其送入下游 PCA 程序。 基于前 20 個(gè) PC 進(jìn)行和聲侦另,并獲得修正的和聲嵌入秩命。 然后將前 20 個(gè)和諧坐標(biāo)用于下游 louvain 聚類(lèi)和 UMAP 分析(使用 Seurat 中的 FindClusters 和 RunUMAP 函數(shù))。
# integration
set.seed(12345)
integration <- treecor_harmony(count = raw_data[['count']], sample_meta = raw_data[['sample_meta']], output_dir = getwd())
集成的 Seurat 對(duì)象將存儲(chǔ)在本地目錄中褒傅,過(guò)濾后的基因表達(dá)矩陣或細(xì)胞級(jí)meta數(shù)據(jù)可以通過(guò) access_data_seurat 提取弃锐,用于下游分析。
# list integration result and data extracted from Seurat object
integrated_data <- access_data_seurat(seurat_obj = integration,output_dir = getwd())
在示例中殿托,沒(méi)有細(xì)胞被過(guò)濾掉霹菊,我們將更新的細(xì)胞級(jí)meta數(shù)據(jù)(附加列包括“細(xì)胞類(lèi)型”和 UMAP 坐標(biāo))存儲(chǔ)為integrated_cellmeta。
data(integrated_cellmeta)
head(integrated_cellmeta)
## barcode sample celltype UMAP_1 UMAP_2
## 1 HD-17-Su:345974 HD-17-Su 3 -3.393676 -7.0051720
## 2 HD-17-Su:345975 HD-17-Su 2 -5.090366 6.9038575
## 3 HD-17-Su:345976 HD-17-Su 2 -6.407369 6.9599665
## 4 HD-17-Su:345977 HD-17-Su 5 -6.225506 3.5054925
## 5 HD-17-Su:345978 HD-17-Su 1 8.189426 0.8951219
## 6 HD-17-Su:345979 HD-17-Su 9 9.156926 -4.1967079
# data for downstream analysis
sample_meta <- raw_data[['sample_meta']]
count <- raw_data[['count']]
cell_meta <- integrated_cellmeta
Module 2: Define cell types at multiple resolutions
前面的步驟通過(guò) louvain 聚類(lèi)或基于用戶(hù)提供的細(xì)胞級(jí)meta數(shù)據(jù)(即細(xì)胞meta數(shù)據(jù)中的可選列 3)定義了細(xì)胞cluster支竹。 用戶(hù)可以添加文本標(biāo)簽旋廷,根據(jù)最高差異表達(dá)基因或已知細(xì)胞類(lèi)型標(biāo)記基因來(lái)注釋每個(gè)細(xì)胞cluster的細(xì)胞類(lèi)型。 在這里唾戚,我們?cè)?UMAP 上疊加了幾個(gè)基因標(biāo)記(例如 CD3D柳洋、CD19 和 CD68)以粗略地注釋細(xì)胞cluster。
# cell clusters
label_text <- integrated_cellmeta %>% group_by(celltype) %>% summarise(UMAP_1 = median(UMAP_1),UMAP_2 = median(UMAP_2))
ggplot(integrated_cellmeta,aes(x=UMAP_1,y=UMAP_2,color = celltype)) +
geom_point(size = 0.1) +
geom_text(data = label_text,aes(x=UMAP_1,y=UMAP_2,label = celltype),color = 'black',size = 5) +
theme_classic() +
guides(color = guide_legend(override.aes = list(size = 3)))
# gene markers
genes <- c('CD3D','CD14','CD19','NCAM1','CD4','CD8A',
'FCGR3A','CD1C','CD68','CD79A','CSF3R',
'CD33','CCR7','CD38','CD27','KLRD1')
rc <- Matrix::colSums(count)
sub_count <- count[genes,] %>% as.matrix
sub_norm <- log2(t(t(sub_count)/rc*1e6 + 1)) %>% as.matrix
df_marker <- t(sub_norm) %>% data.frame %>% mutate(barcode = rownames(.)) %>% gather(gene,expr,-barcode) %>% inner_join(cell_meta %>% select(barcode,UMAP_1,UMAP_2))
ggplot(df_marker,aes(x = UMAP_1,y = UMAP_2,col = expr)) +
geom_point(size = 0.01, shape = ".") +
scale_colour_viridis_c(option = 'C',direction = 1) +
facet_wrap(~ gene) +
theme_classic(base_size = 12) +
theme(legend.position = 'bottom')
We roughly categorize 16 cell clusters into 3 large cell-types based on gene markers: B cells (CD19), T cells (CD3D) and Monocytes (CD14).
# new celltype annotation
new_label <- data.frame(old = sort(factor(unique(cell_meta$celltype,levels = 1:16)))) %>%
mutate(large_celltype = ifelse(old %in% c(1,9,11,12,13,16),'Mono',
ifelse(old %in% c(2:6,8,10,14,15),'T','B')),
new = paste0(large_celltype,'_c',old)) %>% select(-large_celltype)
## old new
## 1 1 Mono_c1
## 2 10 T_c10
## 3 11 Mono_c11
## 4 12 Mono_c12
## 5 13 Mono_c13
## 6 14 T_c14
## 7 15 T_c15
## 8 16 Mono_c16
## 9 2 T_c2
## 10 3 T_c3
## 11 4 T_c4
## 12 5 T_c5
## 13 6 T_c6
## 14 7 B_c7
## 15 8 T_c8
## 16 9 Mono_c9
Modify cell type annotation
Users can modify the cell type label via modify_label function to update cell type annotations in cell_meta:
# modify cell type names in `cell_meta`
cell_meta <- modify_label(new_label,hierarchy_list = NULL,cell_meta)$cell_meta
## Modify label in cell_meta
head(cell_meta[,c('barcode','sample','celltype')])
## barcode sample celltype
## 1 HD-17-Su:345974 HD-17-Su T_c3
## 2 HD-17-Su:345975 HD-17-Su T_c2
## 3 HD-17-Su:345976 HD-17-Su T_c2
## 4 HD-17-Su:345977 HD-17-Su T_c5
## 5 HD-17-Su:345978 HD-17-Su Mono_c1
## 6 HD-17-Su:345979 HD-17-Su Mono_c9
同樣叹坦,也可以使用此功能更新層次樹(shù)結(jié)構(gòu)中的細(xì)胞類(lèi)型注釋(i.e. modify_label(new_label,hierarchy_list = hierarchy_list,cell_meta = NULL)$hierarchy_list).
Construct hierarchical tree structure
為了便于在多個(gè)分辨率下進(jìn)行關(guān)聯(lián)分析熊镣,細(xì)胞cluster被進(jìn)一步分層聚類(lèi)。 可以使用數(shù)據(jù)驅(qū)動(dòng)的方法或基于知識(shí)的方法來(lái)導(dǎo)出樹(shù)募书。
該樹(shù)可以由用戶(hù)基于先驗(yàn)知識(shí)(基于知識(shí))提供绪囱,也可以使用層次聚類(lèi)從數(shù)據(jù)中導(dǎo)出(數(shù)據(jù)驅(qū)動(dòng))。 對(duì)于數(shù)據(jù)驅(qū)動(dòng)方法莹捡,通過(guò)應(yīng)用 treecor_harmony 函數(shù)(或 Seurat R 包中的 BuildClusterTree)在 PC 空間上構(gòu)建系統(tǒng)發(fā)育樹(shù)鬼吵。
Data-driven approach
在數(shù)據(jù)驅(qū)動(dòng)的方法中,樹(shù)是通過(guò)應(yīng)用 treecor_harmony 函數(shù)(或 Seurat R 包中的 BuildClusterTree )通過(guò) louvain 集群的層次聚類(lèi)生成的篮赢。 具體來(lái)說(shuō)齿椅,首先對(duì)每個(gè)細(xì)胞cluster內(nèi)的細(xì)胞求平均 PC 分?jǐn)?shù)琉挖,然后構(gòu)建分層樹(shù)。 數(shù)據(jù)驅(qū)動(dòng)的方法可以提供一種無(wú)偏的方式來(lái)推斷底層樹(shù)結(jié)構(gòu)涣脚,但注釋每個(gè)中間樹(shù)節(jié)點(diǎn)可能具有挑戰(zhàn)性示辈。
Knowledge-based approach
在基于知識(shí)的方法中,可以通過(guò)提供一個(gè)字符串來(lái)描述不同粒度級(jí)別的集群的層次關(guān)系遣蚀,從而根據(jù)他們的先驗(yàn)知識(shí)來(lái)指定樹(shù)矾麻。 樹(shù)中的葉節(jié)點(diǎn)對(duì)應(yīng)于從集成步驟中的 louvain 聚類(lèi)或來(lái)自細(xì)胞meta數(shù)據(jù)的“細(xì)胞類(lèi)型”列中獲得的細(xì)胞cluster。
# specify string
input_string <- '@All(@B(B_c7),@T(T_c15,T_c14,T_c10,T_c8,T_c6,T_c5,T_c4,T_c3,T_c2),@Monocyte(Mono_c16,Mono_c13,Mono_c12,Mono_c11,Mono_c9,Mono_c1))'
# extract hierarchy from string
hierarchy_structure <- extract_hrchy_string(input_string,special_character = '@', plot = T)
For demonstration purpose, we will use knowledge-based hierarchical structure in subsequent analysis.
Module 3: Identify association between cell type proportion and sample phenotype
細(xì)胞類(lèi)型比例的分析被包裝到一個(gè)獨(dú)立的函數(shù) treecor_ctprop 中芭梯。 這里我們首先為每個(gè)樹(shù)節(jié)點(diǎn)定義特征险耀,然后評(píng)估特征與樣本表型之間的關(guān)聯(lián)。 對(duì)于葉節(jié)點(diǎn)細(xì)胞簇玖喘,特征定義為樣本中落入該節(jié)點(diǎn)的細(xì)胞比例甩牺。 對(duì)于中間父節(jié)點(diǎn)或根節(jié)點(diǎn),可以選擇在聚合(默認(rèn)設(shè)置)芒涡、連接葉節(jié)點(diǎn)或連接直接子節(jié)點(diǎn)之一中定義特征柴灯。 一旦定義了每個(gè)節(jié)點(diǎn)的特征卖漫,將遍歷每個(gè)樹(shù)節(jié)點(diǎn)以檢查其特征與樣本表型之間的相關(guān)性(或其他匯總統(tǒng)計(jì)量)费尽。 計(jì)算匯總統(tǒng)計(jì)量的方法取決于表型是單變量還是多變量:
Univariate phenotype/Multivariate phenotype analyzed separately:
Pearson correlation (default) with correlation sign
Spearman correlation with correlation sign
Canonical correlation
Chi-squared statistic (via likelihood ratio test between a full model (with phenotype as explanatory variables) and a reduced model (intercept-only)) with coefficient sign
Multivariate phenotype analyzed jointly:
Canonical correlation
Chi-squared statistic (via likelihood ratio test)
The permutation-based p-value can be obtained and adjusted p-value is computed using Benjamini & Yekutieli procedure.
We can first use a kernel density plot to visualize cell density stratified by disease severity.
# density plot on UMAP embeddings
treecor_celldensityplot(cell_meta,
sample_meta,
row_variable = 'study',
col_variable = 'severity',
row_combined = F)
Next, we can assess the association between cell type proportion and disease severity using treecor_ctprop() pipeline:
# cell type prop pipeline (default)
res_ctprop_full <- treecor_ctprop(hierarchy_structure,
cell_meta,
sample_meta,
response_variable = 'severity',
method = "aggregate",
analysis_type = "pearson",
num_permutations = 100)
names(res_ctprop_full)
第一個(gè)元素是一個(gè)計(jì)算匯總統(tǒng)計(jì)量(例如皮爾遜相關(guān)性)以及 p 值和調(diào)整后的 p 值的表格。 第二個(gè)元素是每個(gè)細(xì)胞類(lèi)型的 PC 矩陣列表羊始。 但是旱幼,在默認(rèn)設(shè)置(聚合)中,比例聚合為向量(一維)突委,因此在這種情況下不進(jìn)行 PCA柏卤。
# extract Pearson correlation
res_ctprop <- res_ctprop_full[[1]] %>% mutate(severity.absolute_cor = abs(severity.pearson))
head(res_ctprop)
## id severity.method severity.analysis_type severity.pearson severity.p
## 1 1 aggregate pearson NA NA
## 2 2 aggregate pearson 0.1147058 0.82828283
## 3 3 aggregate pearson -0.7614817 0.08080808
## 4 4 aggregate pearson 0.7603030 0.08080808
## 5 5 aggregate pearson 0.1147058 0.82828283
## 6 6 aggregate pearson 0.5039696 0.11111111
## severity.adjp severity.direction severity.p.sign severity.adjp.sign x y
## 1 NA <NA> <NA> <NA> 6.25 2
## 2 1.0000000 + ns ns 0.00 1
## 3 0.7781478 - ns ns 5.00 1
## 4 0.7781478 + ns ns 12.50 1
## 5 1.0000000 + ns ns 0.00 0
## 6 0.8321858 + ns ns 1.00 0
## label leaf severity.absolute_cor
## 1 All FALSE NA
## 2 B FALSE 0.1147058
## 3 T FALSE 0.7614817
## 4 Monocyte FALSE 0.7603030
## 5 B_c7 TRUE 0.1147058
## 6 T_c15 TRUE 0.5039696
colnames(res_ctprop)
## [1] "id" "severity.method" "severity.analysis_type"
## [4] "severity.pearson" "severity.p" "severity.adjp"
## [7] "severity.direction" "severity.p.sign" "severity.adjp.sign"
## [10] "x" "y" "label"
## [13] "leaf" "severity.absolute_cor"
然后可以使用 TreeCorTreat 圖來(lái)可視化結(jié)果。 可以為不同的美學(xué)指定變量(例如顏色匀油、大小缘缚、alpha)。 后面將詳細(xì)討論 TreeCorTreat 圖敌蚜。
# visualize
treecortreatplot(hierarchy_structure,
annotated_df = res_ctprop,
response_variable = 'severity',
color_variable = 'direction',
size_variable = 'absolute_cor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2)
Module 4: Identify association between global gene expression and sample phenotype
全局基因表達(dá)的分析被包裝到一個(gè)獨(dú)立的函數(shù) treecor_expr 中桥滨。在這里,還首先為每個(gè)樹(shù)節(jié)點(diǎn)定義特征弛车,然后評(píng)估全局基因表達(dá)-樣本表型關(guān)聯(lián)齐媒。對(duì)于葉子節(jié)點(diǎn),首先將節(jié)點(diǎn)中的所有細(xì)胞池化以獲得節(jié)點(diǎn)的偽批量配置文件纷跛。然后喻括,使用 R 中的局部加權(quán)散點(diǎn)圖平滑 (LOWESS) 程序,使用來(lái)自所有樣本的這些偽批量配置文件來(lái)選擇高度可變的基因贫奠。每個(gè)樣本中所選高變量基因的偽批量配置文件用作節(jié)點(diǎn)的特征向量唬血。這個(gè)特征向量是多元的望蜡。對(duì)于非葉節(jié)點(diǎn),可以選擇在聚合(默認(rèn)設(shè)置)、連接葉節(jié)點(diǎn)或連接直接子節(jié)點(diǎn)之一中定義特征。一旦定義了每個(gè)節(jié)點(diǎn)的特征韧骗,將遍歷每個(gè)樹(shù)節(jié)點(diǎn)以檢查其特征與樣本表型之間的相關(guān)性(或其他匯總統(tǒng)計(jì)量)扎附。可以選擇規(guī)范相關(guān)或 F 統(tǒng)計(jì)量(Pillai-Bartlett枢舶,將完整模型(以表型作為解釋變量)與簡(jiǎn)化模型(僅截取)進(jìn)行比較)。
The following code illustrates an example of association between gene expression and disease severity, using default setting (aggregate).
# gene expression pipeline
res_expr_full <- treecor_expr(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = 'severity',
method = 'aggregate',
analysis_type = 'cancor',
num_permutations = 100)
By running treecor_expr, it results in two elements: the first element is a summary table of computed summary statistic (e.g. canonical correlation) along with p-value and adjusted p-value for each cell type; and the second element is a list of PC matrices for each cell type.
# extract canonical correlation
res_expr <- res_expr_full[[1]]
head(res_expr)
## id severity.cancor severity.p severity.adjp severity.direction
## 1 1 0.8942939 0.03030303 0.4360897 +
## 2 2 0.6428883 0.15151515 0.9911129 +
## 3 3 0.9297493 0.03030303 0.4360897 +
## 4 4 0.8324347 0.03030303 0.4360897 +
## 5 5 0.6428883 0.15151515 0.9911129 +
## 6 6 0.8807827 0.04040404 0.4845441 +
## severity.p.sign severity.adjp.sign x y label leaf
## 1 sig ns 6.25 2 All FALSE
## 2 ns ns 0.00 1 B FALSE
## 3 sig ns 5.00 1 T FALSE
## 4 sig ns 12.50 1 Monocyte FALSE
## 5 ns ns 0.00 0 B_c7 TRUE
## 6 sig ns 1.00 0 T_c15 TRUE
The summary table is used to generate TreeCorTreat plot:
# visualize
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2)
除了匯總表膏孟,還可以通過(guò)使用來(lái)自基因表達(dá)分析的 PC 矩陣分別評(píng)估每種細(xì)胞類(lèi)型。
# extract PCA list
pc_expr <- res_expr_full[[2]]
names(pc_expr)
## [1] "All" "B" "T" "Monocyte" "B_c7" "T_c15"
## [7] "T_c14" "T_c10" "T_c8" "T_c6" "T_c5" "T_c4"
## [13] "T_c3" "T_c2" "Mono_c16" "Mono_c13" "Mono_c12" "Mono_c11"
## [19] "Mono_c9" "Mono_c1"
# extract PCA matrix corresponding to 'T' tree node
pc_expr[['T']]
## PC1 PC2
## HD-17-Su -44.684639 2.823131
## HD-19-Su -51.774059 4.264000
## HD-23-Su -51.352355 -23.472410
## HD-30-Su -39.098933 -37.230307
## Se-137.1-Su 73.681369 47.520847
## Se-178.1-Su 11.966558 20.281313
## Se-180.1-Su 6.457736 53.349695
## Se-181.1-Su 94.804324 -67.536270
PC 矩陣列表由細(xì)胞級(jí)meta數(shù)據(jù)的細(xì)胞類(lèi)型列中指定的細(xì)胞類(lèi)型名稱(chēng)命名拌汇。 因此柒桑,可以提取給定細(xì)胞類(lèi)型(例如 T 細(xì)胞類(lèi)別)的每個(gè)樣本的 PC 坐標(biāo),并使用 treecor_samplepcaplot 可視化樣本級(jí) PCA 圖噪舀。
# PCA plot (using `T` node)
treecor_samplepcaplot(sample_meta,
pca_matrix = pc_expr[['T']],
response_variable = 'severity',
font_size = 10,
point_size = 3)
可以觀察到樣本的嚴(yán)重性沿著虛線指示的方向從健康到嚴(yán)重變化魁淳,這對(duì)應(yīng)于從 CCA 推斷出的最佳軸,該軸最大化了嵌入空間中嚴(yán)重性和樣本坐標(biāo)之間的相關(guān)性与倡。
Module 5: Identify differentially expressed genes
差異表達(dá)基因 (DEG) 的分析包含在函數(shù) treecor_deg() 中界逛。 在這里,我們首先像以前一樣使用“聚合”方法計(jì)算每個(gè)樹(shù)節(jié)點(diǎn)的假體基因表達(dá)譜纺座。 換句話說(shuō)息拜,對(duì)于每個(gè)節(jié)點(diǎn),來(lái)自其descendants的所有細(xì)胞都被匯集并聚合到每個(gè)樣本中的偽批量配置文件中净响。 Pseudobulk 配置文件按庫(kù)大小標(biāo)準(zhǔn)化少欺。 然后使用 Limma 進(jìn)行差異表達(dá)分析:
- 單獨(dú)分析單變量表型/多變量表型:通過(guò)使用協(xié)變量調(diào)整對(duì)表型的基因表達(dá)進(jìn)行回歸來(lái)擬合 limma 模型。 將為每個(gè)節(jié)點(diǎn)報(bào)告通過(guò)用戶(hù)指定的錯(cuò)誤發(fā)現(xiàn)率 (FDR) 截?cái)嘀担J(rèn)截?cái)嘀?= 0.05)的 DEG馋贤,并將其與日志折疊更改赞别、t 統(tǒng)計(jì)、p 值和 FDR 一起保存到 csv 文件中配乓。
- 聯(lián)合分析多變量表型:使用數(shù)據(jù)驅(qū)動(dòng)方法 (PC1) 或用戶(hù)指定的權(quán)重(使用權(quán)重的多個(gè)性狀的線性組合)將多個(gè)性狀組合成單變量變量仿滔。 然后將聚合的性狀作為單變量表型進(jìn)行分析。
# DEG pipeline
res_deg <- treecor_deg(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = 'severity')
names(res_deg)
## [1] "dge.summary" "dge.ls" "pseudobulk.ls"
# 1st: summary of number of DEGs
head(res_deg$dge.summary) # or head(res_deg[[1]])
## label severity.num_deg x y id leaf
## 1 All 0 6.25 2 1 FALSE
## 2 B 0 0.00 1 2 FALSE
## 3 T 993 5.00 1 3 FALSE
## 4 Monocyte 2 12.50 1 4 FALSE
## 5 B_c7 0 0.00 0 5 TRUE
## 6 T_c15 456 1.00 0 6 TRUE
# 2nd: extract DEGs for a specific tree node using cell type name
# (1) check phenotypes
names(res_deg$dge.ls)
## [1] "severity"
# (2) choose a phenotype
severity.dge.ls <- res_deg$dge.ls[['severity']]
# (3) extract DEGs of T cell category
head(severity.dge.ls[['T']])
## severitySe.logFC severitySe.t severitySe.p severitySe.fdr
## FADS1 2.221299 13.884431 2.351480e-07 0.004165177
## JCHAIN 3.036823 12.451043 5.964215e-07 0.005282207
## INPPL1 1.755148 10.250902 3.058248e-06 0.017470015
## APOBEC3H 2.250345 9.908317 4.052855e-06 0.017470015
## GPR68 1.693644 9.515158 5.657245e-06 0.017470015
## CORO1C 1.813891 9.220954 7.316652e-06 0.017470015
# 3rd: extract sample-level pseudobulk for a specific tree node using cell type name
names(res_deg$pseudobulk.ls) # or names(res_deg[[3]])
## [1] "All" "B" "T" "Monocyte" "B_c7" "T_c15"
## [7] "T_c14" "T_c10" "T_c8" "T_c6" "T_c5" "T_c4"
## [13] "T_c3" "T_c2" "Mono_c16" "Mono_c13" "Mono_c12" "Mono_c11"
## [19] "Mono_c9" "Mono_c1"
head(res_deg$pseudobulk.ls[['T']])
## HD-17-Su HD-19-Su HD-23-Su HD-30-Su Se-137.1-Su Se-178.1-Su
## A1BG 5.493462 5.506674 5.6802740 5.876471 4.7791551 5.3461643
## A1BG-AS1 2.652148 1.854222 2.3063366 2.420313 2.9822341 2.2278498
## A2M 0.000000 0.000000 0.1968054 0.000000 0.6555113 0.3599841
## A2M-AS1 1.814409 1.413839 1.2995822 1.852688 3.5050187 2.9638223
## A4GALT 0.000000 0.000000 0.3699580 0.000000 0.0000000 0.0000000
## AAAS 4.653329 4.235223 4.1063058 4.665398 5.1026623 4.7283047
## Se-180.1-Su Se-181.1-Su
## A1BG 5.516765 4.809994
## A1BG-AS1 0.000000 2.002053
## A2M 0.000000 1.001369
## A2M-AS1 3.578594 3.809896
## A4GALT 0.000000 0.000000
## AAAS 4.315328 4.250520
treecor_deg 返回一個(gè)包含三個(gè)元素的列表:第一個(gè)元素總結(jié)了每種細(xì)胞類(lèi)型的 DEG 數(shù)量扰付,第二個(gè)元素對(duì)應(yīng)于每個(gè)表型(response_variable)的每個(gè)細(xì)胞cluster中識(shí)別的特定 DEG堤撵,第三個(gè)元素存儲(chǔ)樣本級(jí)假體基因表達(dá)矩陣( 對(duì)于每種細(xì)胞類(lèi)型,每一行是一個(gè)基因羽莺,每一列是一個(gè)樣本)实昨。
根據(jù)是單獨(dú)分析還是聯(lián)合分析多變量表型,匯總表中會(huì)有額外的列(即第一個(gè)元素 - dge.summary)盐固,記錄關(guān)于不同表型的 DEG 數(shù)量和 DEG 列表中的其他元素(即第二個(gè)元素 - dge.ls)荒给。
We can visualize the number of DEGs using TreeCorTreat plot:
# visualize
treecortreatplot(hierarchy_structure,
annotated_df = res_deg$dge.summary,
response_variable = 'severity',
color_variable = 'num_deg',
size_variable = 'num_deg',
alpha_variable = NULL,
annotate_number = T,
annotate_number_column = 'num_deg',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2)
此外丈挟,可以使用 treecor_degheatmap 通過(guò)指定 n 和對(duì)數(shù)折疊變化的方向/符號(hào)來(lái)探索前 n 個(gè) DEG 的基因表達(dá)模式:
# heatmap for top 10 DEGs with positive log fold change (enriched in Severe)
treecor_degheatmap(sample_meta,
pseudobulk = res_deg$pseudobulk.ls[['T']],
deg_result = res_deg$dge.ls$severity[['T']],
top_n = 10,
deg_logFC = "positive",
annotation_col = c('severity','sex','age'),
annotation_colors = list(severity = c('HD' = 'green','Se' = 'red'),
sex = c('F' = 'purple','M' = 'blue')))
# heatmap for top 10 DEGs with negative log fold change (enriched in Healthy)
treecor_degheatmap(sample_meta,
pseudobulk = res_deg$pseudobulk.ls[['T']],
deg_result = res_deg$dge.ls$severity[['T']],
top_n = 10,
deg_logFC = "negative",
annotation_col = c('severity','sex','age'),
annotation_colors = list(severity = c('HD' = 'green','Se' = 'red'),
sex = c('F' = 'purple','M' = 'blue')))
# Combined two plots above
treecor_degheatmap(sample_meta,
pseudobulk = res_deg$pseudobulk.ls[['T']],
deg_result = res_deg$dge.ls$severity[['T']],
top_n = 10,
deg_logFC = "both",
annotation_col = c('severity','sex','age'),
annotation_colors = list(severity = c('HD' = 'green','Se' = 'red'),
sex = c('F' = 'purple','M' = 'blue')))
Module 6: Visualization
TreeCorTreat 提供了多種函數(shù)來(lái)可視化中間和最終結(jié)果。 例如志电,可以使用 treecor_celldensityplot 來(lái)顯示不同樣本組的細(xì)胞類(lèi)型比例(模塊 3)曙咽。 treecor_samplepcaplot 將樣本投影到共享的低維嵌入并找到最佳軸(模塊 4)。treecor_degheatmap 顯示給定細(xì)胞類(lèi)型的最高差異表達(dá)基因(模塊 5)挑辆。 特別是例朱,為了總結(jié)最終結(jié)果,開(kāi)發(fā)了一個(gè)多功能函數(shù) treecortreatplot 以我們稱(chēng)為“TreeCorTreat plot”的格式顯示多分辨率細(xì)胞類(lèi)型-表型關(guān)聯(lián)鱼蝉。 該圖由顯示細(xì)胞類(lèi)型層次結(jié)構(gòu)的樹(shù)狀圖和顯示每個(gè)樹(shù)節(jié)點(diǎn)分析結(jié)果的信息層組成洒嗤。
Tree skeleton representation
為了顯示樹(shù),可以使用直線魁亦、經(jīng)典的角度彎曲表示或二次貝塞爾曲線表示來(lái)連接節(jié)點(diǎn)渔隶。 可以根據(jù)自己的喜好選擇線條美感,例如線型(例如實(shí)線或虛線)和線條顏色洁奈。
Edge type
- Straight line
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: edge_path_type
edge_path_type = "link")
- Classical angle bend
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.6,
nonleaf_point_gap = 0.3,
## parameter: edge_path_type
edge_path_type = "elbow")
- Quadratic Bezier curves
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: edge_path_type
edge_path_type = "diagonal")
Line aesthetics: line type and color
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: line_type
line_type = 'dashed',
## parameter: line_color
line_color = 'darkgreen')
Modify label or circle position
In some cases where label length (cell type name) is long or circle size is quite large, default label or circle position on the non-leaf part (left) may not work well. Therefore, we introduce nonleaf_label_pos and nonleaf_point_gap parameters which allow users to manually fix the label position or gap between multivariate phenotype. A tip is that if one have k phenotypes to be analyzed separately, one can first choose a proper gap between points (i.e. nonleaf_point_gap = 0.1) and let nonleaf_label_pos = (i.e. let k=3, nonleaf_label_pos = 0.1*3+0.2 = 0.5).In the above examples, we specify gap between points at 0.2 and label position at 0.4.
Leaf representation
For external node configuration, TreeCorTreat plot aligns cell types in the rows and phenotypes in the columns. There are three different ways to visualize the results: balloon plot, heatmap and bar plot. Users can use color (color_variable), size (size_variable) and transparency (alpha_variable) to encode different information.
Balloon plot
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: plot_type
plot_type = 'balloon')
In the above balloon plot, the size of the balloon reflects the magnitude of canonical correlation between global gene expression profile and disease severity and color as well as transparency correspond to p-value significance.
Alternatively, we can use color to represent canonical correlation and size/transparency to represent p-value significance using the following code:
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'p.sign',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: plot_type
plot_type = 'balloon')
Heatmap
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: plot_type
plot_type = 'heatmap')
Barplot
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'p.sign',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
## parameter: plot_type
plot_type = 'bar')
More advanced plotting figures
Annotate numbers
Users can specify which variable/color to be used in annotating TreeCorTreat plot via annotate_number, annotate_number_column and annotate_number_color arguments.The following code demonstrates an example of overlaying canonical correlation on top of TreeCorTreat:
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon',
annotate_number = T,
annotate_number_column = 'cancor',
annotate_number_color = 'black')
Modify legend title
If we want to modify legend title, one way is to modify column names of annotated_df directly before running treecortreatplot. The column names for summary statistic and statistical significance are usually defined in response_variable.colname format (e.g. severity.cancor, severity.direction (only exists in cell type proportion analysis), severity.p, severity.p.sign, etc). Thus, it’s crucial to check the column names are in the correct format before generating TreeCorTreat plot.
Suppose we hope to modify names for both response variables (‘severity’) and legends (i.e. ‘cancor’ and ‘p.sign’) by capitalizing the first letter in ‘severity’ and ‘cancor’ and replace ‘p.sign’ by ‘P-value significance’.
## modify column names
colnames(res_expr)
## [1] "id" "severity.cancor" "severity.p"
## [4] "severity.adjp" "severity.direction" "severity.p.sign"
## [7] "severity.adjp.sign" "x" "y"
## [10] "label" "leaf"
res_expr_new <- res_expr
colnames(res_expr_new) <- gsub('severity','Severity',colnames(res_expr_new))
colnames(res_expr_new) <- gsub('cancor','Cancor',colnames(res_expr_new))
colnames(res_expr_new) <- gsub('\\.p.sign','\\.P-value significance',colnames(res_expr_new))
## check modified column names
colnames(res_expr_new)
## [1] "id" "Severity.Cancor"
## [3] "Severity.p" "Severity.adjp"
## [5] "Severity.direction" "Severity.P-value significance"
## [7] "Severity.adjp.sign" "x"
## [9] "y" "label"
## [11] "leaf"
## visualize
treecortreatplot(hierarchy_structure,
annotated_df = res_expr_new,
response_variable = 'Severity',
color_variable = 'P-value significance',
size_variable = 'Cancor',
alpha_variable = 'P-value significance',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon')
Modify label colors
Users can also modify label colors to highlight cell types of interest. The advanced setting can be configured by advanced_list argument by providing label_info with fixed column names: ‘label’ (cell type label) and ‘label.color’ (a color vector). For example, we can highlight cell types that have significant global gene expression-disease severity correlation in red:
label_info <- data.frame(label = res_expr$label,
label.color = ifelse(res_expr$severity.p.sign == 'ns','black','red'))
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon',
advanced_list = list(label_info = label_info))
Create your own mapping from levels to aesthetic values
Users can customize mapping from levels in the data (e.g. categorical variables) to certain aesthetic values (e.g. color/size/transparency) via breaks and values (alike scale_manual() in ggplot2). This can be also achieved by providing color_info or size_info or alpha_info in advanced_list argument. The following code illustrates an example of changing color for statistical significance as blue (non-significant) and red (significant) and modifying transparency:
advanced_list <- list(color_info = data.frame(breaks = c('ns','sig'), values = c('blue','red')),
alpha_info = data.frame(breaks = c('ns','sig'), values = c(0.5,1)))
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon',
advanced_list = advanced_list)
Choose a different color palette
For categorical variable, color palette can be specified by color_info in advanced options; For continuous variable, we provide different palettes from RColorBrewer package (default is ‘Spectral’) and can be modified via palette as a element in the advanced_list.
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon',
advanced_list = list(palette = 'PiYG'))
Configure layout
The layout of TreeCorTreat plot can be modified via layout_widths in the advanced_list parameter, which specifies the relative left-to-right ratio (or non-leaf: leaf ratio).
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon',
advanced_list = list(layout_widths = c(2,1)))
Save TreeCorTreat plots
TreecorTreat plots can be saved into pdf, png or other graphical formats.
png('TreeCorTreat_plot.png', height = 8, width = 15, res = 300, units = 'in')
treecortreatplot(hierarchy_structure,
annotated_df = res_expr,
response_variable = 'severity',
color_variable = 'cancor',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.4,
nonleaf_point_gap = 0.2,
plot_type = 'balloon')
dev.off()
TreeCorTreat supports versatile multi-resolution analysis
為了在多個(gè)分辨率下分析細(xì)胞類(lèi)型-表型關(guān)聯(lián)间唉,細(xì)粒度細(xì)胞cluster(children)逐漸合并為更大的細(xì)胞cluster(parents)。 當(dāng)通過(guò)合并兩個(gè)或多個(gè)子節(jié)點(diǎn)創(chuàng)建父節(jié)點(diǎn)時(shí)利术,來(lái)自子節(jié)點(diǎn)的信息可以以多種不同方式組合呈野,具有不同的含義。 組合子節(jié)點(diǎn)的兩種基本方法是“聚合”和“串聯(lián)”氯哮。 “聚合”方法將來(lái)自子節(jié)點(diǎn)的所有細(xì)胞匯集到一個(gè)用于表示父節(jié)點(diǎn)的偽批量樣本中际跪。 在“串聯(lián)”方法中商佛,首先將每個(gè)子節(jié)點(diǎn)內(nèi)的細(xì)胞池化喉钢,以形成該節(jié)點(diǎn)的偽批量樣本。 一個(gè)使用從其偽批量樣本(例如良姆,高度可變的基因)中提取的特征或特征向量來(lái)表示每個(gè)子節(jié)點(diǎn)肠虽。 然后通過(guò)將來(lái)自所有子節(jié)點(diǎn)的特征向量連接成一個(gè)更長(zhǎng)的向量來(lái)表示父節(jié)點(diǎn)。
To support different needs of users, TreeCorTreat provides three different ways to combine child nodes: ‘a(chǎn)ggregate’ (default), ‘concatenate leaf nodes’, and ‘concatenate immediate children’.
Aggregate
The default setting for both treecor_expr and treecor_ctprop is aggregation (see previous examples), where we aggregate raw read counts for gene expression or number of cells for cell composition for non-leaf nodes.
Concatenate leaf nodes
In the ‘concatenate leaf nodes’ approach, feature vectors of all terminal leaf nodes derived from a node are concatenated into a long vector to serve as the feature vector of the node. For global gene expression analysis, this will result in a concatenated vector consisting of pseudobulk expression of highly variable genes obtained from each leaf node. For cell type proportion analysis, this will result in a vector of cell type proportions of the leaf nodes.
# gene expression pipeline; concat_leaf
res_expr_concatLeaf <- treecor_expr(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = 'severity',
method = 'concat_leaf',
num_permutations = 100)$canonical_corr
head(res_expr_concatLeaf)
## id severity.cancor severity.p severity.adjp severity.direction
## 1 1 0.7803956 0.03030303 0.4360897 +
## 2 2 0.6428883 0.15151515 0.9911129 +
## 3 3 0.8556124 0.03030303 0.4360897 +
## 4 4 0.6743079 0.03030303 0.4360897 +
## 5 5 0.6428883 0.15151515 0.9911129 +
## 6 6 0.8807827 0.04040404 0.4845441 +
## severity.p.sign severity.adjp.sign x y label leaf
## 1 sig ns 6.25 2 All FALSE
## 2 ns ns 0.00 1 B FALSE
## 3 sig ns 5.00 1 T FALSE
## 4 sig ns 12.50 1 Monocyte FALSE
## 5 ns ns 0.00 0 B_c7 TRUE
## 6 sig ns 1.00 0 T_c15 TRUE
Concatenate immediate children
In the ‘concatenate immediate children’ approach, a node’s immediate children are first identified. The feature vector of each immediate child is obtained using the ‘a(chǎn)ggregate’ approach. Then feature vector of target node is obtained by concatenating the feature vectors of its immediate children.
gene expression pipeline; concat_immediate_children
res_expr_concatImmChild <- treecor_expr(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = 'severity',
method = 'concat_immediate_children',
num_permutations = 100)$canonical_corr
head(res_expr_concatImmChild)
## id severity.cancor severity.p severity.adjp severity.direction
## 1 1 0.8504976 0.03030303 0.4360897 +
## 2 2 0.6428883 0.15151515 0.9911129 +
## 3 3 0.8556124 0.03030303 0.4360897 +
## 4 4 0.6743079 0.03030303 0.4360897 +
## 5 5 0.6428883 0.15151515 0.9911129 +
## 6 6 0.8807827 0.04040404 0.4845441 +
## severity.p.sign severity.adjp.sign x y label leaf
## 1 sig ns 6.25 2 All FALSE
## 2 ns ns 0.00 1 B FALSE
## 3 sig ns 5.00 1 T FALSE
## 4 sig ns 12.50 1 Monocyte FALSE
## 5 ns ns 0.00 0 B_c7 TRUE
## 6 sig ns 1.00 0 T_c15 TRUE
# compare 3 methods (non-leaf nodes)
summary_ls <- list(res_expr %>% filter(!leaf) %>% select(id,label,severity.cancor) %>% rename(aggregate = severity.cancor),
res_expr_concatLeaf %>% filter(!leaf) %>% select(id,label,severity.cancor) %>% rename(concatLeaf = severity.cancor),
res_expr_concatImmChild %>% filter(!leaf) %>% select(id,label,severity.cancor) %>% rename(concatImmChild = severity.cancor))
summary <- Reduce(inner_join,summary_ls)
summary
## id label aggregate concatLeaf concatImmChild
## 1 1 All 0.8942939 0.7803956 0.8504976
## 2 2 B 0.6428883 0.6428883 0.6428883
## 3 3 T 0.9297493 0.8556124 0.8556124
## 4 4 Monocyte 0.8324347 0.6743079 0.6743079
For some broad cell categories like B, T and Monocytes, the canonical correlation of leaf concatenation and immediate children concatenation are the same because the immediate children set is same as its corresponding leaf nodes. Depending on different tree hierarchies, three approaches may have different results and implications. This can also be visualized using TreeCorTreat plot:
# aggregate
colnames(res_expr) <- gsub('severity','Severity\n(Agg)',colnames(res_expr))
# concat_leaf
colnames(res_expr_concatLeaf) <- gsub('severity','Severity\n(Leaf)',colnames(res_expr_concatLeaf))
# concat_immediate_children
colnames(res_expr_concatImmChild) <- gsub('severity','Severity\n(ImmChild)',colnames(res_expr_concatImmChild))
# combine three approaches
res_combined <- Reduce(inner_join,list(res_expr,res_expr_concatLeaf,res_expr_concatImmChild))
# plot
treecortreatplot(hierarchy_structure,
annotated_df = res_combined,
response_variable = c('Severity\n(Agg)','Severity\n(ImmChild)','Severity\n(Leaf)'),
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.8,
nonleaf_point_gap = 0.2,
plot_type = 'balloon')
Analysis of multivariate outcomes
當(dāng)有多個(gè)表型性狀時(shí)玛追,可以將每個(gè)性狀作為單變量表型單獨(dú)分析税课,也可以作為多變量表型聯(lián)合分析。 為了分析多變量表型與細(xì)胞類(lèi)型比例或全局基因表達(dá)之間的關(guān)聯(lián)痊剖,CCA 用于計(jì)算表型與細(xì)胞類(lèi)型特征(即細(xì)胞類(lèi)型比例或基因表達(dá))之間的典型相關(guān)性韩玩。
Separate evaluation
# individually
multi_expr_sep <- treecor_expr(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = c('severity','age'),
num_permutations = 100)$canonical_corr
# visualize
treecortreatplot(hierarchy_structure,
annotated_df = multi_expr_sep,
response_variable = c('severity','age'),
separate = T,
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.5,
nonleaf_point_gap = 0.2)
Joint evaluation
# jointly
multi_expr_joint <- treecor_expr(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = c('severity','age'),
separate = F,
num_permutations = 100)$canonical_corr
# visualize
treecortreatplot(hierarchy_structure,
annotated_df = multi_expr_joint,
response_variable = c('severity','age'),
separate = F,
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.2,
nonleaf_point_gap = 0.1)
為了分析差異基因表達(dá),首先使用主成分分析或使用指定權(quán)重的性狀線性組合將多變量表型轉(zhuǎn)換為單變量表型陆馁。 轉(zhuǎn)化后的單變量表型結(jié)合了來(lái)自多個(gè)性狀的信息找颓,然后用于運(yùn)行差異表達(dá)分析。
# jointly
multi_deg_joint <- treecor_deg(count,
hierarchy_structure,
cell_meta,
sample_meta %>% mutate(severity = ifelse(severity == "HD", 0,1)),
response_variable = c('severity','age'),
separate = F,
save_as_csv = F)$dge.summary
head(multi_deg_joint)
## label combined_phenotype.num_deg x y id leaf
## 1 All 1 6.25 2 1 FALSE
## 2 B 0 0.00 1 2 FALSE
## 3 T 700 5.00 1 3 FALSE
## 4 Monocyte 0 12.50 1 4 FALSE
## 5 B_c7 0 0.00 0 5 TRUE
## 6 T_c15 0 1.00 0 6 TRUE
Adjusting for covariates in TreeCorTreat analysis
TreeCorTreat is capable of handling covariates in the analysis, allowing users to adjust for potential confounders or unwanted technical variation. For example, instead of viewing age as a phenotype, one can also view it as a covariate and ask which cell type features are associated with disease severity after accounting for age.
# adjust for age
expr_adjusted <- treecor_expr(count,
hierarchy_structure,
cell_meta,
sample_meta,
response_variable = 'severity',
formula = '~age',
num_permutations = 100)$canonical_corr
# visualize
treecortreatplot(hierarchy_structure,
annotated_df = expr_adjusted,
response_variable = 'severity',
color_variable = 'p.sign',
size_variable = 'cancor',
alpha_variable = 'p.sign',
font_size = 12,
nonleaf_label_pos = 0.2,
nonleaf_point_gap = 0.1)
可視化相當(dāng)不錯(cuò)叮贩,值得大家借鑒
生活很好击狮,有你更好