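I recently got my hands on an untargeted metabolomics dataset, so I am working through the MetaboDiff package to get familiar with the ideas and workflow of metabolomics analysis. The workflow below follows the official MetaboDiff help documentation.
1. Installing the MetaboDiff package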
# install.packages("devtools")  # run once if devtools is not installed yet
library("devtools")
install_github("andreasmock/MetaboDiff")
library(MetaboDiff)
2. Data processing
2.1 Data import
MetaboDiff needs three inputs:
- assay - a data matrix containing the relative abundances of the metabolites;
- rowData - a data frame containing the metabolite annotations;
- colData - a data frame containing the sample metadata.
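To make the expected shape of the three inputs concrete, here is a minimal base-R sketch that builds toy versions of them. The object names (`assay_toy`, `rowData_toy`, `colData_toy`), the values, and the placeholder annotations are all made up for illustration; only the layout (metabolites in rows, samples in columns, matching row and column names) mirrors the example data.

```r
# Toy inputs in the layout described above (all values are invented).
set.seed(1)
assay_toy <- matrix(rlnorm(12, meanlog = 10), nrow = 3,
                    dimnames = list(paste0("met", 1:3), paste0("pat", 1:4)))
assay_toy["met2", "pat3"] <- NA  # missing measurements are allowed at this stage

# One annotation row per metabolite (placeholder annotations).
rowData_toy <- data.frame(
  BIOCHEMICAL = c("compound A", "compound B", "compound C"),
  SUPER_PATHWAY = c("Lipid", "Lipid", "No Super Pathway"),
  row.names = rownames(assay_toy),
  stringsAsFactors = FALSE
)

# One metadata row per sample.
colData_toy <- data.frame(
  id = paste0("s", 1:4),
  tumor_normal = c("N", "N", "T", "T"),
  row.names = colnames(assay_toy),
  stringsAsFactors = FALSE
)

# Sanity check: the annotation and metadata tables line up with the matrix.
stopifnot(identical(rownames(rowData_toy), rownames(assay_toy)),
          identical(rownames(colData_toy), colnames(assay_toy)))
```

Checking that the names line up before merging is a cheap way to catch swapped or missing samples early.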
The example data shipped with MetaboDiff come from the paper "AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer". The metabolomics data were measured in prostate tissue from 61 prostate cancer patients and 25 normal individuals.
First, take a look at these three objects.
> assay[1:5,1:5]
pat1 pat2 pat3 pat4 pat5
met1 33964.73 117318.43 118856.90 78670.7 102565.94
met2 18505.56 167585.32 59621.97 66220.4 74892.27
met3 NA 42373.93 27141.21 NA 38390.78
met4 61638.77 74595.78 NA NA NA
met5 NA 148363.61 43861.79 105835.2 25589.08
> head(colData)
id tumor_normal random_gender group
pat1 cp2 N female Control
pat2 cp7 N female Control
pat3 cp19 N male Control
pat4 cp26 N male Control
pat5 cp29 N female Control
pat6 cp32 N male Control
> head(rowData)
BIOCHEMICAL SUPER_PATHWAY SUB_PATHWAY METABOLON_ID
met1 1-arachidonoylglycerophosphoethanolamine* Lipid Lysolipid 35186
met2 1-arachidonoylglycerophosphoinositol* Lipid Lysolipid 34214
met3 1-arachidonylglycerol Lipid Monoacylglycerol 34397
met4 1-eicosadienoylglycerophosphocholine* Lipid Lysolipid 33871
met5 1-heptadecanoylglycerophosphoethanolamine* No Super Pathway No Pathway 37419
met6 1-linoleoylglycerol (1-monolinolein) Lipid Monoacylglycerol 27447
PLATFORM KEGG_ID HMDB_ID
met1 LC/MS neg <NA> HMDB11517
met2 LC/MS neg <NA> <NA>
met3 LC/MS neg C13857 HMDB11572
met4 LC/MS pos <NA> <NA>
met5 LC/MS neg <NA> <NA>
met6 LC/MS neg <NA> <NA>
# Merge the three datasets into a single object to simplify downstream analysis.
> (met <- create_mae(assay,rowData,colData))
A MultiAssayExperiment object of 1 listed
experiment with a user-defined name and respective class.
Containing an ExperimentList class object of length 1:
[1] raw: SummarizedExperiment with 307 rows and 86 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
2.2 Metabolite annotation
If HMDB, KEGG or ChEBI ids are part of the rowData table, metabolite annotations can be retrieved from the Small Molecule Pathway Database (SMPDB).
> met <- get_SMPDBanno(met,
+ column_kegg_id=6,
+ column_hmdb_id=7,
+ column_chebi_id=NA)
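The column arguments above are positions in rowData. As a small sketch of how to look those positions up by name instead of counting by hand, the snippet below uses a one-row toy data frame (`rowData_toy`, invented here) with the same column names as the rowData printed earlier.

```r
# Toy annotation table with the same column order as the rowData shown earlier.
rowData_toy <- data.frame(
  BIOCHEMICAL = "x", SUPER_PATHWAY = "Lipid", SUB_PATHWAY = "Lysolipid",
  METABOLON_ID = 1L, PLATFORM = "LC/MS neg", KEGG_ID = NA, HMDB_ID = NA
)

# Look up the column positions by name.
kegg_col <- match("KEGG_ID", colnames(rowData_toy))
hmdb_col <- match("HMDB_ID", colnames(rowData_toy))
c(kegg = kegg_col, hmdb = hmdb_col)  # kegg = 6, hmdb = 7
```

Looking the positions up by name keeps the call correct if the annotation table gains or loses a column.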
2.3 Handling missing values
> na_heatmap(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue"))
# Drop metabolites with too many missing values (cutoff=0.4) and impute the remaining gaps by k-nearest neighbors.
> (met = knn_impute(met,cutoff=0.4))
A MultiAssayExperiment object of 2 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 2:
[1] raw: SummarizedExperiment with 307 rows and 86 columns
[2] imputed: SummarizedExperiment with 238 rows and 86 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
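The drop from 307 to 238 rows comes from the missing-value cutoff. The base-R sketch below shows that filtering idea on a toy matrix. It assumes that `cutoff=0.4` means "exclude metabolites missing in more than 40% of samples"; how MetaboDiff treats the exact boundary is not checked here, and the imputation step itself is not reproduced.

```r
set.seed(42)
x <- matrix(rnorm(40), nrow = 4,
            dimnames = list(paste0("met", 1:4), paste0("pat", 1:10)))
x["met2", 1:5] <- NA  # 50% missing: above the cutoff
x["met3", 1:2] <- NA  # 20% missing: below the cutoff

cutoff <- 0.4
na_fraction <- rowMeans(is.na(x))           # share of missing samples per metabolite
kept <- x[na_fraction <= cutoff, , drop = FALSE]
rownames(kept)  # "met1" "met3" "met4"
```

Metabolites that survive the filter can still contain missing values (met3 here), and those are the gaps the imputation fills.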
2.4 Outlier heat map
Before normalizing the data, we need to remove outliers.
> outlier_heatmap(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue"),
+ k=2)
With k=2, the heat map above splits the samples into cluster 1 and cluster 2. Cluster 1 is the outlier group relative to cluster 2, so we remove cluster 1.
> (met <- remove_cluster(met,cluster=1))
harmonizing input:
removing 5 sampleMap rows with 'colname' not in colnames of experiments
harmonizing input:
removing 5 sampleMap rows with 'colname' not in colnames of experiments
removing 5 colData rownames not in sampleMap 'primary'
A MultiAssayExperiment object of 2 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 2:
[1] raw: SummarizedExperiment with 307 rows and 81 columns
[2] imputed: SummarizedExperiment with 238 rows and 81 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
2.5 Data normalization
> (met <- normalize_met(met))
vsn2: 307 x 81 matrix (1 stratum).
Please use 'meanSdPlot' to verify the fit.
vsn2: 238 x 81 matrix (1 stratum).
Please use 'meanSdPlot' to verify the fit.
A MultiAssayExperiment object of 4 listed
experiments with user-defined names and respective classes.
Containing an ExperimentList class object of length 4:
[1] raw: SummarizedExperiment with 307 rows and 81 columns
[2] imputed: SummarizedExperiment with 238 rows and 81 columns
[3] norm: SummarizedExperiment with 307 rows and 81 columns
[4] norm_imputed: SummarizedExperiment with 238 rows and 81 columns
Features:
experiments() - obtain the ExperimentList instance
colData() - the primary/phenotype DataFrame
sampleMap() - the sample availability DataFrame
`$`, `[`, `[[` - extract colData columns, subset, or experiment
*Format() - convert into a long or wide DataFrame
assays() - convert ExperimentList to a SimpleList of matrices
2.6 Quality control of the normalization
> quality_plot(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue"))
harmonizing input:
removing 243 sampleMap rows not in names(experiments)
harmonizing input:
removing 243 sampleMap rows not in names(experiments)
harmonizing input:
removing 243 sampleMap rows not in names(experiments)
harmonizing input:
removing 243 sampleMap rows not in names(experiments)
Warning messages:
1: Removed 5356 rows containing non-finite values (stat_boxplot).
2: Removed 5356 rows containing non-finite values (stat_boxplot).
3. Data analysis
3.1 Unsupervised analysis
MetaboDiff provides PCA as a linear dimensionality reduction method and tSNE as a nonlinear one.
> source("http://peterhaschke.com/Code/multiplot.R")
> multiplot(
+ pca_plot(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue")),
+ tsne_plot(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue")),
+ cols=2)
sigma summary: Min. : 0.486945518988849 |1st Qu. : 0.714292832194587 |Median : 0.752934663223126 |Mean : 0.75914557339073 |3rd Qu. : 0.808081774279559 |Max. : 0.939549187337462 |
Epoch: Iteration #100 error is: 18.6145995899728
Epoch: Iteration #200 error is: 1.54407709770312
Epoch: Iteration #300 error is: 1.22290267643501
Epoch: Iteration #400 error is: 1.11106327484334
Epoch: Iteration #500 error is: 1.03658104678225
Epoch: Iteration #600 error is: 0.976566767973725
Epoch: Iteration #700 error is: 0.951849496540308
Epoch: Iteration #800 error is: 0.93612964053674
Epoch: Iteration #900 error is: 0.914421902208305
Epoch: Iteration #1000 error is: 0.88283039690459
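For readers who want to see what the PCA panel does in principle, here is a base-R sketch using `prcomp` on simulated data. It is not MetaboDiff's implementation: the data, the group shift and the variable names are invented, and the samples are placed in rows because that is what `prcomp` expects (a metabolites-by-samples matrix would need `t()` first).

```r
set.seed(1)
# 20 samples x 5 metabolites; the tumor group is shifted in the first two metabolites.
group <- rep(c("N", "T"), each = 10)
x <- matrix(rnorm(100), nrow = 20)
x[group == "T", 1:2] <- x[group == "T", 1:2] + 3

pca <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]                            # sample coordinates on PC1 and PC2
var_explained <- pca$sdev^2 / sum(pca$sdev^2)     # share of variance per component

plot(scores, col = ifelse(group == "T", "dodgerblue", "darkseagreen"),
     pch = 19, xlab = "PC1", ylab = "PC2")
```

If the groups differ in many metabolites at once, they tend to separate along the first components, which is what the PCA panel is meant to reveal.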
3.2 Hypothesis testing
Differential analysis is run on each metabolite individually, mainly with the t-test and ANOVA.
> met = diff_test(met,
+ group_factors = c("tumor_normal","random_gender"))
> str(metadata(met), max.level=2)
List of 2
$ ttest_tumor_normal_T_vs_N :'data.frame': 238 obs. of 3 variables:
..$ pval : num [1:238] 0.0206 0.7808 0.0832 0.0432 0.5859 ...
..$ adj_pval : num [1:238] 0.102 0.904 0.221 0.158 0.758 ...
..$ fold_change: num [1:238] 0.2872 0.0366 -0.3936 -0.5391 -0.1646 ...
$ ttest_random_gender_male_vs_female:'data.frame': 238 obs. of 3 variables:
..$ pval : num [1:238] 0.2318 0.8626 0.4048 0.0121 0.2111 ...
..$ adj_pval : num [1:238] 0.83 0.959 0.862 0.386 0.83 ...
..$ fold_change: num [1:238] -0.1372 -0.0208 0.1742 0.607 0.3438 ...
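The three columns in the output above (pval, adj_pval, fold_change) can be mimicked with a short base-R sketch. This is an illustration under assumptions, not MetaboDiff's code: the data are simulated, the fold change is taken as the difference of group means on an already log-like scale, and Benjamini-Hochberg is assumed as the adjustment method.

```r
set.seed(7)
group <- factor(rep(c("N", "T"), each = 8))
x <- matrix(rnorm(5 * 16), nrow = 5, dimnames = list(paste0("met", 1:5), NULL))
x["met1", group == "T"] <- x["met1", group == "T"] + 2  # one truly shifted metabolite

res <- data.frame(
  # Welch t-test per metabolite (row), tumor versus normal
  pval = apply(x, 1, function(v) t.test(v[group == "T"], v[group == "N"])$p.value),
  # difference of group means, T minus N (assumed definition of fold change)
  fold_change = rowMeans(x[, group == "T"]) - rowMeans(x[, group == "N"])
)
res$adj_pval <- p.adjust(res$pval, method = "BH")  # multiple-testing correction
res
```

With hundreds of metabolites tested at once, the adjusted p-values are the safer basis for calling a metabolite differential, which is why the volcano plots are drawn both with and without adjustment.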
# Differential analysis between the tumor and normal groups
> volcano_plot(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue"),
+ p_adjust = FALSE)
> volcano_plot(met,
+ group_factor="tumor_normal",
+ label_colors=c("darkseagreen","dodgerblue"),
+ p_adjust = TRUE)
# Differential analysis between the female and male groups
> par(mfrow=c(1,2))
> volcano_plot(met,
+ group_factor="random_gender",
+ label_colors=c("brown","orange"),
+ p_adjust = FALSE)
> volcano_plot(met,
+ group_factor="random_gender",
+ label_colors=c("brown","orange"),
+ p_adjust = TRUE)
3.3 Metabolic correlation network analysis
Correlation analysis has been applied successfully in comparative transcriptomics to reveal changes in biologically meaningful modules. The same idea can be applied to metabolomics data.
> met_example <- met_example %>%
+ diss_matrix %>% # build the dissimilarity matrix
+ identify_modules(min_module_size=5) %>% # identify metabolic correlation modules
+ name_modules(pathway_annotation="SUB_PATHWAY") %>% # name the modules
+ calculate_MS(group_factors=c("tumor_normal","random_gender")) # compute the significance of each module's association with the sample traits
alpha: 1.000000
..cutHeight not given, setting it to 0.991 ===> 99% of the (truncated) height range in dendro.
..done.
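The pipeline above builds a dissimilarity matrix and then cuts a dendrogram into modules. The base-R sketch below shows a deliberately simplified version of that idea: it uses one minus the absolute correlation as the dissimilarity and a fixed number of clusters, whereas WGCNA-style workflows typically use a topological overlap measure and a dynamic tree cut. All data and names are simulated.

```r
set.seed(3)
# 30 samples x 6 metabolites: met1-3 follow one latent signal, met4-6 another.
a <- rnorm(30)
b <- rnorm(30)
x <- cbind(sapply(1:3, function(i) a + rnorm(30, sd = 0.3)),
           sapply(1:3, function(i) b + rnorm(30, sd = 0.3)))
colnames(x) <- paste0("met", 1:6)

diss <- 1 - abs(cor(x))                         # strongly correlated metabolites are "close"
tree <- hclust(as.dist(diss), method = "average")
modules <- cutree(tree, k = 2)                  # simplified stand-in for a dynamic tree cut
split(names(modules), modules)
```

Metabolites driven by the same latent signal end up in the same module, which is the property the module-level tests further down rely on.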
# Visualize the metabolic correlation modules: hierarchical clustering
> WGCNA::plotDendroAndColors(metadata(met_example)$tree,
+ metadata(met_example)$module_color_vector,
+ 'Module colors',
+ dendroLabels = FALSE,
+ hang = 0.03,
+ addGuide = TRUE,
+ guideHang = 0.05, main='')
# Visualize the metabolic correlation modules: relationships between the modules
> par(mar=c(2,2,2,2))
> ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree),
+ type = 'fan',
+ show.tip.label = FALSE,
+ main='')
> ape::tiplabels(frame = 'circle',
+ col='black',
+ text=rep('',length(unique(metadata(met_example)$modules))),
+ bg = WGCNA::labels2colors(0:21))
# Visualize the named metabolic correlation modules
> ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree), cex=0.9)
# Visualize the significance of the module associations for tumor versus normal samples
> MS_plot(met_example,
+ group_factor="tumor_normal",
+ p_value_cutoff=0.05,
+ p_adjust=FALSE)
# Visualize the significance of the module associations for samples of different gender
> MS_plot(met_example,
+ group_factor="random_gender",
+ p_value_cutoff=0.05,
+ p_adjust=FALSE)
# Test the individual metabolites of a module of interest for differences between sample groups
> MOI_plot(met_example,
+ group_factor="tumor_normal",
+ MOI = 2,
+ label_colors=c("darkseagreen","dodgerblue"),
+ p_adjust = FALSE) + xlim(c(-1,8))