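Overview
Pathway enrichment analysis is very useful for understanding the biology behind metabolomics data. It aims to put the affected metabolites in context, using the prior knowledge encoded in metabolic pathways. Interpreting broadly defined metabolic pathways remains challenging, however, because pathways overlap and cross over one another.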
About the paper
- Title: FELLA: an R package to enrich metabolomics data
- Journal: BMC Bioinformatics
- Authors: Sergio Picart-Armada (first author), Alexandre Perera-Lluna (corresponding author)
- Lab homepage: B2SLab
- Affiliation: Universitat Politècnica de Catalunya, among others
- Fields: hybrid bioinformatics and bioengineering, cardiovascular disease, metabolomics data processing, software development and application
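Main results
The paper introduces an R package, FELLA, which runs a network-based enrichment analysis starting from the differential metabolites found in an upstream analysis. The results cover metabolic pathways, modules, enzymes, reactions and metabolites. Beyond a list of pathways, FELLA also reports the intermediate entities (modules, enzymes, reactions) that link the input metabolites to them. This can reveal how pathways intersect under the conditions studied and point to candidate enzymes and metabolites.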
Workflow
The package workflow can be summarized in three blocks:
- Block I: local database
- Block II: enrichment analysis
- Block III: exporting results
FELLA also offers an interactive mode through the shiny package.
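Installation and demo
Installation
# The package is hosted on Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install the FELLA release that matches the Bioconductor version in use
BiocManager::install("FELLA")
library(FELLA)  # load the package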
Loading the database
# Step one is building the database; here we load a prebuilt sample instead
data("FELLA.sample")
class(FELLA.sample)
## [1] "FELLA.DATA"
## attr(,"package")
## [1] "FELLA"
show(FELLA.sample)
## General data:
## - KEGG graph:
## * Nodes: 670
## * Edges: 1677
## * Density: 0.003741383
## * Categories:
## + pathway [2]
## + module [6]
## + enzyme [58]
## + reaction [279]
## + compound [325]
## * Size: 366.9 Kb
## - KEGG names are ready.
## -----------------------------
## Hypergeometric test:
## - Matrix is ready
## * Dim: 325 x 2
## * Size: 25 Kb
## -----------------------------
## Heat diffusion:
## - Matrix not loaded.
## - RowSums are ready.
## -----------------------------
## PageRank:
## - Matrix not loaded.
## - RowSums are ready.
- Note that a FELLA.DATA object only needs to be built once, using the functions buildGraphFromKEGGREST and buildDataFromGraph, and it should not be modified by hand afterwards.
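For a real analysis the database has to be built rather than loaded from the sample. The sketch below wraps the two build functions named above, plus loadKEGGdata for reading the result back, in a small helper. The helper name build_fella_db, its defaults and the argument values (taken from the style of the package vignette) are assumptions for illustration. Building requires internet access to KEGG, so the call itself is left commented out.

```r
# Hypothetical helper (not part of FELLA): build a database once, then load it.
# The FELLA functions are called with the package prefix, so the helper can be
# defined before library(FELLA) is run.
build_fella_db <- function(organism = "hsa",
                           db_dir = file.path(tempdir(), "fella_db")) {
  # Download the KEGG graph for the chosen organism (needs internet access)
  graph <- FELLA::buildGraphFromKEGGREST(organism = organism)

  # Write the database files to db_dir; argument values are assumptions
  FELLA::buildDataFromGraph(
    keggdata.graph = graph,
    databaseDir = db_dir,
    internalDir = FALSE,
    matrices = "diffusion",
    normality = "diffusion",
    niter = 50)

  # Read the database back as a FELLA.DATA object
  FELLA::loadKEGGdata(
    databaseDir = db_dir,
    internalDir = FALSE,
    loadMatrix = "diffusion")
}

# Usage (commented out because it downloads data from KEGG):
# fella_data <- build_fella_db(organism = "hsa")
```

Because the object is built only once, it makes sense to point db_dir at a permanent folder instead of tempdir() when running this for real.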
Loading the demo data
# Step two is loading the input, i.e. the list of affected metabolites from the upstream analysis
data("input.sample")
input.full <- c(input.sample, paste0("intruder", 1:10))
show(input.full)
## [1] "C00143" "C00546" "C04225" "C16328" "C00091"
## [6] "C15979" "C16333" "C05264" "C05258" "C00011"
## [11] "C00083" "C00044" "C05266" "C00479" "C05280"
## [16] "C01352" "C05268" "C16329" "C00334" "C05275"
## [21] "C14145" "C00081" "C04253" "C00027" "C00111"
## [26] "C00332" "C00003" "C00288" "C05467" "C00164"
## [31] "intruder1" "intruder2" "intruder3" "intruder4" "intruder5"
## [36] "intruder6" "intruder7" "intruder8" "intruder9" "intruder10"
# Use `defineCompounds` to see which compounds can be mapped to the database
myAnalysis <- defineCompounds(
    compounds = input.full,
    data = FELLA.sample)
# Some compounds identified upstream may not match any compound collected in KEGG.
# `getExcluded` lists the compounds that failed to map; `getInput` lists the ones that mapped.
getInput(myAnalysis)
## [1] "C00003" "C00011" "C00027" "C00044" "C00081" "C00083" "C00091" "C00111"
## [9] "C00143" "C00164" "C00288" "C00332" "C00334" "C00479" "C00546" "C01352"
## [17] "C04225" "C04253" "C05258" "C05264" "C05266" "C05268" "C05275" "C05280"
## [25] "C05467" "C14145" "C15979" "C16328" "C16329" "C16333"
getExcluded(myAnalysis)
## [1] "intruder1" "intruder2" "intruder3" "intruder4" "intruder5"
## [6] "intruder6" "intruder7" "intruder8" "intruder9" "intruder10"
- Note that the matching is exact, so watch out for stray spaces or tab characters in the identifiers.
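Since matching is exact, it can help to clean the identifiers before passing them to defineCompounds. The following is a minimal base R sketch, not part of FELLA. The input vector raw_ids is made up, and the sketch assumes that KEGG compound identifiers have the form "C" followed by five digits and that lowercase entries should be uppercased.

```r
# Hypothetical raw identifiers, e.g. pasted from a spreadsheet
raw_ids <- c(" C00143", "C00546\t", "c04225", "C16328 ", "not_a_kegg_id")

# Strip leading/trailing spaces and tabs, then normalize case
clean_ids <- toupper(trimws(raw_ids))

# Assumption: a KEGG compound ID is "C" followed by exactly five digits
is_kegg <- grepl("^C[0-9]{5}$", clean_ids)

valid_ids   <- clean_ids[is_kegg]   # candidates to pass to defineCompounds
invalid_ids <- raw_ids[!is_kegg]    # entries to inspect by hand

print(valid_ids)
print(invalid_ids)
```

Entries that pass this check can still be excluded by FELLA if they are absent from the database, so getExcluded remains worth checking.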
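Analysis
- Enrichment analysis: once the FELLA.DATA and FELLA.USER objects are defined, the enrichment workflow can be run directly. Three methods are available:
  - Hypergeometric test (method = "hypergeom")
  - Diffusion (method = "diffusion"), which extracts relevant sub-networks
  - PageRank (method = "pagerank"), similar to diffusion, but it ranks the nodes of the network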
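To make the first method concrete, here is a worked over-representation test in base R. It is a generic illustration of a hypergeometric test, not FELLA's internal code. The background size (325 compounds) and input size (30 compounds) come from the sample output above, while the pathway size and the overlap are hypothetical values chosen for the example.

```r
n_background <- 325  # compounds in the background (from the sample database)
n_input      <- 30   # affected compounds that mapped (from the sample input)
n_pathway    <- 40   # hypothetical: compounds annotated to one pathway
n_overlap    <- 12   # hypothetical: affected compounds inside that pathway

# P(X >= n_overlap) when drawing n_input compounds without replacement
p_value <- phyper(
  q = n_overlap - 1,
  m = n_pathway,
  n = n_background - n_pathway,
  k = n_input,
  lower.tail = FALSE)

# Cross-check: sum the point probabilities of every overlap at least as large
p_check <- sum(dhyper(
  n_overlap:min(n_pathway, n_input),
  n_pathway, n_background - n_pathway, n_input))

print(p_value)
```

The test only sees the pathway annotation of each compound. The two network-based methods exist to use the enzymes, reactions and modules that connect them as well.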
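- Statistical approximation: for the diffusion and PageRank methods, two ways of scoring are provided:
  - Normal approximation (approx = "normality"): scores are derived from z-scores, computed with the analytical expected value and covariance matrix under the null hypothesis
  - Monte Carlo trials (approx = "simulation"): scores are derived from Monte Carlo trials of the random variables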
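The difference between the two approximations can be sketched for a single node in base R. This is a generic illustration under stated assumptions, not FELLA's implementation: the null mean, the null standard deviation and the observed score are invented numbers, and the null distribution is simply assumed to be normal so that both approaches can be compared.

```r
set.seed(42)  # make the simulated trials reproducible

# Hypothetical values for one node
null_mean <- 0.20   # expected score under the null (assumed known analytically)
null_sd   <- 0.05   # standard deviation under the null (assumed known)
observed  <- 0.35   # score obtained with the real input

# Normal approximation: convert the score to a z-score, take the upper tail
z        <- (observed - null_mean) / null_sd
p_normal <- pnorm(z, lower.tail = FALSE)

# Monte Carlo: simulate null scores and count how often they reach the observed one
n_trials    <- 10000
null_scores <- rnorm(n_trials, mean = null_mean, sd = null_sd)
p_mc        <- (sum(null_scores >= observed) + 1) / (n_trials + 1)

print(c(normal = p_normal, monte_carlo = p_mc))
```

The normal approximation needs no simulation and is therefore cheap. The Monte Carlo estimate makes no distributional assumption, but its resolution is limited by the number of trials.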
- Enrichment: methods, approximations and the wrapper function
- The enrich function wraps defineCompounds, runHypergeom, runDiffusion and runPagerank, so the whole analysis runs in a single call:
myAnalysis <- enrich(
    compounds = input.full,
    method = listMethods(),
    approx = "normality",
    data = FELLA.sample)
#No background compounds specified. Default background will be used.
#Running hypergeom...
#Starting hypergeometric p-values calculation...
#Done.
#Running diffusion...
#Computing p-scores through the specified distribution.
#Done.
#Running PageRank...
#Computing p-scores through the specified distribution.
#Using provided damping factor...
#Done.
#Warning message:
#In defineCompounds(compounds = compounds, compoundsBackground = compoundsBackground, :
#  Some compounds were introduced as affected but they do not belong to the background. These compounds will be excluded from the analysis. Use 'getExcluded' to see them.
show(myAnalysis)
## Compounds in the input: 30
## [1] "C00003" "C00011" "C00027" "C00044" "C00081" "C00083" "C00091" "C00111"
## [9] "C00143" "C00164" "C00288" "C00332" "C00334" "C00479" "C00546" "C01352"
## [17] "C04225" "C04253" "C05258" "C05264" "C05266" "C05268" "C05275" "C05280"
## [25] "C05467" "C14145" "C15979" "C16328" "C16329" "C16333"
## Background compounds: all available compounds (default)
## -----------------------------
## Hypergeometric test: ready.
## Top 2 p-values:
## hsa00640 hsa00010
## 8.540386e-09 9.999888e-01
##
## -----------------------------
## Heat diffusion: ready.
## P-scores under 0.05: 86
## -----------------------------
## PageRank: ready.
## P-scores under 0.05: 70
Visualization
- With method = "hypergeom", the plot shows the top pathways together with their associated metabolites
plot(
    x = myAnalysis,
    method = "hypergeom",
    main = "My first enrichment using the hypergeometric test in FELLA",
    threshold = 1,
    data = FELLA.sample)
- With method = "diffusion", the plot also includes modules, enzymes and reactions
plot(
    x = myAnalysis,
    method = "diffusion",
    main = "My first enrichment using the diffusion analysis in FELLA",
    threshold = 0.1,
    data = FELLA.sample)
- With method = "pagerank", the plot is similar to the diffusion one
plot(
    x = myAnalysis,
    method = "pagerank",
    main = "My first enrichment using the PageRank analysis in FELLA",
    threshold = 0.1,
    data = FELLA.sample)
Exporting results
- Export the results (the pathway annotation table)
# Write the table to the current working directory
myOutDir <- getwd()
myExp_csv <- file.path(myOutDir, "table.csv")
exportResults(
    format = "csv",
    file = myExp_csv,
    method = "pagerank",
    threshold = 0.1,
    object = myAnalysis,
    data = FELLA.sample)
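Summary
That covers the general usage of FELLA. The computational methods behind the package deserve closer study. Unlike the web-based tool MetaboAnalyst, FELLA runs locally once the database is built, so the analysis is faster and does not depend on network availability, which is why I prefer working with the package.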
References
[1] Article: FELLA: an R package to enrich metabolomics data
[2] FELLA package page: FELLA
[3] FELLA GitHub repository: github