對(duì)于Lefse來(lái)說(shuō)藻雌,首先測(cè)試所有特征在不同類(lèi)中的值是否存在差異分布孽拷。其次吨掌,對(duì)顯著不同的特征進(jìn)行事后檢驗(yàn),不同類(lèi)中子類(lèi)之間的所有兩兩比較應(yīng)都明顯符合類(lèi)趨勢(shì)脓恕。最后膜宋,利用線(xiàn)性判別分析(LDA)或隨機(jī)森林(RF)對(duì)顯著判別特征進(jìn)行評(píng)估。
與Lefse不同炼幔,在MicrobiotaProcess中秋茫,
①省去Lefse分析前整理數(shù)據(jù)的步驟,可以直接使用physeq對(duì)象進(jìn)行操作乃秀,節(jié)省大量時(shí)間
②每一步中使用的檢驗(yàn)方法都可以自定義:
第一步中可以是oneway.test肛著,也可以是kruskal_test
第二步中可以是wilcox.test,也可以是lm
第三步可以是lda跺讯,也可以是rf
③Lefse中由于LDA效應(yīng)大小是通過(guò)隨機(jī)重采樣計(jì)算的枢贿,因此不能保證結(jié)果完全可以重現(xiàn),MicrobiotaProcess中引入隨機(jī)種子設(shè)置以實(shí)現(xiàn)結(jié)果的重復(fù)性
④第一步檢驗(yàn)中提供p值校正策略刀脏,過(guò)濾方法也可以選擇pvalue或是padjust萨咕。
library(coin) # for the kruskal_test and wilcox_test
library(MicrobiotaProcess)
library(phyloseq)
load("1.rdata")#加載physeq對(duì)象
set.seed(1024)#由于LDA效應(yīng)大小是通過(guò)隨機(jī)重采樣計(jì)算的,因此應(yīng)設(shè)置隨機(jī)種子以實(shí)現(xiàn)結(jié)果的重復(fù)性
deres <- diff_analysis(obj = physeq,
classgroup = "Category",#分組
firstcomfun = "kruskal_test",
padjust = "fdr",#p值校正方法
filtermod = "pvalue",#以pvalue列過(guò)濾
firstalpha = 0.05,
strictmod = TRUE,#是否進(jìn)行一對(duì)一事后檢驗(yàn)
secondcomfun = "wilcox_test",
secondalpha = 0.01,
mlfun = "lda",#線(xiàn)性判別分析火本,可選隨機(jī)森林
ldascore=3#線(xiàn)性判別分?jǐn)?shù)
)
deres
# The original data: 253 features and 14 samples
# The sample data: 1 variables and 14 samples
# The taxda contained 664 by 7 rank
# after first test (kruskal_test) number of feature (pvalue<=0.05):61
# after second test (wilcox_test and generalizedFC) number of significantly discriminative feature:31
# after lda, Number of discriminative features: 19 (certain taxonomy classification:16; uncertain taxonomy classication: 3)
ggdiffclade畫(huà)優(yōu)勢(shì)物種進(jìn)化分支圖
#visualization of different results by ggdiffclade
ggdiffclade(obj=deres,
layout="circular",#布局類(lèi)型
alpha=0.3, #樹(shù)分支的背景透明度
linewd=0.2, #樹(shù)中連線(xiàn)粗細(xì)
skpointsize=0.8, #樹(shù)骨架中國(guó)點(diǎn)的大小
taxlevel=2, #要展示的樹(shù)分支水平2(門(mén))及3以下(綱目科屬種)
cladetext=4,#文本大小
setColors=F,#自定義顏色
removeUnknown=T,#不刪分支危队,但移除分類(lèi)中有un_的物種注釋
reduce=T)+ # 移除分類(lèi)不明的物種分支和其注釋?zhuān)瑸門(mén)時(shí)removeUnknown參數(shù)無(wú)效
scale_fill_manual(values=c("#00AED7", "#FD9347")) +
guides(color = guide_legend(keywidth = 0.1, keyheight = 0.6,order = 3,ncol=1)) +
theme(panel.background=element_rect(fill=NA),
legend.position="right",
plot.margin=margin(0,0,0,0),
legend.spacing.y=unit(0.02, "cm"),
legend.title=element_text(size=12),
legend.text=element_text(size=10),
legend.box.spacing=unit(0.02,"cm"))
ggdiffbox畫(huà)LDA效應(yīng)圖帶物種豐度箱線(xiàn)圖
# visualization of different results by ggdiffbox
diffbox <- ggdiffbox(obj=deres, box_notch=FALSE,
colorlist=c("#00AED7", "#FD9347"),
l_xlabtext="relative abundance")
diffbox
ggeffectsize放大LDA效應(yīng)圖
effectsize <- ggeffectsize(obj=deres, lineheight=0.1,linewidth=0.3,pointsize=3) +
scale_color_manual(values=c("#00AED7", "#FD9347"))
effectsize
ggdifftaxbar可視化每個(gè)LDA顯著性物種在樣本的分布情況
ggdifftaxbar(obj=deres, xtextsize=1.5, output="each_biomarkder_barplot",coloslist=c("#00AED7", "#FD9347"))
PS:MicrobiotaProcess新版本已全面支持MPSE流操作,但mp_diff_analysis函數(shù)同樣參數(shù)計(jì)算出來(lái)有OTU而不只是物種等級(jí)钙畔,盡管具有物種等級(jí)數(shù)目?jī)蓚€(gè)函數(shù)都是19個(gè)茫陆,示例如下:
sss <- as.MPSE(physeq)
sss %<>% mp_rrarefy()
sss %<>% mp_cal_abundance(.abundance = RareAbundance,force=T) %>% mp_cal_abundance(.abundance=RareAbundance,.group=Category,force = T)
sss %<>% mp_diff_analysis(
.abundance = RelRareAbundanceBySample,
.group = Category,
p.adjust="fdr",
filter.p="pvalue",
first.test.alpha = 0.05,
second.test.alpha = 0.01,
ldascore = 3
)
taxa.tree <- sss %>% mp_extract_tree(type="taxatree")
taxa.tree %>% select(label, nodeClass, LDAupper, LDAmean, LDAlower, Sign_Category, pvalue, fdr) %>% filter(!is.na(fdr))
#從第八行開(kāi)始
# # A tibble: 26 x 8
# label nodeClass LDAupper LDAmean LDAlower Sign_Category pvalue fdr
# <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
# 1 OTU_80 OTU 3.32 3.28 3.23 M 0.00671 0.0975
# 2 OTU_17 OTU 4.14 4.12 4.10 N 0.00195 0.0759
# 3 OTU_184 OTU 3.16 3.12 3.06 N 0.00298 0.0759
# 4 OTU_4 OTU 4.06 4.03 3.99 M 0.00671 0.0975
# 5 OTU_24 OTU 3.93 3.90 3.88 M 0.00671 0.0975
# 6 OTU_289 OTU 3.76 3.70 3.62 M 0.00192 0.0759
# 7 OTU_71 OTU 3.68 3.62 3.54 N 0.00671 0.0975
# 8 p__ Bacteroidetes Phylum 5.15 5.13 5.11 M 0.00451 0.0856
# 9 p__ Firmicutes Phylum 4.99 4.97 4.94 N 0.00451 0.0856
# 10 c__ Bacteroidia Class 5.15 5.13 5.11 M 0.00451 0.0856
# # ... with 16 more rows
其次隨后的mp_plot_diff_res畫(huà)圖功能還有所欠缺,尤其是cladogram中無(wú)法有效去除冗余物種以及顯示LDA差異物種具體等級(jí)信息擎析,如下:
#mp_plot_diff_res函數(shù)以分步呈現(xiàn)
#plot treeskeleton
p1 <- ggtree(taxa.tree,layout="radial",size = 0.3) +
geom_point(
data = td_filter(!isTip),
fill="white",
size=1,
shape=21)
# display the high light of phylum clade.
p2 <- p1 +
geom_hilight(mapping = aes(subset = nodeClass == "Phylum",
node = node,
fill = label))
# display the relative abundance of features(OTU)
#按樣本
p3 <- p2+
ggnewscale::new_scale("fill") +
geom_fruit(
data = td_unnest(RareAbundanceBySample),
geom = geom_star,
mapping = aes(x = fct_reorder(Sample, Category, .fun=min),
size = RelRareAbundanceBySample,
fill = Category,
subset = RelRareAbundanceBySample > 0.05),
starshape = 13,
starstroke = 0.25,
offset = 0.04,
pwidth = 0.8,
grid.params = list(linetype=2)) +
scale_size_continuous(name="Relative Abundance (%)",range = c(1, 3)) +
scale_fill_manual(values=c("#1B9E77", "#D95F02"))
p3
# display the tip labels of taxa tree
p4 <- p3 +
geom_tiplab(size=2, offset=7.5)
p4
# display the LDA of significant OTU.
p5 <- p4 +
ggnewscale::new_scale("fill") +
geom_fruit(
geom = geom_col,
mapping = aes(x = LDAmean,
fill = Sign_Category),
orientation = "y",
offset = 0.4,
pwidth = 0.5,
axis.params = list(axis = "x",
title = "Log10(LDA)",
title.height = 0.01,
title.size = 2,
text.size = 1.8,
vjust = 1),
grid.params = list(linetype = 2)
)
p5
# display the significant (FDR) taxonomy after kruskal.test (default)
p6 <- p5 +
ggnewscale::new_scale("size") +
geom_point(
data=td_filter(!is.na(fdr)),
mapping = aes(size = -log10(pvalue),
fill = Sign_Category),
shape = 21) +
scale_size_continuous(range=c(1, 3)) +
scale_fill_manual(values=c("#1B9E77", "#D95F02"))
p6 + theme(
legend.key.height = unit(0.3, "cm"),
legend.key.width = unit(0.3, "cm"),
legend.spacing.y = unit(0.02, "cm"),
legend.text = element_text(size = 7),
legend.title = element_text(size = 9),
)
#修改p3和p4簿盅,其余不變
# display the relative abundance of features(OTU)
# 按分組
p3 <- p2+
ggnewscale::new_scale("fill") +
geom_fruit(
data = td_unnest(RareAbundanceByCategory),
geom = geom_star,
mapping = aes(x=Category,
size = RelRareAbundanceByCategory,
fill = Category,
subset = RelRareAbundanceByCategory > 0.05),
starshape = 13,
starstroke = 0.25,
offset = 0.04,
pwidth = 0.1,
grid.params = list(linetype=2)) +
scale_size_continuous(name="Relative Abundance (%)",range = c(1, 3)) +
scale_fill_manual(values=c("#1B9E77", "#D95F02"))
p3
# display the tip labels of taxa tree
p4 <- p3 +
geom_tiplab(size=2, offset=2,mapping = aes(subset=!is.na(fdr))
p4
目前來(lái)看,后一個(gè)mp_diff_analysis函數(shù)所提供的的畫(huà)圖功能有所不足揍魂,其進(jìn)化分支圖主要展示的為OTU桨醋,這也解釋了其結(jié)果中為什么會(huì)出現(xiàn)OTU,而不是僅有物種等級(jí)现斋,推薦使用傳統(tǒng)diff_analysis方法喜最。