上篇文章中提了一下如何通過ggpubr包為ggplot
圖添加p-value
以及顯著性標(biāo)記,本文將詳細(xì)介紹蕊玷。利用數(shù)據(jù)集ToothGrowth進(jìn)行演示
#先加載包
library(ggpubr)
#加載數(shù)據(jù)集ToothGrowth
data("ToothGrowth")
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
比較方法
R中常用的比較方法主要有下面幾種:
方法 | R函數(shù) | 描述 |
---|---|---|
T-test | t.test() | 比較兩組(參數(shù)) |
Wilcoxon test | wilcox.test() | 比較兩組(非參數(shù)) |
ANOVA | aov()或anova() | 比較多組(參數(shù)) |
Kruskal-Wallis | kruskal.test() | 比較多組(非參數(shù)) |
各種比較方法后續(xù)有時(shí)間一一講解猜煮。
添加p-value
主要利用ggpubr包中的兩個(gè)函數(shù):
-
compare_means()
:可以進(jìn)行一組或多組間的比較 -
stat_compare_mean()
:自動(dòng)添加p-value
逢渔、顯著性標(biāo)記到ggplot圖中
compare_means()函數(shù)
該函數(shù)主要用用法如下:
compare_means(formula, data, method = "wilcox.test", paired = FALSE,
group.by = NULL, ref.group = NULL, ...)
注釋:
-
formula
:形如x~group
讯赏,其中x是數(shù)值型變量,group
是因子棕叫,可以是一個(gè)或者多個(gè) -
data
:數(shù)據(jù)集 -
method
:比較的方法耿焊,默認(rèn)為"wilcox.test"
, 其他可選方法為:"t.test"
、"anova"
奕坟、"kruskal.test"
-
paired
:是否要進(jìn)行paired test
(TRUE
orFALSE
) -
group_by
: 比較時(shí)是否要進(jìn)行分組 -
ref.group
: 是否需要指定參考組
stat_compare_means()函數(shù)
主要用法:
stat_compare_means(mapping = NULL, comparisons = NULL hide.ns = FALSE,
label = NULL, label.x = NULL, label.y = NULL, ...)
注釋:
-
mapping
:由aes()
創(chuàng)建的一套美學(xué)映射 -
comparisons
:指定需要進(jìn)行比較以及添加p-value
谊囚、顯著性標(biāo)記的組 -
hide.ns
:是否要顯示顯著性標(biāo)記ns -
label
:顯著性標(biāo)記的類型,可選項(xiàng)為:p.signif
(顯著性標(biāo)記)执赡、p.format
(顯示p-value
) -
label.x
镰踏、label.y
:顯著性標(biāo)簽調(diào)整 - ...:其他參數(shù)
比較獨(dú)立的兩組
compare_means(len~supp, data=ToothGrowth)
結(jié)果解釋:
-
.y
:測(cè)試中使用的y變量 -
p
:p-value
-
p.adj
:調(diào)整后的p-value
。默認(rèn)為p.adjust.method="holm"
-
p.format
:四舍五入后的p-value
-
p.signif
:顯著性水平 -
method
:用于統(tǒng)計(jì)檢驗(yàn)的方法
繪制箱線圖
p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp",
palette = "jco", add = "jitter")#添加p-valuep+stat_compare_means()
#使用其他統(tǒng)計(jì)檢驗(yàn)方法
p+stat_compare_means(method = "t.test")
上述顯著性標(biāo)記可以通過
label.x
沙合、label.y
奠伪、hjust
及vjust
來調(diào)整顯著性標(biāo)記可以通過
aes()
映射來更改:
-
aes(label=..p.format..)
或aes(lebel=paste0("p=",..p.format..))
:只顯示p-value
,不顯示統(tǒng)計(jì)檢驗(yàn)方法 -
aes(label=..p.signif..)
:僅顯示顯著性水平 -
aes(label=paste0(..method..,"\n", "p=",..p.format..))
:p-value
與顯著性水平分行顯示
舉個(gè)栗子:
p+stat_compare_means(aes(label=..p.signif..), label.x = 1.5, label.y = 40)
也可以將標(biāo)簽指定為字符向量首懈,不要映射绊率,只需將p.signif兩端的..去掉即可
p+stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)
比較兩個(gè)paired sample
compare_means(len~supp, data=ToothGrowth, paired = TRUE)
利用ggpaired()進(jìn)行可視化
ggpaired(ToothGrowth, x="supp", y="len", color = "supp", line.color = "gray",
line.size = 0.4, palette = "jco")+ stat_compare_means(paired = TRUE)
多組比較
Global test
compare_means(len~dose, data=ToothGrowth, method = "anova")
可視化
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+
stat_compare_means()
#使用其他的方法
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+
stat_compare_means(method = "anova")
Pairwise comparisons:如果分組變量中包含兩個(gè)以上的水平,那么會(huì)自動(dòng)進(jìn)行pairwise test,默認(rèn)方法為"wilcox.test"
compare_means(len~dose, data=ToothGrowth)
#可以指定比較哪些組
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+
stat_compare_means(comparisons=my_comparisons)+ # Add pairwise
comparisons p-value stat_compare_means(label.y = 50) # Add global p-value
可以通過修改參數(shù)label.y來更改標(biāo)簽的位置
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+
stat_compare_means(comparisons=my_comparisons, label.y = c(29, 35, 40))+ # Add pairwise comparisons p-value
stat_compare_means(label.y = 45) # Add global p-value
至于通過添加線條來連接比較的兩組究履,這一功能已由包ggsignif實(shí)現(xiàn)
##設(shè)定參考組
compare_means(len~dose, data=ToothGrowth, ref.group = "0.5", #以dose=0.5組為參考組
method = "t.test" )
#可視化
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+
stat_compare_means(method = "anova", label.y = 40)+ # Add global p-value
stat_compare_means(label = "p.signif", method = "t.test", ref.group = "0.5") # Pairwise comparison against reference
參考組也可以設(shè)置為.all.即所有的平均值
compare_means(len~dose, data=ToothGrowth, ref.group = ".all.", method = "t.test")
#可視化
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+
stat_compare_means(method = "anova", label.y = 40)+# Add global p-value
stat_compare_means(label = "p.signif", method = "t.test",
ref.group = ".all.")#Pairwise comparison against all
接下來利用survminer包中的數(shù)據(jù)集myeloma來講解一下為什么有時(shí)候我們需要將ref.group設(shè)置為.all.
library(survminer)#沒安裝的先安裝再加載
data("myeloma")
head(myeloma)
我們將根據(jù)患者的分組來繪制DEPDC1基因的表達(dá)譜滤否,看不同組之間是否存在顯著性的差異,我們可以在7組之間進(jìn)行比較最仑,但是這樣的話組間比較的組合就太多了藐俺,因此我們可以將7組中每一組與全部平均值進(jìn)行比較炊甲,看看DEPDC1基因在不同的組中是否過表達(dá)還是低表達(dá)。
compare_means(DEPDC1~molecular_group, data = myeloma, ref.group = ".all.", method = "t.test")
#可視化DEPDC1基因表達(dá)譜
ggboxplot(myeloma, x="molecular_group", y="DEPDC1",
color = "molecular_group", add = "jitter", legend="none")+
rotate_x_text(angle = 45)+
geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean
stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value
stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")# Pairwise comparison against all
從圖中可以看出欲芹,DEPDC1基因在Proliferation組中顯著性地過表達(dá)卿啡,而在Hyperdiploid和Low bone disease顯著性地低表達(dá)
我們也可以將非顯著性標(biāo)記ns去掉,只需要將參數(shù)hide.ns=TRUE
ggboxplot(myeloma, x="molecular_group", y="DEPDC1",
color = "molecular_group", add = "jitter", legend="none")+
rotate_x_text(angle = 45)+
geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean
stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value
stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.", hide.ns = TRUE)# Pairwise comparison against all
多個(gè)分組變量
按另一個(gè)變量進(jìn)行分組之后進(jìn)行統(tǒng)計(jì)檢驗(yàn)菱父,比如按變量dose進(jìn)行分組:
compare_means(len~supp, data=ToothGrowth, group.by = "dose")
#可視化
p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp",
palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = FALSE)#按dose進(jìn)行分面
#label只繪制
p-valuep+stat_compare_means(label = "p.format")
#label繪制顯著性水平
p+stat_compare_means(label = "p.signif", label.x = 1.5)
#將所有箱線圖繪制在一個(gè)panel中
p <- ggboxplot(ToothGrowth, x="dose", y="len", color = "supp",
palette = "jco", add = "jitter")
p+stat_compare_means(aes(group=supp))
#只顯示p-value
p+stat_compare_means(aes(group=supp), label = "p.format")
#顯示顯著性水平
p+stat_compare_means(aes(group=supp), label = "p.signif")
進(jìn)行paired sample檢驗(yàn)
compare_means(len~supp, data=ToothGrowth, group.by = "dose", paired = TRUE)
#可視化
p <- ggpaired(ToothGrowth, x="supp", y="len", color = "supp",
palette = "jco", line.color="gray", line.size=0.4, facet.by = "dose",
short.panel.labs = FALSE)#按dose分面
#只顯示p-value
p+stat_compare_means(label = "p.format", paired = TRUE)
其他圖形
條形圖與線圖(一個(gè)分組變量)
#有誤差棒的條形圖颈娜,實(shí)際上我以前的文章里有純粹用ggplot2實(shí)現(xiàn)
ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se")+
stat_compare_means()+
stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
#有誤差棒的線圖
ggline(ToothGrowth, x="dose", y="len", add = "mean_se")+
stat_compare_means()+
stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
條形圖與線圖(兩個(gè)分組變量)
ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp",
palette = "jco", position = position_dodge(0.8))+
stat_compare_means(aes(group=supp), label = "p.signif", label.y = 29)
ggline(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp",
palette = "jco")+
stat_compare_means(aes(group=supp), label = "p.signif", label.y = c(16, 25, 29))
Sessioninfo
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8.1 x64 (build 9600)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936
## [2] LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] survminer_0.4.0 ggpubr_0.1.3 magrittr_1.5 ggplot2_2.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.11 compiler_3.4.0 plyr_1.8.4
## [4] tools_3.4.0 digest_0.6.12 evaluate_0.10
## [7] tibble_1.3.3 gtable_0.2.0 nlme_3.1-131
## [10] lattice_0.20-35 rlang_0.1.1 Matrix_1.2-10
## [13] psych_1.7.5 ggsci_2.4 DBI_0.6-1
## [16] cmprsk_2.2-7 yaml_2.1.14 parallel_3.4.0
## [19] gridExtra_2.2.1 dplyr_0.5.0 stringr_1.2.0
## [22] knitr_1.16 survMisc_0.5.4 rprojroot_1.2
## [25] grid_3.4.0 data.table_1.10.4 KMsurv_0.1-5
## [28] R6_2.2.1 km.ci_0.5-2 survival_2.41-3
## [31] foreign_0.8-68 rmarkdown_1.5 reshape2_1.4.2
## [34] tidyr_0.6.3 purrr_0.2.2.2 splines_3.4.0
## [37] backports_1.1.0 scales_0.4.1 htmltools_0.3.6
## [40] assertthat_0.2.0 mnormt_1.5-5 xtable_1.8-2
## [43] colorspace_1.3-2 ggsignif_0.2.0 labeling_0.3
## [46] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3
## [49] broom_0.4.2 zoo_1.8-0