ggstatsplot is an extension of the ggplot2 package. It is mainly used to draw publication-ready plots that are annotated with the results of statistical analyses, including the details of each test. The package is very useful for researchers who frequently need to run statistical analyses.
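Strengths of ggstatsplot for statistical analysis:
- It currently supports the most common types of statistical tests: t-test / ANOVA, nonparametric tests, correlation analysis, contingency table analysis, and regression analysis.
- It also does well on the graphics side:
(1) violin plots (for comparing continuous data between groups);
(2) pie charts (for testing the distribution of categorical data);
(3) bar charts (for testing the distribution of categorical data);
(4) scatterplots (for the correlation between two variables);
(5) correlation matrices (for correlations among multiple variables);
(6) histograms and dot plots/charts (for hypothesis tests about distributions);
(7) dot-and-whisker plots (for regression models).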
Here are some practical examples:
ggbetweenstats function
Creates violin plots, box plots, or a mix of both, mainly for comparing continuous data between groups or between conditions. The simplest function call looks like this:
rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
ggstatsplot::ggbetweenstats(
data = iris,
x = Species,
y = Sepal.Length,
messages = FALSE
) + # further modification outside of ggstatsplot
ggplot2::coord_cartesian(ylim = c(3, 8)) +
ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))
The result is shown in the figure below:
If ggplot2 is not loaded together with ggstatsplot, the following error can appear:
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
Figure 1 shows that Sepal.Length differs significantly between the iris species. We can also adjust the parameters to make the plot more informative.
rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
# drop one species so that only two groups remain; check whether a t-test is reported instead of ANOVA
iris2 <- dplyr::filter(.data = iris, Species != "setosa")
iris2$Species <-
base::factor(
x = iris2$Species,
levels = c("virginica", "versicolor")
)
# plot
ggstatsplot::ggbetweenstats(
data = iris2,
x = Species,
y = Sepal.Length,
notch = TRUE, # show notched box plot
mean.plotting = TRUE, # whether mean for each group is to be displayed
mean.ci = TRUE, # whether to display confidence interval for means
mean.label.size = 2.5, # size of the label for mean
type = "p", # which type of test is to be run
k = 3, # number of decimal places for statistical results
outlier.tagging = TRUE, # whether outliers need to be tagged
outlier.label = Sepal.Width, # variable to be used for the outlier tag
outlier.label.color = "darkgreen", # changing the color for the text label
xlab = "Type of Species", # label for the x-axis variable
ylab = "Attribute: Sepal Length", # label for the y-axis variable
title = "Dataset: Iris flower data set", # title text for the plot
ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
package = "wesanderson", # package from which color palette is to be taken
palette = "Darjeeling1", # choosing a different color palette
messages = FALSE
)
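As a rough cross-check on what the two plots above report, the same comparisons can be run in base R. This is only a sketch: it assumes that ggbetweenstats with type = "p" uses Welch-type tests (unequal variances) by default, which may differ between package versions, and the object names welch_anova and welch_t are made up for this example.

```r
# Welch's one-way ANOVA across all three species
# (assumed to correspond to the test behind the first plot)
welch_anova <- stats::oneway.test(
  Sepal.Length ~ Species,
  data = iris,
  var.equal = FALSE
)

# Two-group case: drop setosa, as in the second example
iris2 <- droplevels(subset(iris, Species != "setosa"))
welch_t <- stats::t.test(
  Sepal.Length ~ Species,
  data = iris2,
  var.equal = FALSE
)

print(welch_anova)
print(welch_t)
```

Both tests give p-values well below 0.05, in line with the significant difference seen in the plots.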
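ggwithinstats function
The ggwithinstats function works almost the same way as ggbetweenstats; it is the counterpart intended for within-subjects (repeated measures) comparisons.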
rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
ggstatsplot::ggwithinstats(
data = iris,
x = Species,
y = Sepal.Length,
messages = FALSE
)
# plot
ggstatsplot::ggwithinstats(
data = iris,
x = Species,
y = Sepal.Length,
sort = "descending", # ordering groups along the x-axis based on
sort.fun = median, # values of `y` variable
pairwise.comparisons = TRUE,
pairwise.display = "s",
pairwise.annotation = "p",
title = "iris",
caption = "Data from: iris",
ggtheme = ggthemes::theme_fivethirtyeight(),
ggstatsplot.layer = FALSE,
messages = FALSE
)
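The pairwise comparisons requested above can be approximated in base R. A minimal sketch, assuming paired t-tests with Holm correction; the exact method ggwithinstats applies is not stated in this article, so treat the choice as an assumption. Note that iris is not real repeated-measures data, so pairing the flowers by row order is purely illustrative.

```r
# Paired pairwise t-tests between species, Holm-adjusted.
# Pairing relies on each species having the same number of rows (50).
pw <- stats::pairwise.t.test(
  x = iris$Sepal.Length,
  g = iris$Species,
  p.adjust.method = "holm",
  paired = TRUE
)

# Lower-triangular matrix of adjusted p-values
print(pw$p.value)
```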
ggscatterstats function
This function creates a scatterplot with marginal histograms, box plots, density plots, violin plots, or densigrams from ggExtra::ggMarginal, and shows the results of the statistical analysis in the subtitle:
rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
ggstatsplot::ggscatterstats(
data = ggplot2::msleep,
x = sleep_rem,
y = awake,
xlab = "REM sleep (in hours)",
ylab = "Amount of time spent awake (in hours)",
title = "Understanding mammalian sleep",
messages = FALSE
)
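Figure 4 shows that sleep_rem and awake are correlated, with sleep_rem on the x-axis and awake on the y-axis. The histograms on the right and at the top of the plot show the distribution of the data: the more observations fall in a bin, the taller its bar.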
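The correlation reported in the subtitle can be approximated with cor.test from base R. A minimal sketch, assuming the default parametric analysis corresponds to a Pearson correlation; the object name ct is made up for this example, and cor.test drops incomplete pairs on its own.

```r
# Pearson correlation between REM sleep and time awake
sleep <- ggplot2::msleep
ct <- stats::cor.test(sleep$sleep_rem, sleep$awake, method = "pearson")

print(ct$estimate)  # correlation coefficient
print(ct$conf.int)  # 95% confidence interval
print(ct$p.value)
```

The estimate is negative: mammals with more REM sleep spend less time awake.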
rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
# plot
ggstatsplot::ggscatterstats(
data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
x = budget,
y = rating,
type = "robust", # type of test that needs to be run
conf.level = 0.99, # confidence level
xlab = "Movie budget (in million/ US$)", # label for x axis
ylab = "IMDB rating", # label for y axis
label.var = "title", # variable for labeling data points
label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
line.color = "yellow", # changing regression line color line
title = "Movie budget and IMDB rating (action)", # title text for the plot
caption = expression( # caption text for the plot
paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")
),
ggtheme = theme_bw(), # choosing a different theme
ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
marginal.type = "density", # type of marginal distribution to be displayed
xfill = "#0072B2", # color fill for x-axis marginal distribution
yfill = "#009E73", # color fill for y-axis marginal distribution
xalpha = 0.6, # transparency for x-axis marginal distribution
yalpha = 0.6, # transparency for y-axis marginal distribution
centrality.para = "median", # central tendency lines to be displayed
messages = FALSE # turn off messages and notes
)
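ggbarstats bar chart
The ggbarstats function is mainly used to show how categorical data are distributed across groups. For example: does the male-to-female ratio among group A patients differ from the ratio among group B patients?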
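The patient question above can be sketched in base R with a contingency table and Pearson's chi-squared test. The counts below are hypothetical and invented purely for illustration; they do not come from any dataset used in this article.

```r
# Hypothetical counts (illustration only): sex by patient group
counts <- matrix(
  c(30, 20,
    15, 35),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(group = c("A", "B"), sex = c("male", "female"))
)

# Row-wise proportions: share of each sex within each group
props <- prop.table(counts, margin = 1)
print(props)

# Chi-squared test of independence between group and sex
res <- stats::chisq.test(counts)
print(res)
```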
rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
library(hrbrthemes)
set.seed(123)
# plot
ggstatsplot::ggbarstats(
data = ggstatsplot::movies_long,
main = mpaa,
condition = genre,
sampling.plan = "jointMulti",
title = "MPAA Ratings by Genre",
xlab = "movie genre",
perc.k = 1,
x.axis.orientation = "slant",
ggtheme = hrbrthemes::import_roboto_condensed(),
ggstatsplot.layer = FALSE,
ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
palette = "Set2",
messages = FALSE
)
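Figure 6, a stacked bar chart, compares whether the distribution of categorical data differs between groups. Here too, the parameters can be adjusted to make the plot more elaborate and attractive.
Note the line ggtheme = hrbrthemes::import_roboto_condensed() in the call above. The original reference used ggtheme = hrbrthemes::theme_modern_rc() instead, so the hrbrthemes package has to be loaded first. This step easily fails with the error below; the accompanying warning shows that the "Roboto Condensed" font could not be found: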
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
In addition: Warning message:
In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
no font could be found for family "Roboto Condensed"
gghistostats
Shows the distribution of a single variable and uses a one-sample test to check whether it differs significantly from a specified value:
ggstatsplot::gghistostats(
data = ToothGrowth, # dataframe from which variable is to be taken
x = len, # numeric variable whose distribution is of interest
title = "Distribution of tooth length", # title for the plot
fill.gradient = TRUE, # use color gradient
test.value = 10, # the comparison value for t-test
test.value.line = TRUE, # display a vertical line at test value
type = "bf", # bayes factor for one sample t-test
bf.prior = 0.8, # prior width for calculating the bayes factor
messages = FALSE # turn off the messages
)
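The call above reports a Bayes factor. As a point of comparison, the frequentist counterpart is a one-sample t-test against the same test value, which base R provides directly. This sketch swaps in the classical t-test and does not reproduce the Bayes factor itself; the object name tt is made up for this example.

```r
# One-sample t-test: is the mean tooth length different from 10?
tt <- stats::t.test(ToothGrowth$len, mu = 10)

print(tt$estimate)  # sample mean of len
print(tt$conf.int)  # 95% confidence interval for the mean
print(tt$p.value)
```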
ggdotplotstats
This function is similar to gghistostats, and is the better choice when the numeric variable also has labels.
set.seed(123)
# plot
ggdotplotstats(
data = dplyr::filter(.data = gapminder::gapminder, continent == "Asia"),
y = country,
x = lifeExp,
test.value = 55,
test.value.line = TRUE,
test.line.labeller = TRUE,
test.value.color = "red",
centrality.para = "median",
centrality.k = 0,
title = "Distribution of life expectancy in Asian continent",
xlab = "Life expectancy",
messages = FALSE,
caption = substitute(
paste(
italic("Source"),
": Gapminder dataset from https://www.gapminder.org/"
)
)
)
ggcorrmat
This function is mainly used for correlation analysis between variables:
set.seed(123)
# by default this function outputs a correlogram plot
ggstatsplot::ggcorrmat(
data = ggplot2::msleep,
corr.method = "robust", # correlation method
sig.level = 0.001, # threshold of significance
p.adjust.method = "holm", # p-value adjustment method for multiple comparisons
cor.vars = c(sleep_rem, awake:bodywt), # a range of variables can be selected
cor.vars.names = c(
"REM sleep", # variable names
"time awake",
"brain weight",
"body weight"
),
matrix.type = "upper", # type of visualization matrix
colors = c("#B2182B", "white", "#4D4D4D"),
title = "Correlalogram for mammals sleep dataset",
subtitle = "sleep units: hours; weight units: kilograms"
)
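The numbers behind such a matrix can be sketched in base R. Note the swap: the call above uses corr.method = "robust", while this sketch uses plain Pearson correlations, so the values will not match the plot exactly. The names vars, r, p_raw, and p_holm are made up for this example.

```r
# Same four variables as in the ggcorrmat call
vars <- ggplot2::msleep[, c("sleep_rem", "awake", "brainwt", "bodywt")]

# Pearson correlation matrix, using all complete pairs per variable pair
r <- stats::cor(vars, use = "pairwise.complete.obs", method = "pearson")

# Raw p-value for every pair of variables, then Holm adjustment
pairs <- utils::combn(names(vars), 2, simplify = FALSE)
p_raw <- vapply(
  pairs,
  function(p) stats::cor.test(vars[[p[1]]], vars[[p[2]]])$p.value,
  numeric(1)
)
p_holm <- stats::p.adjust(p_raw, method = "holm")
names(p_holm) <- vapply(pairs, paste, character(1), collapse = " ~ ")

print(round(r, 2))
print(p_holm)
```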
ggcoefstats
For regression analyses, draws a forest-style plot that shows each point estimate as a dot with its confidence interval:
set.seed(123)
# model
mod <- stats::lm(
formula = mpg ~ am * cyl,
data = mtcars
)
# plot
ggstatsplot::ggcoefstats(x = mod)
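The values drawn in such a plot are the model's coefficient estimates and their confidence intervals, which base R can tabulate directly. A minimal sketch, assuming 95% intervals; the object name est is made up for this example.

```r
# Same model as above
mod <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)

# Point estimates alongside their 95% confidence intervals
est <- cbind(
  estimate = stats::coef(mod),
  stats::confint(mod, level = 0.95)
)
print(est)
```

Each row of est corresponds to one dot and whisker in the plot: the intercept, am, cyl, and the am:cyl interaction.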
Beyond the plot types above, which were all made with built-in data, the package also lets you draw the plot with another package while using ggstatsplot to supply the statistical results:
set.seed(123)
# loading the needed libraries
#install.packages("yarrr")
library(yarrr)
library(ggstatsplot)
# using `ggstatsplot` to get call with statistical results
stats_results <-
ggstatsplot::ggbetweenstats(
data = ChickWeight,
x = Time,
y = weight,
return = "subtitle",
messages = FALSE
)
# using `yarrr` to create plot
yarrr::pirateplot(
formula = weight ~ Time,
data = ChickWeight,
theme = 1,
main = stats_results
)
References:
https://cloud.tencent.com/developer/article/1450100
https://github.com/IndrajeetPatil/ggstatsplot