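ggstatsplot, a visualization powerhouse = plotting + statistics

ggstatsplot is an extension of the ggplot2 package. It is mainly used to draw publication-ready figures that are annotated with the results of statistical analyses, including the full details of each test, which makes it very useful for researchers who run statistical analyses regularly.
Strengths of ggstatsplot on the statistics side:

It currently supports the most common types of statistical tests: t-test / ANOVA, nonparametric tests, correlation analysis, contingency table analysis, and regression analysis.
It also does well on the plotting side:
(1) violin plots (for comparing continuous data between groups);
(2) pie charts (for distribution tests on categorical data);
(3) bar charts (for distribution tests on categorical data);
(4) scatterplots (for the correlation between two variables);
(5) correlation matrices (for correlations among several variables);
(6) histograms and dot plots/charts (for hypothesis tests about a distribution);
(7) dot-and-whisker plots (for regression models).

Here are some practical examples:

The ggbetweenstats function

It creates violin plots, box plots, or a mix of both, and is mainly used to compare continuous data between groups or conditions. The simplest function call looks like this: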

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)

ggstatsplot::ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  messages = FALSE
) + # further modification outside of ggstatsplot
  ggplot2::coord_cartesian(ylim = c(3, 8)) +
  ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))

The result is shown in the figure below:

Figure 1
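
As a rough cross-check of the omnibus test annotated on a plot like Figure 1, the same comparison can be run in base R. This is only a sketch: it assumes the parametric test in the subtitle corresponds to Welch's one-way ANOVA (unequal variances), which may differ depending on the installed ggstatsplot version and the `type` argument.

```r
# Welch's one-way ANOVA on the same data (base R only, no ggstatsplot needed).
# Assumption: this mirrors the parametric test shown in the plot subtitle.
welch <- stats::oneway.test(
  formula = Sepal.Length ~ Species,
  data = iris,
  var.equal = FALSE # do not assume equal variances across species
)

print(welch)

# Group means, for comparison with the means marked on the plot
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(round(group_means, 3))
```

A very small p-value here is consistent with the clear separation between species in the plot.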

If ggplot2 is not loaded together with ggstatsplot, the following error is reported:

Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  polygon edge not found

Figure 1 shows that Sepal.Length differs significantly between the iris species. We can also adjust the parameters to make the plot more informative.

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
# drop one species so only two groups remain, then check whether a t-test is reported instead of ANOVA
iris2 <- dplyr::filter(.data = iris, Species != "setosa")

iris2$Species <-
  base::factor(
    x = iris2$Species,
    levels = c("virginica", "versicolor")
  )
# plot
ggstatsplot::ggbetweenstats(
  data = iris2,
  x = Species,
  y = Sepal.Length,
  notch = TRUE, # show notched box plot
  mean.plotting = TRUE, # whether mean for each group is to be displayed
  mean.ci = TRUE, # whether to display confidence interval for means
  mean.label.size = 2.5, # size of the label for mean
  type = "p", # which type of test is to be run
  k = 3, # number of decimal places for statistical results
  outlier.tagging = TRUE, # whether outliers need to be tagged
  outlier.label = Sepal.Width, # variable to be used for the outlier tag
  outlier.label.color = "darkgreen", # changing the color for the text label
  xlab = "Type of Species", # label for the x-axis variable
  ylab = "Attribute: Sepal Length", # label for the y-axis variable
  title = "Dataset: Iris flower data set", # title text for the plot
  ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
  package = "wesanderson", # package from which color palette is to be taken
  palette = "Darjeeling1", # choosing a different color palette
  messages = FALSE
)

Figure 2
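
With only two groups left, the comparison reduces to a two-sample t-test. The sketch below reproduces it in base R so the numbers in the subtitle can be sanity-checked; it assumes the parametric option (`type = "p"`) corresponds to Welch's t-test, which is the base R default but is an assumption about ggstatsplot.

```r
# Keep two species and fix the level order, as in the call above
iris2 <- subset(iris, Species != "setosa")
iris2$Species <- factor(iris2$Species, levels = c("virginica", "versicolor"))

# Welch two-sample t-test (base R default: var.equal = FALSE)
tt <- stats::t.test(Sepal.Length ~ Species, data = iris2)
print(tt)

# Difference in group means (virginica minus versicolor)
mean_diff <- unname(tt$estimate[1] - tt$estimate[2])
print(mean_diff)
```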

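The ggwithinstats function

The ggwithinstats function works almost identically to ggbetweenstats, but it is intended for within-subjects (repeated measures) comparisons.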

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)

ggstatsplot::ggwithinstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  messages = FALSE
)

Figure 3

# plot
ggstatsplot::ggwithinstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  sort = "descending", # ordering groups along the x-axis based on
  sort.fun = median, # values of `y` variable
  pairwise.comparisons = TRUE,
  pairwise.display = "s",
  pairwise.annotation = "p",
  title = "iris",
  caption = "Data from: iris",
  ggtheme = ggthemes::theme_fivethirtyeight(),
  ggstatsplot.layer = FALSE,
  messages = FALSE
)

Figure 3
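
The pairwise comparisons drawn on the plot can be approximated in base R. This is a sketch under two assumptions that are not stated in the original: that the p-values are adjusted with the Holm method, and that observations are paired by row position within each species (iris is not a real repeated-measures dataset, so the pairing is purely illustrative).

```r
# Pairwise paired t-tests with Holm correction (base R only).
# Assumption: rows are paired by position within each species, which is
# purely illustrative because iris is not a true repeated-measures dataset.
pw <- stats::pairwise.t.test(
  x = iris$Sepal.Length,
  g = iris$Species,
  p.adjust.method = "holm",
  paired = TRUE
)

print(pw$p.value) # lower-triangular matrix of adjusted p-values
```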

The ggscatterstats function

This function creates a scatterplot with marginal histograms, box plots, density plots, violin plots, or densigrams from ggExtra::ggMarginal, and shows the results of the statistical analysis in the subtitle:

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
ggstatsplot::ggscatterstats(
  data = ggplot2::msleep,
  x = sleep_rem,
  y = awake,
  xlab = "REM sleep (in hours)",
  ylab = "Amount of time spent awake (in hours)",
  title = "Understanding mammalian sleep",
  messages = FALSE
)

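Figure 4

Figure 4 shows that sleep_rem and awake are correlated, with sleep_rem on the x-axis and awake on the y-axis. The histograms on the right and on top show the distribution of each variable: the more observations fall into a bin, the taller its bar.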
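
The correlation summarized in such a subtitle can be computed directly with `stats::cor.test`. To keep the sketch free of extra packages it uses two columns of the built-in iris data as a stand-in for sleep_rem and awake; the variable choice is an assumption made only for illustration.

```r
# Pearson correlation with a 95% confidence interval (base R only).
# iris columns stand in for the msleep variables used in the plot above.
ct <- stats::cor.test(
  x = iris$Sepal.Length,
  y = iris$Petal.Length,
  method = "pearson",
  conf.level = 0.95
)

r <- unname(ct$estimate) # correlation coefficient
ci <- ct$conf.int        # lower and upper confidence limits

print(round(r, 3))
print(round(as.numeric(ci), 3))
print(ct$p.value)
```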

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)

# plot
ggstatsplot::ggscatterstats(
  data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
  x = budget,
  y = rating,
  type = "robust", # type of test that needs to be run
  conf.level = 0.99, # confidence level
  xlab = "Movie budget (in million/ US$)", # label for x axis
  ylab = "IMDB rating", # label for y axis
  label.var = "title", # variable for labeling data points
  label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
  line.color = "yellow", # changing regression line color line
  title = "Movie budget and IMDB rating (action)", # title text for the plot
  caption = expression( # caption text for the plot
    paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")
  ),
  ggtheme = theme_bw(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
  marginal.type = "density", # type of marginal distribution to be displayed
  xfill = "#0072B2", # color fill for x-axis marginal distribution
  yfill = "#009E73", # color fill for y-axis marginal distribution
  xalpha = 0.6, # transparency for x-axis marginal distribution
  yalpha = 0.6, # transparency for y-axis marginal distribution
  centrality.para = "median", # central tendency lines to be displayed
  messages = FALSE # turn off messages and notes
)

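Figure 5

ggbarstats bar charts

The ggbarstats function is mainly used to show how categorical data are distributed across groups. For example: does the male-to-female ratio among patients in group A differ from the ratio among patients in group B?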
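
The patient example can be framed as a chi-squared test of independence on a 2 x 2 contingency table. The counts below are hypothetical, invented purely for illustration, and the sketch uses base R rather than ggstatsplot.

```r
# Hypothetical counts: patients by group (rows) and sex (columns)
counts <- matrix(
  c(30, 20,
    18, 32),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), sex = c("male", "female"))
)

# Pearson chi-squared test of independence, without continuity correction
chi <- stats::chisq.test(counts, correct = FALSE)
print(chi)

# Row-wise proportions: the quantities a stacked bar chart displays
print(round(prop.table(counts, margin = 1), 2))
```

A small p-value suggests that the sex ratio differs between the two groups.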

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
library(hrbrthemes)
set.seed(123)
# plot
ggstatsplot::ggbarstats(
  data = ggstatsplot::movies_long,
  main = mpaa,
  condition = genre,
  sampling.plan = "jointMulti",
  title = "MPAA Ratings by Genre",
  xlab = "movie genre",
  perc.k = 1,
  x.axis.orientation = "slant",
  ggtheme = hrbrthemes::import_roboto_condensed(),
  ggstatsplot.layer = FALSE,
  ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
  palette = "Set2",
  messages = FALSE
)

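Figure 6

Figure 6 is a stacked bar chart that compares whether the distribution of categorical data differs between groups. As before, the parameters can be adjusted to make the plot more elaborate and more attractive.
The original reference did not use ggtheme = hrbrthemes::import_roboto_condensed() but ggtheme = hrbrthemes::theme_modern_rc(), so the hrbrthemes package has to be loaded first. This step can easily fail with the error below, which indicates that the "Roboto Condensed" font could not be found: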

Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  polygon edge not found
In addition: Warning message:
In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  no font could be found for family "Roboto Condensed"

gghistostats

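It looks at the distribution of a single variable and uses a one-sample test to check whether it differs significantly from a specified value: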

ggstatsplot::gghistostats(
  data = ToothGrowth, # dataframe from which variable is to be taken
  x = len, # numeric variable whose distribution is of interest
  title = "Distribution of tooth length", # title for the plot
  fill.gradient = TRUE, # use color gradient
  test.value = 10, # the comparison value for t-test
  test.value.line = TRUE, # display a vertical line at test value
  type = "bf", # bayes factor for one sample t-test
  bf.prior = 0.8, # prior width for calculating the bayes factor
  messages = FALSE # turn off the messages
)

Figure 7
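
The call above reports a Bayes factor (`type = "bf"`). As a frequentist counterpart, a one-sample t-test against the same comparison value can be run in base R; this is a swapped-in technique for cross-checking, not what the plot itself computes.

```r
# One-sample t-test: is the mean tooth length different from 10?
one_sample <- stats::t.test(x = ToothGrowth$len, mu = 10)
print(one_sample)

sample_mean <- unname(one_sample$estimate) # observed mean of len
print(round(sample_mean, 3))
```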

ggdotplotstats

This function is similar to gghistostats and works best when each numeric value also has a label (here, a country name).

set.seed(123)

# plot
ggdotplotstats(
  data = dplyr::filter(.data = gapminder::gapminder, continent == "Asia"),
  y = country,
  x = lifeExp,
  test.value = 55,
  test.value.line = TRUE,
  test.line.labeller = TRUE,
  test.value.color = "red",
  centrality.para = "median",
  centrality.k = 0,
  title = "Distribution of life expectancy in Asian continent",
  xlab = "Life expectancy",
  messages = FALSE,
  caption = substitute(
    paste(
      italic("Source"),
      ": Gapminder dataset from https://www.gapminder.org/"
    )
  )
)

Figure 8

ggcorrmat

This function is mainly used for correlation analysis among variables:

set.seed(123)
# by default this function outputs a correlogram plot
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  corr.method = "robust", # correlation method
  sig.level = 0.001, # threshold of significance
  p.adjust.method = "holm", # p-value adjustment method for multiple comparisons
  cor.vars = c(sleep_rem, awake:bodywt), # a range of variables can be selected
  cor.vars.names = c(
    "REM sleep", # variable names
    "time awake",
    "brain weight",
    "body weight"
  ),
  matrix.type = "upper", # type of visualization matrix
  colors = c("#B2182B", "white", "#4D4D4D"),
  title = "Correlalogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)

Figure 9
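
To see what the significance threshold and the p-value adjustment do, the pairwise correlations can be computed by hand. The sketch below makes two substitutions that are assumptions for illustration only: it uses Pearson correlations instead of the robust method, and four mtcars columns instead of the msleep variables, so that it runs with base R alone.

```r
# Pairwise Pearson correlations with Holm-adjusted p-values (base R only)
vars <- c("mpg", "wt", "hp", "qsec")
pairs <- t(utils::combn(vars, 2)) # one row per pair of variables

res <- data.frame(
  var1 = pairs[, 1],
  var2 = pairs[, 2],
  r = NA_real_,
  p = NA_real_,
  stringsAsFactors = FALSE
)

for (i in seq_len(nrow(res))) {
  ct <- stats::cor.test(mtcars[[res$var1[i]]], mtcars[[res$var2[i]]])
  res$r[i] <- unname(ct$estimate)
  res$p[i] <- ct$p.value
}

# Adjust for multiple comparisons, then apply the significance threshold
res$p_holm <- stats::p.adjust(res$p, method = "holm")
res$significant <- res$p_holm < 0.001

print(res)
```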

ggcoefstats

For regression analysis, it draws a forest plot that shows each point estimate as a dot with its confidence interval:

set.seed(123)

# model
mod <- stats::lm(
  formula = mpg ~ am * cyl,
  data = mtcars
)

# plot
ggstatsplot::ggcoefstats(x = mod)

Figure 10
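
The dots and whiskers in such a plot correspond to the coefficient estimates and their confidence intervals. A minimal base R sketch that builds the same table for the model above (the column names `conf.low` and `conf.high` are chosen here for illustration):

```r
# Same model as above
mod <- stats::lm(mpg ~ am * cyl, data = mtcars)

# 95% confidence intervals for each coefficient
ci <- stats::confint(mod, level = 0.95)

coef_table <- data.frame(
  term = names(stats::coef(mod)),
  estimate = unname(stats::coef(mod)),
  conf.low = unname(ci[, 1]),
  conf.high = unname(ci[, 2]),
  stringsAsFactors = FALSE
)

print(coef_table)
```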

Besides the plot types above, all drawn from built-in data, the package also lets you draw the plot with another package while using ggstatsplot to supply the statistical results:

set.seed(123)

# loading the needed libraries
#install.packages("yarrr")
library(yarrr)
library(ggstatsplot)

# using `ggstatsplot` to get call with statistical results
stats_results <-
  ggstatsplot::ggbetweenstats(
    data = ChickWeight,
    x = Time,
    y = weight,
    return = "subtitle",
    messages = FALSE
  )
# using `yarrr` to create plot
yarrr::pirateplot(
  formula = weight ~ Time,
  data = ChickWeight,
  theme = 1,
  main = stats_results
)

Figure 11

References:
https://cloud.tencent.com/developer/article/1450100
https://github.com/IndrajeetPatil/ggstatsplot
