Visualization powerhouse ggstatsplot = plotting + statistics

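ggstatsplot is an extension of the ggplot2 package. It creates publication-ready plots annotated with the results of statistical tests, and those annotations include the details of each analysis. This makes the package very useful for researchers who routinely need to run statistical analyses.
Strengths of ggstatsplot for statistical analysis:

  • It currently supports the most common types of statistical tests: t-test/ANOVA, nonparametric tests, correlation analysis, contingency-table analysis, and regression analysis.
  • It also does well on graphical output:
    (1) violin plots (for comparing continuous data between groups);
    (2) pie charts (for testing the distribution of categorical data);
    (3) bar charts (for testing the distribution of categorical data);
    (4) scatterplots (for the correlation between two variables);
    (5) correlation matrices (for correlations among several variables);
    (6) histograms and dot plots/charts (for hypothesis tests about distributions);
    (7) dot-and-whisker plots (for regression models).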

Below are some practical examples:

ggbetweenstats function

ggbetweenstats creates violin plots, box plots, or a mix of both. It is mainly used to compare continuous data between groups or conditions. The simplest call looks like this:

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)

ggstatsplot::ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  messages = FALSE
) + # further modification outside of ggstatsplot
  ggplot2::coord_cartesian(ylim = c(3, 8)) +
  ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))

The result is shown below:

Figure 1
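The test in the subtitle of Figure 1 can be cross-checked in base R. This is a minimal sketch that assumes ggbetweenstats, with three groups, the default type = "p", and its default var.equal = FALSE, runs Welch's one-way ANOVA. If that assumption holds, the statistic and p-value should match the plot up to rounding.

```r
# Assumption: ggbetweenstats (3 groups, type = "p", var.equal = FALSE)
# reports Welch's one-way ANOVA
res <- stats::oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)
res$statistic # Welch F statistic
res$parameter # numerator and denominator degrees of freedom
res$p.value
```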

In the author's setup, loading ggstatsplot without also loading ggplot2 produced the following error:

Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  polygon edge not found

Figure 1 shows that the iris species differ significantly in Sepal.Length. We can also adjust the arguments to make the plot more informative.

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
# drop setosa so only two groups remain and a t-test (not ANOVA) is run
iris2 <- dplyr::filter(.data = iris, Species != "setosa")

iris2$Species <-
  base::factor(
    x = iris2$Species,
    levels = c("virginica", "versicolor")
  )
# plot
ggstatsplot::ggbetweenstats(
  data = iris2,
  x = Species,
  y = Sepal.Length,
  notch = TRUE, # show notched box plot
  mean.plotting = TRUE, # whether mean for each group is to be displayed
  mean.ci = TRUE, # whether to display confidence interval for means
  mean.label.size = 2.5, # size of the label for mean
  type = "p", # which type of test is to be run
  k = 3, # number of decimal places for statistical results
  outlier.tagging = TRUE, # whether outliers need to be tagged
  outlier.label = Sepal.Width, # variable to be used for the outlier tag
  outlier.label.color = "darkgreen", # changing the color for the text label
  xlab = "Type of Species", # label for the x-axis variable
  ylab = "Attribute: Sepal Length", # label for the y-axis variable
  title = "Dataset: Iris flower data set", # title text for the plot
  ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
  package = "wesanderson", # package from which color palette is to be taken
  palette = "Darjeeling1", # choosing a different color palette
  messages = FALSE
)
Figure 2
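With only virginica and versicolor left, the subtitle in Figure 2 reports a two-group test. This sketch assumes the default for two groups is Welch's two-sample t-test (var.equal = FALSE). It reproduces that test in base R with the same group order, so the figures can be compared by eye.

```r
# Keep only virginica and versicolor, in the same level order as above
iris2 <- subset(iris, Species != "setosa")
iris2$Species <- factor(iris2$Species, levels = c("virginica", "versicolor"))

# Welch two-sample t-test (assumed to be ggbetweenstats' default for 2 groups)
res <- stats::t.test(Sepal.Length ~ Species, data = iris2, var.equal = FALSE)
round(res$statistic, 3) # three decimals, like k = 3 in the call above
res$p.value
```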

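ggwithinstats function

ggwithinstats works almost the same way as ggbetweenstats, but it is meant for within-subjects (repeated-measures) designs, where every group contains measurements taken on the same subjects. iris does not have this structure, because each flower belongs to only one species. The calls below therefore only illustrate the syntax, and their test results should not be read as a real repeated-measures analysis.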

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)

ggstatsplot::ggwithinstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  messages = FALSE
)
Figure 3
# plot
ggstatsplot::ggwithinstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  sort = "descending", # ordering groups along the x-axis based on
  sort.fun = median, # values of `y` variable
  pairwise.comparisons = TRUE,
  pairwise.display = "s",
  pairwise.annotation = "p",
  title = "iris",
  caption = "Data from: iris",
  ggtheme = ggthemes::theme_fivethirtyeight(),
  ggstatsplot.layer = FALSE,
  messages = FALSE
)
Figure 3
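The difference between a within-subjects and a between-subjects comparison shows up clearly on base R's built-in sleep data. This dataset is my choice, not part of the original article: each of 10 subjects (ID) received both drugs (group). A paired t-test, presumably the kind of two-group test ggwithinstats is built around, uses that pairing. A Welch test that ignores the pairing is run on the same numbers for contrast.

```r
# Built-in `sleep` data: 10 subjects (ID), each measured under both drugs (group)
x1 <- sleep$extra[sleep$group == "1"]
x2 <- sleep$extra[sleep$group == "2"]

# Rows are ordered by ID within each group, so x1[i] and x2[i] are the same subject
stopifnot(identical(sleep$ID[sleep$group == "1"], sleep$ID[sleep$group == "2"]))

paired <- stats::t.test(x1, x2, paired = TRUE) # within-subjects
welch <- stats::t.test(x1, x2) # treats the groups as independent
c(paired = paired$p.value, welch = welch$p.value)
```

On this data the paired test is significant at the 5% level while the independent-groups test is not, which is why the design matters when choosing between the two functions.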

ggscatterstats function

This function creates a scatterplot with marginal histograms, box plots, density plots, violin plots, or densigrams (drawn with ggExtra::ggMarginal) and shows the results of the statistical analysis in the subtitle:

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)
ggstatsplot::ggscatterstats(
  data = ggplot2::msleep,
  x = sleep_rem,
  y = awake,
  xlab = "REM sleep (in hours)",
  ylab = "Amount of time spent awake (in hours)",
  title = "Understanding mammalian sleep",
  messages = FALSE
)

Figure 4

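Figure 4 shows that sleep_rem and awake are correlated, with sleep_rem on the x-axis and awake on the y-axis. The histograms on the top and right show the distribution of each variable: the more observations fall into a bin, the taller its bar.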
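The correlation reported in the subtitle can be checked with cor.test(). This sketch assumes the default type = "p" of ggscatterstats corresponds to a Pearson correlation. Rows with a missing value in either variable are dropped, which matters because msleep contains many NAs in sleep_rem.

```r
library(ggplot2) # provides the msleep data

# Pearson correlation (assumed default of ggscatterstats);
# the formula interface drops rows with NA in either variable
res <- stats::cor.test(~ sleep_rem + awake, data = msleep, method = "pearson")
res$estimate # correlation coefficient r
res$parameter # degrees of freedom (n - 2)
res$p.value
```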

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
set.seed(123)

# plot
ggstatsplot::ggscatterstats(
  data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
  x = budget,
  y = rating,
  type = "robust", # type of test that needs to be run
  conf.level = 0.99, # confidence level
  xlab = "Movie budget (in million/ US$)", # label for x axis
  ylab = "IMDB rating", # label for y axis
  label.var = "title", # variable for labeling data points
  label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
  line.color = "yellow", # changing regression line color line
  title = "Movie budget and IMDB rating (action)", # title text for the plot
  caption = expression( # caption text for the plot
    paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")
  ),
  ggtheme = theme_bw(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
  marginal.type = "density", # type of marginal distribution to be displayed
  xfill = "#0072B2", # color fill for x-axis marginal distribution
  yfill = "#009E73", # color fill for y-axis marginal distribution
  xalpha = 0.6, # transparency for x-axis marginal distribution
  yalpha = 0.6, # transparency for y-axis marginal distribution
  centrality.para = "median", # central tendency lines to be displayed
  messages = FALSE # turn off messages and notes
)
Figure 5

ggbarstats bar charts

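ggbarstats is mainly used to show how categorical data are distributed across groups. For example, it can show whether the male/female ratio among patients in group A differs from the ratio in group B.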
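The question "does the split differ between groups" is a test of independence on a contingency table. The sketch below uses base R's UCBAdmissions data, which is not from the original article. It compares the admitted/rejected split between male and female applicants with Pearson's chi-squared test, the kind of test I assume ggbarstats reports by default. The continuity correction is switched off here; whether ggbarstats applies one is not something I have verified.

```r
# UCBAdmissions is a 3-way table (Admit x Gender x Dept); sum over departments
admit_by_gender <- margin.table(UCBAdmissions, margin = c(1, 2))
admit_by_gender

# Share admitted/rejected within each gender (what a stacked bar chart shows)
prop.table(admit_by_gender, margin = 2)

# Pearson's chi-squared test of independence
res <- stats::chisq.test(admit_by_gender, correct = FALSE)
res$p.value
```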

rm(list = ls())
options(stringsAsFactors = F)
library(ggstatsplot)
library(ggplot2)
library(hrbrthemes)
set.seed(123)
# plot
ggstatsplot::ggbarstats(
  data = ggstatsplot::movies_long,
  main = mpaa,
  condition = genre,
  sampling.plan = "jointMulti",
  title = "MPAA Ratings by Genre",
  xlab = "movie genre",
  perc.k = 1,
  x.axis.orientation = "slant",
  ggtheme = ggplot2::theme_bw(), # or hrbrthemes::theme_modern_rc() if Roboto Condensed is installed
  ggstatsplot.layer = FALSE,
  ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
  palette = "Set2",
  messages = FALSE
)

Figure 6

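Figure 6 is a stacked bar chart that compares whether the distribution of a categorical variable differs between groups. As before, the arguments can be adjusted to make the plot more detailed and more polished.
The original reference used ggtheme = hrbrthemes::theme_modern_rc(). That theme depends on the Roboto Condensed font, so hrbrthemes has to be loaded and the font has to be installed on the system. If the font is missing, the following error is common. hrbrthemes::import_roboto_condensed() only helps import the font files and does not return a theme, so if the font cannot be installed, a plain ggplot2 theme such as ggplot2::theme_bw() is the safer value for ggtheme.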

Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  : 
  polygon edge not found
In addition: Warning message:
In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  no font could be found for family "Roboto Condensed"

gghistostats

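gghistostats shows the distribution of a single variable and uses a one-sample test to check whether it differs significantly from a specified value: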

ggstatsplot::gghistostats(
  data = ToothGrowth, # dataframe from which variable is to be taken
  x = len, # numeric variable whose distribution is of interest
  title = "Distribution of tooth length", # title for the plot
  fill.gradient = TRUE, # use color gradient
  test.value = 10, # the comparison value for t-test
  test.value.line = TRUE, # display a vertical line at test value
  type = "bf", # bayes factor for one sample t-test
  bf.prior = 0.8, # prior width for calculating the bayes factor
  messages = FALSE # turn off the messages
)
Figure 7
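The same question (does tooth length differ from 10?) can be asked directly in base R. The sketch below runs a frequentist one-sample t-test. It then computes a Bayes factor with BayesFactor::ttestBF, assuming that bf.prior = 0.8 above corresponds to the Cauchy prior width rscale = 0.8. The Bayes factor part only runs if BayesFactor is installed.

```r
# One-sample t-test of tooth length against the test value 10
res <- stats::t.test(ToothGrowth$len, mu = 10)
res$statistic
res$p.value

# Bayes factor for the one-sample t-test, prior width 0.8 (assumed to match bf.prior)
if (requireNamespace("BayesFactor", quietly = TRUE)) {
  bf <- BayesFactor::ttestBF(x = ToothGrowth$len, mu = 10, rscale = 0.8)
  print(bf)
}
```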

ggdotplotstats

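This function is similar to gghistostats, but it is better suited to a numeric variable whose values also carry labels (here, life expectancy values labeled by country).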

set.seed(123)

# plot
ggdotplotstats(
  data = dplyr::filter(.data = gapminder::gapminder, continent == "Asia"),
  y = country,
  x = lifeExp,
  test.value = 55,
  test.value.line = TRUE,
  test.line.labeller = TRUE,
  test.value.color = "red",
  centrality.para = "median",
  centrality.k = 0,
  title = "Distribution of life expectancy in Asian continent",
  xlab = "Life expectancy",
  messages = FALSE,
  caption = substitute(
    paste(
      italic("Source"),
      ": Gapminder dataset from https://www.gapminder.org/"
    )
  )
)
Figure 8

ggcorrmat

ggcorrmat is mainly used for correlation analysis among several variables:

set.seed(123)
# as a default this function outputs a correlogram plot
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  corr.method = "robust", # correlation method
  sig.level = 0.001, # threshold of significance
  p.adjust.method = "holm", # p-value adjustment method for multiple comparisons
  cor.vars = c(sleep_rem, awake:bodywt), # a range of variables can be selected
  cor.vars.names = c(
    "REM sleep", # variable names
    "time awake",
    "brain weight",
    "body weight"
  ),
  matrix.type = "upper", # type of visualization matrix
  colors = c("#B2182B", "white", "#4D4D4D"),
  title = "Correlogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)
Figure 9
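The settings sig.level = 0.001 and p.adjust.method = "holm" mean that the pairwise p-values are corrected for multiple comparisons before they are compared with the threshold. This sketch shows that step in base R for the same four variables. One deliberate swap: it uses Pearson correlations instead of the "robust" method requested above, so the p-values will not match Figure 9 exactly.

```r
library(ggplot2) # provides msleep

vars <- c("sleep_rem", "awake", "brainwt", "bodywt")
d <- as.data.frame(msleep[, vars])

# All 6 unordered pairs of variables
pairs <- utils::combn(vars, 2, simplify = FALSE)

# Raw p-value for each pair (Pearson here, not the "robust" method above);
# cor.test() drops rows with NA in either variable
p_raw <- vapply(pairs, function(v) {
  stats::cor.test(d[[v[1]]], d[[v[2]]], method = "pearson")$p.value
}, numeric(1))

# Holm adjustment across the 6 comparisons
p_holm <- stats::p.adjust(p_raw, method = "holm")

data.frame(
  pair = vapply(pairs, paste, character(1), collapse = " ~ "),
  p_raw = signif(p_raw, 3),
  p_holm = signif(p_holm, 3),
  significant = p_holm < 0.001 # same threshold as sig.level above
)
```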

ggcoefstats

ggcoefstats draws a dot-and-whisker (forest) plot for a regression model, showing each point estimate with its confidence interval:

set.seed(123)

# model
mod <- stats::lm(
  formula = mpg ~ am * cyl,
  data = mtcars
)

# plot
ggstatsplot::ggcoefstats(x = mod)
Figure 10
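The dots and whiskers in Figure 10 are the coefficient estimates of the model and their confidence intervals. The sketch below prints those numbers for the same model in base R. It assumes a 95% level. Depending on the ggstatsplot version, the intercept may not be drawn in the plot even though it appears in this table.

```r
mod <- stats::lm(mpg ~ am * cyl, data = mtcars)

# Point estimates with 95% confidence intervals (lower and upper bounds)
est <- cbind(estimate = stats::coef(mod), stats::confint(mod, level = 0.95))
round(est, 3)
```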

Besides the plots above, which use built-in data, ggstatsplot can also supply the statistical results for a plot drawn with another package:

set.seed(123)

# loading the needed libraries
#install.packages("yarrr")
library(yarrr)
library(ggstatsplot)

# using `ggstatsplot` to get call with statistical results
stats_results <-
  ggstatsplot::ggbetweenstats(
    data = ChickWeight,
    x = Time,
    y = weight,
    return = "subtitle",
    messages = FALSE
  )
# using `yarrr` to create plot
yarrr::pirateplot(
  formula = weight ~ Time,
  data = ChickWeight,
  theme = 1,
  main = stats_results
)

Figure 11
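The same idea works without any extra packages: compute the test yourself, format it as text, and hand it to any plotting function as its title. This is a base-R sketch, not the author's method. It assumes Welch's one-way ANOVA as the test (Time converted to a factor), and the label format is my own invention.

```r
# Treat each time point as a group
cw <- data.frame(weight = ChickWeight$weight, Time = factor(ChickWeight$Time))

# Welch's one-way ANOVA of weight across time points
res <- stats::oneway.test(weight ~ Time, data = cw, var.equal = FALSE)

# Format the result as a plain-text title
label <- sprintf(
  "Welch F(%.0f, %.1f) = %.2f, p = %s",
  res$parameter[1], res$parameter[2], res$statistic,
  format.pval(res$p.value, digits = 3)
)

# Any plotting function that accepts a title can display it
graphics::boxplot(weight ~ Time, data = cw, main = label,
                  xlab = "Time (days)", ylab = "weight")
```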

References:
https://cloud.tencent.com/developer/article/1450100
https://github.com/IndrajeetPatil/ggstatsplot

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.