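Introduction
Bar charts are generally used when we have a set of categorical variables and a quantitative value for each category, and the main point of interest is the magnitude of those values.
Horizontal grid lines should be kept in the background of a bar chart, so the values of interest are easier to compare.
When category labels are too long, a horizontal bar chart is the better choice: it avoids rotated labels and keeps the reading direction of the text consistent with the direction of the graphic.
Pay attention to the ordering of the bars (by magnitude, by categorical variable, or by distribution shape).
When there are too many categories, consider a lollipop chart (a dot plot plus a line from each dot to the axis) or a heatmap.
In ggplot2 the basic functions for drawing bar charts are geom_bar() and geom_col(). The bar heights produced by geom_bar() come from a statistical transformation (count, ..prop..); geom_col() applies no statistical transformation, so the bars show the actual value of each category.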
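To make the difference concrete, here is a minimal sketch that draws the same chart both ways from the built-in mpg dataset. The object names (p_bar, p_col, class_counts) are mine, chosen only for illustration; layer_data() is used to read back the bar heights that ggplot2 actually computed.

```r
library(ggplot2)
library(dplyr)

# geom_bar() counts the rows per class itself (stat = "count")
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# geom_col() plots precomputed values as they are (stat = "identity")
class_counts <- count(mpg, class, name = "count")
p_col <- ggplot(class_counts, aes(x = class, y = count)) + geom_col()

# Read back the computed bar heights of each plot
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
```

If both vectors hold the same values, the two plots show identical bars; the only difference is who did the counting.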
1. Aesthetic mappings
x
y
alpha
colour
fill
group
linetype
size
2. Simple bar chart
ggplot() + geom_bar(data = mpg, aes(x = class), stat = "count")
The y-axis is the number of times each x value occurs in the dataset.
Sorting
In ggplot2 the data and the mapping to visual elements are kept separate, so to sort a bar chart you have to sort the data itself.
library(tidyverse)

# Count the rows per class, then reorder the class levels by that count
data_sorted <- mpg %>%
  group_by(class) %>%
  summarise(count = n()) %>%
  mutate(class = fct_reorder(class, count))

ggplot() + geom_bar(data = data_sorted, aes(x = class, y = count),
                    stat = "identity")
Outline colour and fill colour
ggplot() + geom_bar(data = mpg, aes(x = class), stat = "count",
fill = "white", colour="dodgerblue")
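The color parameter controls the outline colour; fill controls the fill colour.
Axis break
When the bars are very tall, you can truncate the axis for display, producing a broken bar chart that shows only the bottom and top portions.
Horizontal bar chart
When the group label names are too long, one option is to rotate the labels so they do not overlap each other.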
film <- data.frame(
RANK =c(1,2,3,4,5),
Title = c("Star Wars: The Last Jedi",
"Jumanji_Welcome to the Jungle",
"Pitch Perfect",
"The Greatest Showman",
"Ferdinand"),
Weekend_gross = c(71565498, 36169328, 19928525, 805843, 7316746))
# Plot the film data directly; Title and Weekend_gross are its original columns
p <- ggplot() + geom_bar(data = film,
                         aes(x = Title, y = Weekend_gross),
                         stat = "identity",
                         width = 0.5,
                         position = position_dodge(width = 0.9))
# Rotate the x-axis labels by 45 degrees so they do not overlap
p + theme(axis.text.x = element_text(angle = 45,
                                     vjust = 1,
                                     hjust = 1,
                                     size = 10),
          axis.text.y = element_text(size = 10))
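But long names are usually still hard to read after rotation, and rotating them also breaks the consistent horizontal alignment of text across the figure. A better solution is to flip the coordinates (coord_flip()) and produce a horizontal bar chart.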
library(tidyverse)
data_sorted <- film %>%
mutate(Title2 = fct_reorder(Title, Weekend_gross),
Weekend_gross2 = Weekend_gross/1000000)
ggplot() + geom_bar(data = data_sorted,
aes(x = Title2, y = Weekend_gross2),
stat = "identity",
width = 0.5,
position = position_dodge(width = 0.9)) +
theme(panel.grid.major.x = element_line(colour = "black"),
panel.background = element_blank(),
axis.line.y = element_blank(),
axis.title.y = element_blank()) +
coord_flip()
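Error bars
- Standard deviation (sd) is a descriptive statistic for how widely the data spread around their mean; values more than two standard deviations from the mean may be outliers. Using it involves no inference about comparing means; it is purely descriptive.
- Standard error (se) expresses how much the sample mean varies relative to the population mean. It reflects the size of the sampling error and is a measure of the precision of the result.
Note: the 95% confidence interval (ci) is taken as Mean ± 2*SE (more precisely about 1.96*SE under a normal approximation).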
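The definitions above can be sketched in code. The example below is my own illustration, not the author's: it summarises the hwy column of mpg per class and draws Mean ± 2*SE error bars on a bar chart. The column names hwy_mean, hwy_sd, se and ci are assumptions made for this sketch.

```r
library(ggplot2)
library(dplyr)

# Per-class mean of highway mileage, with sd, se and an approximate 95% CI
hwy_summary <- mpg %>%
  group_by(class) %>%
  summarise(hwy_mean = mean(hwy),
            hwy_sd   = sd(hwy),
            n        = n(),
            .groups  = "drop") %>%
  mutate(se = hwy_sd / sqrt(n),  # standard error of the mean
         ci = 2 * se)            # half-width of the approximate 95% CI

ggplot(hwy_summary, aes(x = class, y = hwy_mean)) +
  geom_col(fill = "grey70") +
  geom_errorbar(aes(ymin = hwy_mean - ci, ymax = hwy_mean + ci),
                width = 0.2)
```

Swapping ci for hwy_sd in geom_errorbar() would show the descriptive spread of the data instead of the precision of the mean.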
3. Stacked bar chart
ggplot() + geom_bar(data = mpg, aes(x = class, fill = drv), stat = "count")
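The default position for grouped plots is position = "stack". The fill aesthetic maps the data to fill colour, and the color aesthetic maps the data to outline colour.
Gaps between stacked blocks
Increase the outline width with the lwd parameter and set the outline colour to match the background colour; this produces a stacked bar chart with gaps between the blocks.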
ggplot() + geom_bar(data = mpg, aes(x = class, group = drv, fill = drv),
stat = "count", lwd = 1.5, colour = "white") +
theme_classic()
Percent stacked bar chart
- Compare, within each group, the percentage of occurrences of each category
ggplot() + geom_bar(data = mpg, aes(x = class, fill = factor(cyl)),
position = "fill")
- Compare, within each group, the percentage of the actual value of each category
data <- mpg %>%
group_by(class, cyl) %>%
summarise(count = n())
ggplot() + geom_bar(data =data, aes(x = class, y = count, fill = factor(cyl)),
stat = "identity",
position = "fill")
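Tip: the count column in data is just the number of occurrences of each group in mpg, so the two plots are identical.
Connecting lines between stacked bars
Work out the start and end coordinates of each connecting line, then draw them with a segment or line geom.
Alternatively, use the ggalluvial package.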
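A minimal sketch of the segment approach follows, under a few assumptions of mine: it uses year rather than class on the x-axis (so that every drv value appears in every bar), lets ggplot2 do the stacking, and reads the block tops back with layer_data(). The names bar_width, blocks and links are hypothetical helpers.

```r
library(ggplot2)
library(dplyr)

bar_width <- 0.5

p <- ggplot(mpg, aes(x = factor(year), fill = drv)) +
  geom_bar(width = bar_width)

# Let ggplot2 stack the blocks, then read back the top edge of each block.
# The fill colour identifies which drv value a block belongs to.
blocks <- layer_data(p) %>%
  transmute(x = as.numeric(x), ymax, fill)

# Pair each block with the block of the same colour in the next bar
links <- blocks %>%
  arrange(fill, x) %>%
  group_by(fill) %>%
  mutate(xend = lead(x), yend = lead(ymax)) %>%
  ungroup() %>%
  filter(!is.na(xend))

# Draw from the right edge of one bar to the left edge of the next
p + geom_segment(data = links,
                 aes(x = x + bar_width / 2, y = ymax,
                     xend = xend - bar_width / 2, yend = yend),
                 inherit.aes = FALSE)
```

Reading positions from layer_data() avoids re-implementing the stacking order by hand, at the cost of building the base plot first.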
4. Side-by-side (dodged) bar chart
position = "dodge"
ggplot() + geom_bar(data = mpg,aes(x = class, fill=factor(cyl)), position="dodge")
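The width argument of the geom controls the width of the bars, while the width inside position = position_dodge() controls the spacing between the bars within a group.
position_dodge2()
2seater has no values for cyl 4, 5 and 6, so in the bar chart its single bar takes up the width of four bar positions on the x-axis. position_dodge2() solves this: preserve = "single" gives every bar the same width and centres it, while preserve = "total" gives the same result as position_dodge().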
ggplot() + geom_bar(data = mpg, aes(x = class, fill=factor(cyl)),
position = position_dodge2(padding = 0,
preserve = "single"))
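In position_dodge2() the padding parameter controls the spacing between the bars within a group; the default is padding = 0.1.
Error bars on dodged bar charts
Error bars on dodged bars work the same way as on single bars, but a few parameters need attention.
- position_dodge(): for dodged bars produced with position_dodge(), first give the error bars a grouping variable, then adjust their position: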
data <- mpg %>%
group_by(class, cyl) %>%
summarise(count = n())
ggplot() + geom_col(data = data, aes(x = class, y = count, fill=factor(cyl)),
position = position_dodge()) +
geom_errorbar(data = data, aes(x = class, ymin = count - 1, ymax = count + 1,
group = factor(cyl)),
width= 0.2,
position = position_dodge(0.9))
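Because width is derived from the width of each bar, the error bars end up with different widths. I think there should be a parameter to fix the width, but I have not found one.
- position_dodge2(): likewise, for dodged bars produced with position_dodge2(), the position of geom_errorbar() also needs adjusting.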
data <- mpg %>%
group_by(class, cyl) %>%
summarise(count = n())
ggplot() + geom_col(data = data, aes(x = class, y = count, fill = factor(cyl)),
position = position_dodge2(padding = 0, preserve = "single")) +
geom_errorbar(data = data, aes(x = class,
ymin = count - 1,
ymax = count + 1),
position = position_dodge2(padding = 0.5, preserve = "single"))
5. Lollipop chart
a. One grouping variable
data <- mpg %>%
  group_by(manufacturer) %>%
  summarise(count = n())

# Start every connecting line at y = 0
data$ymin <- 0

ggplot(data = data) +
  geom_point(aes(x = manufacturer, y = count), size = 5) +
  geom_segment(aes(x = manufacturer, y = ymin,
                   xend = manufacturer, yend = count)) +
  # Start the y-axis at 0
  scale_y_continuous(expand = c(0, 0)) +
  # The x-axis names overlap, so flip the coordinates to make the chart horizontal
  coord_flip() +
  theme(panel.background = element_blank(),  # remove the background grid
        # show the major x grid lines
        panel.grid.major.x = element_line(colour = "black"),
        # show the x-axis line
        axis.line.x = element_line(colour = "black"),
        axis.title.y = element_blank())
b. Groups within groups
data <- mpg %>%
  group_by(manufacturer, cyl) %>%
  summarise(count = n())
# Start every connecting line at y = 0
data$ymin <- 0
# Combine the two grouping variables into a new grouping variable
data$group <- paste(data$manufacturer, data$cyl, sep = "_")

theme <- theme(panel.background = element_blank(),  # remove the background grid
               # show the major x grid lines
               panel.grid.major.x = element_line(colour = "black"),
               # show the x-axis line
               axis.line.x = element_line(colour = "black"),
               axis.title.y = element_blank())

ggplot(data = data) +
  geom_point(aes(x = group, y = count, color = factor(cyl)),
             size = 5) +
  geom_segment(aes(x = group, y = ymin,
                   xend = group, yend = count)) +
  scale_y_continuous(limits = c(0, 25), expand = c(0, 0)) +
  coord_flip() + theme
But the x-axis labels are now the combined strings. Possible fixes:
1. Modify the x-axis information
data <- mpg %>%
  group_by(manufacturer, cyl) %>%
  summarise(count = n())
data$ymin <- 0
# Use the row number as a numeric x position
data$index <- as.integer(rownames(data))

theme <- theme(panel.background = element_blank(),  # remove the background grid
               # show the major x grid lines
               panel.grid.major.x = element_line(colour = "black"),
               # show the x-axis line
               axis.line.x = element_line(colour = "black"),
               axis.title.y = element_blank())

ggplot(data = data) +
  geom_point(aes(x = index, y = count, color = factor(cyl)),
             size = 5) +
  geom_segment(aes(x = index, y = ymin,
                   xend = index, yend = count)) +
  scale_y_continuous(limits = c(0, 25), expand = c(0, 0)) +
  # Replace the numeric axis labels with the manufacturer names
  scale_x_continuous(breaks = data$index,
                     labels = data$manufacturer) +
  coord_flip() + theme
2. Add label information to the grouping variable
data <- mpg %>%
  group_by(manufacturer, cyl) %>%
  summarise(count = n())
data$ymin <- 0

# Turn the row numbers into a factor labelled with the manufacturer names
data$manufacturer <- factor(as.integer(rownames(data)),
                            labels = data$manufacturer)

theme <- theme(panel.background = element_blank(),  # remove the background grid
               # show the major x grid lines
               panel.grid.major.x = element_line(colour = "black"),
               # show the x-axis line
               axis.line.x = element_line(colour = "black"),
               axis.title.y = element_blank())

ggplot(data = data) +
  geom_point(aes(x = manufacturer, y = count, color = factor(cyl)),
             size = 5) +
  geom_segment(aes(x = manufacturer, y = ymin,
                   xend = manufacturer, yend = count)) +
  scale_y_continuous(limits = c(0, 25), expand = c(0, 0)) +
  coord_flip() + theme
3. structure() - attribute information
Every object can carry arbitrary additional attributes, which can be thought of as a named list. Attributes can be accessed one at a time with attr(), or all at once as a list with attributes(). structure() is the R function that returns an object with the given attributes set.
- names, character vector of element names
- labels
- class, used to implement the S3 object system
- dim, used to turn vectors into high-dimensional structures
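As a small base-R illustration of these three functions (the vector x and the attribute names "labels" and "unit" are made up for this sketch):

```r
# structure() returns the object with the given attributes attached
x <- structure(1:3, labels = c("low", "mid", "high"))

attr(x, "labels")   # read a single attribute
attributes(x)       # read all attributes as a named list

# `attr<-` sets a single attribute on an existing object
attr(x, "unit") <- "score"
names(attributes(x))
```

The values of x itself are unchanged throughout; attributes only attach metadata to the object.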
data <- mpg %>%
  group_by(manufacturer, cyl) %>%
  summarise(count = n())
data$ymin <- 0

# Row numbers as a factor whose levels are in numeric order
data$index <- fct_inseq(rownames(data))
# Attach the index as an attribute of the label vector
labels <- structure(data$manufacturer,
                    labels = data$index)
theme <- theme(panel.background = element_blank(),  # remove the background grid
               # show the major x grid lines
               panel.grid.major.x = element_line(colour = "black"),
               # show the x-axis line
               axis.line.x = element_line(colour = "black"),
               axis.title.y = element_blank())

ggplot(data = data) +
  geom_point(aes(x = index, y = count, color = factor(cyl)),
             size = 5) +
  geom_segment(aes(x = index, y = ymin,
                   xend = index, yend = count)) +
  scale_y_continuous(limits = c(0, 25), expand = c(0, 0)) +
  scale_x_discrete(labels = labels) +
  coord_flip() + theme
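6. Pyramid chart
The core of a pyramid chart is to find the variable that splits the data into two sides, use it to make the values positive on one side and negative on the other, and then force the axis labels on both the positive and negative side to show the corresponding positive values.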
set.seed(13)
data <- data.frame(num = rep(c(seq(from = 10, to = 100, by = 10),
rev(seq(from = 40, to = 90, by = 10))),
times = 2),
age = rep(seq(from = 80, to = 5, by = -5),
times = 2),
gender = rep(c("male", "female"), each = 16))

# Keep num positive for males and make it negative for females
data$num2 <- ifelse(data$gender == "male",
                    data$num * 1,
                    data$num * -1)

ggplot(data = data) + geom_col(aes(x = factor(age),
y = num2,
fill = gender)) +
scale_y_continuous(breaks = seq(from = -100, to = 100,
by = 20),
labels = c(seq(100, 0, -20),
seq(20, 100, 20))) +
coord_flip() + theme_bw()
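As a variation on the manual breaks and labels above, the labels argument of scale_y_continuous() also accepts a function, so passing abs shows every break as its positive value. This is a sketch on hypothetical toy data (the pyramid data frame below is invented for illustration), not the author's method.

```r
library(ggplot2)

# Hypothetical toy data: one count per age band for each gender
pyramid <- data.frame(
  age    = rep(seq(from = 5, to = 80, by = 5), times = 2),
  gender = rep(c("male", "female"), each = 16),
  num    = rep(seq(from = 10, to = 85, by = 5), times = 2)
)

# Flip one gender to the negative side of the axis
pyramid$num2 <- ifelse(pyramid$gender == "male", pyramid$num, -pyramid$num)

ggplot(pyramid, aes(x = factor(age), y = num2, fill = gender)) +
  geom_col() +
  # abs() is applied to the breaks, so negative ticks display as positive
  scale_y_continuous(labels = abs) +
  coord_flip() +
  theme_bw()
```

This avoids keeping the breaks and labels vectors in sync by hand when the data range changes.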