tidyverse is a collection of R packages for data processing and visualization; ggplot2 and dplyr are the best known of them.
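The core packages are:
- ggplot2 - data visualization
- dplyr - a grammar of data manipulation that handles most data-processing tasks
- tidyr - tidying (cleaning and reshaping) data
- readr - reading tabular (rectangular) data
- purrr - a complete and consistent toolkit that enhances R's functional programming
- tibble - a modern take on the data frame
- stringr - functions for working with strings
- forcats - tools for working with factors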
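I had not used a few of these packages before. There are a great many R packages, but these powerful ones are worth learning: when a problem comes up, they let you get twice the result with half the effort.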
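To give a feel for some of the less familiar core packages, here is a minimal sketch on a small made-up tibble (the plants data, its column names and its values are hypothetical). It uses tidyr's pivot_longer(), which assumes tidyr 1.0.0 or later (the older gather() plays the same role in the tidyr 0.8.0 shown in the startup message), together with stringr's str_extract() and forcats' fct_infreq().

```r
library(tidyverse)

# Hypothetical example data: two weekly width readings per plant
plants <- tibble(
  id      = c("a-1", "a-2", "b-1"),
  species = c("setosa", "virginica", "setosa"),
  w1      = c(1.2, 2.5, 0.9),
  w2      = c(1.4, 2.7, 1.1)
)

long <- plants %>%
  # tidyr: stack w1 and w2 into a "week" name column and a "width" value column
  pivot_longer(c(w1, w2), names_to = "week", values_to = "width") %>%
  mutate(
    # stringr: keep the letters before the hyphen as a site code
    site = str_extract(id, "^[a-z]+"),
    # forcats: turn species into a factor with levels ordered by frequency
    species = fct_infreq(species)
  )

print(long)
```

Each package contributes one small step here; in practice they are usually mixed freely inside a single pipeline like this.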
Install tidyverse:
install.packages("tidyverse")
Load it:
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## √ ggplot2 2.2.1 √ purrr 0.2.4
## √ tibble 1.4.2 √ dplyr 0.7.4
## √ tidyr 0.8.0 √ stringr 1.3.0
## √ readr 1.1.1 √ forcats 0.3.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Useful functions
# conflicts between tidyverse and other packages
tidyverse_conflicts()
# list all tidyverse dependencies
tidyverse_deps()
# get the tidyverse logo
tidyverse_logo()
# list all tidyverse packages
tidyverse_packages()
# update tidyverse packages
tidyverse_update()
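Because tidyverse_packages() returns an ordinary character vector, its result can be reused in code. A small sketch, assuming you simply want to see which tidyverse packages are not yet installed on your machine:

```r
library(tidyverse)

# Names of the packages installed with tidyverse, excluding tidyverse itself
pkgs <- tidyverse_packages(include_self = FALSE)

# requireNamespace() returns TRUE if a package can be loaded (it is not attached)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Packages that are missing on this machine, if any
pkgs[!installed]
```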
Loading data
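library(datasets) # provides iris; attached by default in a standard R session
# install.packages("gapminder")
library(gapminder)
attach(iris) # optional: the dplyr code below always refers to iris explicitly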
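Before manipulating the data it can help to look at its structure. A quick sketch using dplyr's glimpse(), which prints each column's name, type and first few values; iris has 150 rows and 5 columns, and gapminder has 1704 rows and 6 columns (country, continent, year, lifeExp, pop, gdpPercap).

```r
library(dplyr)
library(gapminder)

# One line per column: name, type and the first few values
glimpse(iris)
glimpse(gapminder)

# Number of rows and columns
dim(iris)
dim(gapminder)
```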
dplyr
Filtering
The filter() function extracts a subset of rows: only the rows that satisfy the given conditions are kept.
iris %>%
filter(Species == "virginica") # keep only the rows that satisfy the condition
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 6.3 3.3 6.0 2.5 virginica
## 2 5.8 2.7 5.1 1.9 virginica
## 3 7.1 3.0 5.9 2.1 virginica
## 4 6.3 2.9 5.6 1.8 virginica
## 5 6.5 3.0 5.8 2.2 virginica
## 6 7.6 3.0 6.6 2.1 virginica
## 7 4.9 2.5 4.5 1.7 virginica
## 8 7.3 2.9 6.3 1.8 virginica
## 9 6.7 2.5 5.8 1.8 virginica
## 10 7.2 3.6 6.1 2.5 virginica
## 11 6.5 3.2 5.1 2.0 virginica
## 12 6.4 2.7 5.3 1.9 virginica
## 13 6.8 3.0 5.5 2.1 virginica
## 14 5.7 2.5 5.0 2.0 virginica
## 15 5.8 2.8 5.1 2.4 virginica
## 16 6.4 3.2 5.3 2.3 virginica
## 17 6.5 3.0 5.5 1.8 virginica
## 18 7.7 3.8 6.7 2.2 virginica
## 19 7.7 2.6 6.9 2.3 virginica
## 20 6.0 2.2 5.0 1.5 virginica
## 21 6.9 3.2 5.7 2.3 virginica
## 22 5.6 2.8 4.9 2.0 virginica
## 23 7.7 2.8 6.7 2.0 virginica
## 24 6.3 2.7 4.9 1.8 virginica
## 25 6.7 3.3 5.7 2.1 virginica
## 26 7.2 3.2 6.0 1.8 virginica
## 27 6.2 2.8 4.8 1.8 virginica
## 28 6.1 3.0 4.9 1.8 virginica
## 29 6.4 2.8 5.6 2.1 virginica
## 30 7.2 3.0 5.8 1.6 virginica
## 31 7.4 2.8 6.1 1.9 virginica
## 32 7.9 3.8 6.4 2.0 virginica
## 33 6.4 2.8 5.6 2.2 virginica
## 34 6.3 2.8 5.1 1.5 virginica
## 35 6.1 2.6 5.6 1.4 virginica
## 36 7.7 3.0 6.1 2.3 virginica
## 37 6.3 3.4 5.6 2.4 virginica
## 38 6.4 3.1 5.5 1.8 virginica
## 39 6.0 3.0 4.8 1.8 virginica
## 40 6.9 3.1 5.4 2.1 virginica
## [ reached getOption("max.print") -- omitted 10 rows ]
iris %>%
filter(Species == "virginica", Sepal.Length > 6) # separate multiple conditions with commas; a row must satisfy all of them
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 6.3 3.3 6.0 2.5 virginica
## 2 7.1 3.0 5.9 2.1 virginica
## 3 6.3 2.9 5.6 1.8 virginica
## 4 6.5 3.0 5.8 2.2 virginica
## 5 7.6 3.0 6.6 2.1 virginica
## 6 7.3 2.9 6.3 1.8 virginica
## 7 6.7 2.5 5.8 1.8 virginica
## 8 7.2 3.6 6.1 2.5 virginica
## 9 6.5 3.2 5.1 2.0 virginica
## 10 6.4 2.7 5.3 1.9 virginica
## 11 6.8 3.0 5.5 2.1 virginica
## 12 6.4 3.2 5.3 2.3 virginica
## 13 6.5 3.0 5.5 1.8 virginica
## 14 7.7 3.8 6.7 2.2 virginica
## 15 7.7 2.6 6.9 2.3 virginica
## 16 6.9 3.2 5.7 2.3 virginica
## 17 7.7 2.8 6.7 2.0 virginica
## 18 6.3 2.7 4.9 1.8 virginica
## 19 6.7 3.3 5.7 2.1 virginica
## 20 7.2 3.2 6.0 1.8 virginica
## 21 6.2 2.8 4.8 1.8 virginica
## 22 6.1 3.0 4.9 1.8 virginica
## 23 6.4 2.8 5.6 2.1 virginica
## 24 7.2 3.0 5.8 1.6 virginica
## 25 7.4 2.8 6.1 1.9 virginica
## 26 7.9 3.8 6.4 2.0 virginica
## 27 6.4 2.8 5.6 2.2 virginica
## 28 6.3 2.8 5.1 1.5 virginica
## 29 6.1 2.6 5.6 1.4 virginica
## 30 7.7 3.0 6.1 2.3 virginica
## 31 6.3 3.4 5.6 2.4 virginica
## 32 6.4 3.1 5.5 1.8 virginica
## 33 6.9 3.1 5.4 2.1 virginica
## 34 6.7 3.1 5.6 2.4 virginica
## 35 6.9 3.1 5.1 2.3 virginica
## 36 6.8 3.2 5.9 2.3 virginica
## 37 6.7 3.3 5.7 2.5 virginica
## 38 6.7 3.0 5.2 2.3 virginica
## 39 6.3 2.5 5.0 1.9 virginica
## 40 6.5 3.0 5.2 2.0 virginica
## [ reached getOption("max.print") -- omitted 1 row ]
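Conditions separated by commas must all hold, exactly as if they were joined with &. A minimal sketch comparing the two forms, and also showing | (keep a row if either condition holds) and %in% (match any of several values):

```r
library(dplyr)

# Commas and & are equivalent: a row is kept only if every condition holds
with_comma <- iris %>% filter(Species == "virginica", Sepal.Length > 6)
with_and   <- iris %>% filter(Species == "virginica" & Sepal.Length > 6)

# | keeps a row if at least one condition holds
either <- iris %>% filter(Species == "setosa" | Sepal.Length > 7)

# %in% keeps rows whose value matches any element of the vector
two_species <- iris %>% filter(Species %in% c("setosa", "versicolor"))

identical(with_comma, with_and)
nrow(two_species)
```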
Sorting
The arrange() function sorts the observations (rows); the default order is ascending.
iris %>%
arrange(Sepal.Length)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 4.3 3.0 1.1 0.1 setosa
## 2 4.4 2.9 1.4 0.2 setosa
## 3 4.4 3.0 1.3 0.2 setosa
## 4 4.4 3.2 1.3 0.2 setosa
## 5 4.5 2.3 1.3 0.3 setosa
## 6 4.6 3.1 1.5 0.2 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 4.6 3.6 1.0 0.2 setosa
## 9 4.6 3.2 1.4 0.2 setosa
## 10 4.7 3.2 1.3 0.2 setosa
## 11 4.7 3.2 1.6 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.8 3.4 1.9 0.2 setosa
## 15 4.8 3.1 1.6 0.2 setosa
## 16 4.8 3.0 1.4 0.3 setosa
## 17 4.9 3.0 1.4 0.2 setosa
## 18 4.9 3.1 1.5 0.1 setosa
## 19 4.9 3.1 1.5 0.2 setosa
## 20 4.9 3.6 1.4 0.1 setosa
## 21 4.9 2.4 3.3 1.0 versicolor
## 22 4.9 2.5 4.5 1.7 virginica
## 23 5.0 3.6 1.4 0.2 setosa
## 24 5.0 3.4 1.5 0.2 setosa
## 25 5.0 3.0 1.6 0.2 setosa
## 26 5.0 3.4 1.6 0.4 setosa
## 27 5.0 3.2 1.2 0.2 setosa
## 28 5.0 3.5 1.3 0.3 setosa
## 29 5.0 3.5 1.6 0.6 setosa
## 30 5.0 3.3 1.4 0.2 setosa
## 31 5.0 2.0 3.5 1.0 versicolor
## 32 5.0 2.3 3.3 1.0 versicolor
## 33 5.1 3.5 1.4 0.2 setosa
## 34 5.1 3.5 1.4 0.3 setosa
## 35 5.1 3.8 1.5 0.3 setosa
## 36 5.1 3.7 1.5 0.4 setosa
## 37 5.1 3.3 1.7 0.5 setosa
## 38 5.1 3.4 1.5 0.2 setosa
## 39 5.1 3.8 1.9 0.4 setosa
## 40 5.1 3.8 1.6 0.2 setosa
## [ reached getOption("max.print") -- omitted 110 rows ]
iris %>%
arrange(desc(Sepal.Length)) # descending order
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 7.9 3.8 6.4 2.0 virginica
## 2 7.7 3.8 6.7 2.2 virginica
## 3 7.7 2.6 6.9 2.3 virginica
## 4 7.7 2.8 6.7 2.0 virginica
## 5 7.7 3.0 6.1 2.3 virginica
## 6 7.6 3.0 6.6 2.1 virginica
## 7 7.4 2.8 6.1 1.9 virginica
## 8 7.3 2.9 6.3 1.8 virginica
## 9 7.2 3.6 6.1 2.5 virginica
## 10 7.2 3.2 6.0 1.8 virginica
## 11 7.2 3.0 5.8 1.6 virginica
## 12 7.1 3.0 5.9 2.1 virginica
## 13 7.0 3.2 4.7 1.4 versicolor
## 14 6.9 3.1 4.9 1.5 versicolor
## 15 6.9 3.2 5.7 2.3 virginica
## 16 6.9 3.1 5.4 2.1 virginica
## 17 6.9 3.1 5.1 2.3 virginica
## 18 6.8 2.8 4.8 1.4 versicolor
## 19 6.8 3.0 5.5 2.1 virginica
## 20 6.8 3.2 5.9 2.3 virginica
## 21 6.7 3.1 4.4 1.4 versicolor
## 22 6.7 3.0 5.0 1.7 versicolor
## 23 6.7 3.1 4.7 1.5 versicolor
## 24 6.7 2.5 5.8 1.8 virginica
## 25 6.7 3.3 5.7 2.1 virginica
## 26 6.7 3.1 5.6 2.4 virginica
## 27 6.7 3.3 5.7 2.5 virginica
## 28 6.7 3.0 5.2 2.3 virginica
## 29 6.6 2.9 4.6 1.3 versicolor
## 30 6.6 3.0 4.4 1.4 versicolor
## 31 6.5 2.8 4.6 1.5 versicolor
## 32 6.5 3.0 5.8 2.2 virginica
## 33 6.5 3.2 5.1 2.0 virginica
## 34 6.5 3.0 5.5 1.8 virginica
## 35 6.5 3.0 5.2 2.0 virginica
## 36 6.4 3.2 4.5 1.5 versicolor
## 37 6.4 2.9 4.3 1.3 versicolor
## 38 6.4 2.7 5.3 1.9 virginica
## 39 6.4 3.2 5.3 2.3 virginica
## 40 6.4 2.8 5.6 2.1 virginica
## [ reached getOption("max.print") -- omitted 110 rows ]
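arrange() also accepts several columns: rows are ordered by the first one, and ties are broken by the next. A short sketch that sorts by species (in factor-level order) and then by decreasing sepal length within each species:

```r
library(dplyr)

# Sort by Species (factor-level order: setosa, versicolor, virginica),
# then by decreasing Sepal.Length within each species
sorted <- iris %>%
  arrange(Species, desc(Sepal.Length))

head(sorted, 3)
```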
Adding variables
mutate() updates an existing column of a data frame or adds a new one.
iris %>%
mutate(Sepal.Length = Sepal.Length * 10) # convert this column from cm to mm
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 51 3.5 1.4 0.2 setosa
## 2 49 3.0 1.4 0.2 setosa
## 3 47 3.2 1.3 0.2 setosa
## 4 46 3.1 1.5 0.2 setosa
## 5 50 3.6 1.4 0.2 setosa
## 6 54 3.9 1.7 0.4 setosa
## 7 46 3.4 1.4 0.3 setosa
## 8 50 3.4 1.5 0.2 setosa
## 9 44 2.9 1.4 0.2 setosa
## 10 49 3.1 1.5 0.1 setosa
## 11 54 3.7 1.5 0.2 setosa
## 12 48 3.4 1.6 0.2 setosa
## 13 48 3.0 1.4 0.1 setosa
## 14 43 3.0 1.1 0.1 setosa
## 15 58 4.0 1.2 0.2 setosa
## 16 57 4.4 1.5 0.4 setosa
## 17 54 3.9 1.3 0.4 setosa
## 18 51 3.5 1.4 0.3 setosa
## 19 57 3.8 1.7 0.3 setosa
## 20 51 3.8 1.5 0.3 setosa
## 21 54 3.4 1.7 0.2 setosa
## 22 51 3.7 1.5 0.4 setosa
## 23 46 3.6 1.0 0.2 setosa
## 24 51 3.3 1.7 0.5 setosa
## 25 48 3.4 1.9 0.2 setosa
## 26 50 3.0 1.6 0.2 setosa
## 27 50 3.4 1.6 0.4 setosa
## 28 52 3.5 1.5 0.2 setosa
## 29 52 3.4 1.4 0.2 setosa
## 30 47 3.2 1.6 0.2 setosa
## 31 48 3.1 1.6 0.2 setosa
## 32 54 3.4 1.5 0.4 setosa
## 33 52 4.1 1.5 0.1 setosa
## 34 55 4.2 1.4 0.2 setosa
## 35 49 3.1 1.5 0.2 setosa
## 36 50 3.2 1.2 0.2 setosa
## 37 55 3.5 1.3 0.2 setosa
## 38 49 3.6 1.4 0.1 setosa
## 39 44 3.0 1.3 0.2 setosa
## 40 51 3.4 1.5 0.2 setosa
## [ reached getOption("max.print") -- omitted 110 rows ]
iris %>%
mutate(SLMm = Sepal.Length * 10) # create a new column (sepal length in mm)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species SLMm
## 1 5.1 3.5 1.4 0.2 setosa 51
## 2 4.9 3.0 1.4 0.2 setosa 49
## 3 4.7 3.2 1.3 0.2 setosa 47
## 4 4.6 3.1 1.5 0.2 setosa 46
## 5 5.0 3.6 1.4 0.2 setosa 50
## 6 5.4 3.9 1.7 0.4 setosa 54
## 7 4.6 3.4 1.4 0.3 setosa 46
## 8 5.0 3.4 1.5 0.2 setosa 50
## 9 4.4 2.9 1.4 0.2 setosa 44
## 10 4.9 3.1 1.5 0.1 setosa 49
## 11 5.4 3.7 1.5 0.2 setosa 54
## 12 4.8 3.4 1.6 0.2 setosa 48
## 13 4.8 3.0 1.4 0.1 setosa 48
## 14 4.3 3.0 1.1 0.1 setosa 43
## 15 5.8 4.0 1.2 0.2 setosa 58
## 16 5.7 4.4 1.5 0.4 setosa 57
## 17 5.4 3.9 1.3 0.4 setosa 54
## 18 5.1 3.5 1.4 0.3 setosa 51
## 19 5.7 3.8 1.7 0.3 setosa 57
## 20 5.1 3.8 1.5 0.3 setosa 51
## 21 5.4 3.4 1.7 0.2 setosa 54
## 22 5.1 3.7 1.5 0.4 setosa 51
## 23 4.6 3.6 1.0 0.2 setosa 46
## 24 5.1 3.3 1.7 0.5 setosa 51
## 25 4.8 3.4 1.9 0.2 setosa 48
## 26 5.0 3.0 1.6 0.2 setosa 50
## 27 5.0 3.4 1.6 0.4 setosa 50
## 28 5.2 3.5 1.5 0.2 setosa 52
## 29 5.2 3.4 1.4 0.2 setosa 52
## 30 4.7 3.2 1.6 0.2 setosa 47
## 31 4.8 3.1 1.6 0.2 setosa 48
## 32 5.4 3.4 1.5 0.4 setosa 54
## 33 5.2 4.1 1.5 0.1 setosa 52
## [ reached getOption("max.print") -- omitted 117 rows ]
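One mutate() call can create several columns, and a later expression may refer to a column created earlier in the same call. A sketch with hypothetical column names (Sepal.Size, Petal.Size, Size.Ratio); note that length times width is only a crude size proxy, not a true petal or sepal area.

```r
library(dplyr)

iris_extra <- iris %>%
  mutate(
    Sepal.Size = Sepal.Length * Sepal.Width,  # crude size proxy, cm^2
    Petal.Size = Petal.Length * Petal.Width,
    Size.Ratio = Petal.Size / Sepal.Size      # uses the two columns created above
  )

head(iris_extra, 3)
```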
Chaining the functions together:
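iris %>%
filter(Species == "virginica") %>%
mutate(SLMm = Sepal.Length * 10) %>% # sepal length in mm
arrange(desc(SLMm))
Note that the string comparison is case-sensitive. Written with a capital V, as filter(Species == "Virginica"), this pipeline prints only the column names (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, SLMm) followed by <0 rows> (or 0-length row.names), because the factor level is the lowercase virginica.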
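Since == compares strings exactly, it can help to check the factor levels before filtering, or to compare in lower case when the value comes from user input. A minimal sketch, where target is a hypothetical input value:

```r
library(dplyr)

# Factor levels are stored exactly as written: c("setosa", "versicolor", "virginica")
levels(iris$Species)

# == is case-sensitive, so a capitalized value matches nothing
nrow(filter(iris, Species == "Virginica"))

target <- "Virginica"  # hypothetical input value

# Compare in lower case so both spellings match
by_input <- iris %>%
  filter(tolower(Species) == tolower(target)) %>%
  mutate(SLMm = Sepal.Length * 10) %>%  # sepal length in mm
  arrange(desc(SLMm))

head(by_input, 3)
```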
Summarizing
The summarize() function collapses many values (for example, all the values in a column) into a single data point.
iris %>%
summarize(medianSL = median(Sepal.Length))
## medianSL
## 1 5.8
iris %>%
filter(Species == "virginica") %>%
summarize(medianSL=median(Sepal.Length))
## medianSL
## 1 6.5
You can also compute several summaries at once:
iris %>%
filter(Species == "virginica") %>%
summarize(medianSL = median(Sepal.Length),
maxSL = max(Sepal.Length))
## medianSL maxSL
## 1 6.5 7.9
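Any function that reduces a vector to a single value can be used inside summarize(), and dplyr's n() counts the rows being summarized. A short sketch:

```r
library(dplyr)

virginica_stats <- iris %>%
  filter(Species == "virginica") %>%
  summarize(n      = n(),                # number of rows summarized
            meanSL = mean(Sepal.Length),
            sdSL   = sd(Sepal.Length),
            minSL  = min(Sepal.Length))

virginica_stats
```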
group_by() lets us compute summaries for each specified group instead of for the whole data frame:
iris %>%
group_by(Species) %>%
summarize(medianSL = median(Sepal.Length),
maxSL = max(Sepal.Length))
## # A tibble: 3 x 3
## Species medianSL maxSL
## <fct> <dbl> <dbl>
## 1 setosa 5.00 5.80
## 2 versicolor 5.90 7.00
## 3 virginica 6.50 7.90
iris %>%
filter(Sepal.Length>6) %>%
group_by(Species) %>%
summarize(medianPL = median(Petal.Length),
maxPL = max(Petal.Length))
## # A tibble: 2 x 3
## Species medianPL maxPL
## <fct> <dbl> <dbl>
## 1 versicolor 4.60 5.00
## 2 virginica 5.60 6.90
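Counting rows per group is common enough that dplyr offers count() as a shorthand for group_by() followed by summarize(n = n()). A sketch showing that both give the same counts; empty groups (setosa has no rows here) are dropped by default in both.

```r
library(dplyr)

long_sepals <- iris %>% filter(Sepal.Length > 6)

# Shorthand: one row per species with a column n holding the row count
counts_short <- long_sepals %>% count(Species)

# The same counts written out with group_by() and summarize()
counts_long <- long_sepals %>%
  group_by(Species) %>%
  summarize(n = n())

counts_short
counts_long
```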
ggplot2
Scatter plot
A scatter plot helps us understand the relationship between two variables. Use geom_point() to draw one:
iris_small <- iris %>%
filter(Sepal.Length > 5)
ggplot(iris_small, aes(x = Petal.Length,
y = Petal.Width)) +
geom_point()
Additional aesthetic mappings, and faceting
- Color
ggplot(iris_small, aes(x = Petal.Length,
y = Petal.Width,
color = Species)) +
geom_point()
- Size
ggplot(iris_small, aes(x = Petal.Length,
y = Petal.Width,
color = Species,
size = Sepal.Length)) +
geom_point()
- Faceting
ggplot(iris_small, aes(x = Petal.Length,
y = Petal.Width)) +
geom_point() +
facet_wrap(~Species)
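A common source of confusion is the difference between mapping an aesthetic inside aes() and setting it to a constant outside aes(). A sketch contrasting the two (steelblue and size 2 are arbitrary choices), with facet_wrap()'s ncol argument controlling the panel layout:

```r
library(dplyr)
library(ggplot2)

iris_small <- iris %>% filter(Sepal.Length > 5)

# Outside aes(): every point gets the same fixed colour and size
p_fixed <- ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point(colour = "steelblue", size = 2)

# Inside aes(): colour is mapped from the data; facet_wrap() adds one panel per species
p_mapped <- ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width,
                                   colour = Species)) +
  geom_point() +
  facet_wrap(~Species, ncol = 3)

print(p_fixed)
print(p_mapped)
```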
Line plot
by_year <- gapminder %>%
group_by(year) %>%
summarize(medianGdpPerCap = median(gdpPercap))
ggplot(by_year, aes(x = year,
y = medianGdpPerCap)) +
geom_line() +
expand_limits(y=0)
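Grouping by a second variable gives one line per group once that variable is mapped to color. A sketch, assuming you want the median GDP per capita for each continent over time; ungroup() is used instead of summarize()'s .groups argument so the code also runs on older dplyr versions such as 0.7.4 (newer versions may print an informational message about the grouping).

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Median GDP per capita for every combination of year and continent
by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPerCap = median(gdpPercap)) %>%
  ungroup()

# Mapping continent to color draws one line per continent
p <- ggplot(by_year_continent, aes(x = year, y = medianGdpPerCap,
                                   color = continent)) +
  geom_line() +
  expand_limits(y = 0)

print(p)
```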
Bar chart
by_species <- iris %>%
filter(Sepal.Length > 6) %>%
group_by(Species) %>%
summarize(medianPL=median(Petal.Length))
ggplot(by_species, aes(x = Species, y=medianPL)) +
geom_col()
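geom_col() draws bar heights that are already in the data (medianPL above). When you only want the number of rows per category, geom_bar() computes the counts itself. A minimal sketch:

```r
library(dplyr)
library(ggplot2)

iris_small <- iris %>% filter(Sepal.Length > 5)

# geom_bar() counts the rows for each value of x; no y aesthetic is needed
p <- ggplot(iris_small, aes(x = Species)) +
  geom_bar()

print(p)
```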
Histogram
ggplot(iris_small, aes(x = Petal.Length)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
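The stat_bin() message suggests choosing a better bin width. A sketch setting binwidth explicitly (0.25 cm here is an arbitrary choice, not a recommendation), which also silences the message:

```r
library(dplyr)
library(ggplot2)

iris_small <- iris %>% filter(Sepal.Length > 5)

# An explicit binwidth replaces the default of 30 bins
p <- ggplot(iris_small, aes(x = Petal.Length)) +
  geom_histogram(binwidth = 0.25)

print(p)
```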
Box plot
ggplot(iris_small, aes(x=Species, y=Sepal.Length)) +
geom_boxplot()
Source: DataCamp