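Day 6 study notes, briefly listed below:
[Image: Day 6. Learning R packages.png]
Using dplyr as an example, learn how to install and use R packages. An R package is a collection of functions.
Installation and loading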
# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just dplyr:
install.packages("dplyr")
library(dplyr) # or require(dplyr)
# The famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of sepal length and width
# and petal length and width for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica.
# Take the first two rows of each species as a small test data set.
test <- iris[c(1:2, 51:52, 101:102), ]
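A small sketch of a repeatable setup, assuming you want to skip re-installing a package that is already present. requireNamespace() checks whether the package is available without attaching it. Note that library() stops with an error if the package is missing, while require() only returns FALSE with a warning.

```r
# Install dplyr only when it is not already available, then load it.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)

# Show which version is loaded; the result depends on your installation.
packageVersion("dplyr")
```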
Basic functions of the dplyr package
1. mutate() adds new variables computed from existing ones
> mutate(test, new = Sepal.Length * Sepal.Width) # mutate() adds new variables that are functions of existing variables.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa 17.85
2 4.9 3.0 1.4 0.2 setosa 14.70
3 7.0 3.2 4.7 1.4 versicolor 22.40
4 6.4 3.2 4.5 1.5 versicolor 20.48
5 6.3 3.3 6.0 2.5 virginica 20.79
6 5.8 2.7 5.1 1.9 virginica 15.66
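As a further sketch, mutate() can create several columns in one call, and a later column may use one defined earlier in the same call. The column names area and ratio are made up for illustration.

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

# `ratio` reuses `area`, which is defined just before it in the same call.
res <- mutate(test,
              area  = Sepal.Length * Sepal.Width,
              ratio = area / Petal.Length)
res
```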
2. select() picks variables (columns) by column number or column name
> select(test, 1) # or select(test, Sepal.Length). select() picks variables based on their names.
Sepal.Length
1 5.1
2 4.9
51 7.0
52 6.4
101 6.3
102 5.8
> select(test, c(1, 5)) # or select(test, Sepal.Length, Species) or select(test, one_of("Sepal.Length", "Species"))
Sepal.Length Species
1 5.1 setosa
2 4.9 setosa
51 7.0 versicolor
52 6.4 versicolor
101 6.3 virginica
102 5.8 virginica
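Besides numbers and names, select() also accepts selection helpers. The sketch below assumes dplyr 1.0.0 or later, where all_of() is available for selecting by a character vector; the variable names petal, no_species, and by_name are only for illustration.

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

# Columns whose names start with "Petal"
petal <- select(test, starts_with("Petal"))

# Every column except Species
no_species <- select(test, -Species)

# Columns named in a character vector
cols <- c("Sepal.Length", "Species")
by_name <- select(test, all_of(cols))

names(petal)
names(no_species)
names(by_name)
```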
3. filter() picks rows based on their values
> filter(test, Species == "setosa") # filter() picks cases based on their values.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
> filter(test, Species == "setosa" & Sepal.Length > 5)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
> filter(test, Species %in% c("setosa","versicolor"))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 7.0 3.2 4.7 1.4 versicolor
4 6.4 3.2 4.5 1.5 versicolor
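Two more ways to combine conditions, as a brief sketch: conditions separated by commas are combined with AND (the same as &), and | means OR. The names both and either are only for illustration.

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

# Comma-separated conditions must all be TRUE (logical AND).
both <- filter(test, Species == "setosa", Sepal.Length > 5)

# | keeps rows where at least one condition is TRUE (logical OR).
either <- filter(test, Species == "setosa" | Sepal.Length > 6.5)

nrow(both)
nrow(either)
```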
4. summarise() reduces multiple values to a summary
# summarise() reduces multiple values down to a single summary.
> summarise(test, mean(Sepal.Length), sd(Sepal.Length)) # Calculate the mean and standard deviation of Sepal.Length.
mean(Sepal.Length) sd(Sepal.Length)
1 5.916667 0.8084965
> group_by(test, Species) # Group test by Species
# A tibble: 6 x 5
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
* <dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 7 3.2 4.7 1.4 versicolor
4 6.4 3.2 4.5 1.5 versicolor
5 6.3 3.3 6 2.5 virginica
6 5.8 2.7 5.1 1.9 virginica
> summarise(group_by(test, Species),mean(Sepal.Length), sd(Sepal.Length))
# A tibble: 3 x 3
Species `mean(Sepal.Length)` `sd(Sepal.Length)`
<fct> <dbl> <dbl>
1 setosa 5 0.141
2 versicolor 6.7 0.424
3 virginica 6.05 0.354
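The default column names such as `mean(Sepal.Length)` are awkward to use later. A possible variation is to name the summary columns yourself and add the group size with n(); the names mean_len, sd_len, and n are chosen for illustration.

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

# Named summaries per species, plus the number of rows in each group.
by_species <- summarise(group_by(test, Species),
                        mean_len = mean(Sepal.Length),
                        sd_len   = sd(Sepal.Length),
                        n        = n())
by_species
```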
5. arrange() sorts the whole table by one or more columns
> arrange(test, Sepal.Length) # arrange() changes the ordering of the rows. The default order is ascending.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.9 3.0 1.4 0.2 setosa
2 5.1 3.5 1.4 0.2 setosa
3 5.8 2.7 5.1 1.9 virginica
4 6.3 3.3 6.0 2.5 virginica
5 6.4 3.2 4.5 1.5 versicolor
6 7.0 3.2 4.7 1.4 versicolor
> arrange(test, desc(Sepal.Length)) # Reverse the order using desc()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 7.0 3.2 4.7 1.4 versicolor
2 6.4 3.2 4.5 1.5 versicolor
3 6.3 3.3 6.0 2.5 virginica
4 5.8 2.7 5.1 1.9 virginica
5 5.1 3.5 1.4 0.2 setosa
6 4.9 3.0 1.4 0.2 setosa
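Sorting by several columns works the same way: later columns break ties in earlier ones. A small sketch, with Species in descending order and Sepal.Length ascending within each species (Species is a factor, so it is ordered by its levels):

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

# Sort by Species (descending), then by Sepal.Length (ascending) within each species.
sorted <- arrange(test, desc(Species), Sepal.Length)
sorted
```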
Practical dplyr skills
1. The pipe
> test %>% # Use %>% to emphasise a sequence of actions, rather than the object that the actions are being performed on.
+ group_by(Species) %>% # %>% should always have a space before it, and should usually be followed by a new line.
+ summarise(mean(Sepal.Length), sd(Sepal.Length))
# A tibble: 3 x 3
Species `mean(Sepal.Length)` `sd(Sepal.Length)`
<fct> <dbl> <dbl>
1 setosa 5 0.141
2 versicolor 6.7 0.424
3 virginica 6.05 0.354
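The pipe pays off most when several of the functions above are chained. The following sketch combines filter(), mutate(), arrange(), and select() in one pipeline; the column name area is made up for illustration.

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

res <- test %>%
  filter(Species != "setosa") %>%               # drop the setosa rows
  mutate(area = Sepal.Length * Sepal.Width) %>% # add a derived column
  arrange(desc(area)) %>%                       # largest area first
  select(Species, area)                         # keep only two columns
res
```

Read top to bottom, each line takes the result of the previous step as its first argument, so no intermediate variables are needed.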
2. Count the records for each unique value in a column
> count(test, Species)
# A tibble: 3 x 2
Species n
<fct> <int>
1 setosa 2
2 versicolor 2
3 virginica 2
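For comparison, count() gives the same counts as grouping and then summarising with n(). A minimal sketch, with the variable names counted and manual chosen for illustration:

```r
library(dplyr)
test <- iris[c(1:2, 51:52, 101:102), ]

# Shortcut: one row per species with its number of records.
counted <- count(test, Species)

# The longer equivalent using group_by() and summarise().
manual <- summarise(group_by(test, Species), n = n())

counted$n
manual$n
```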