First, load the required packages. mutate belongs to the dplyr package, but here we simply load the whole tidyverse.
The tidyverse bundles packages for data wrangling and plotting. Load it as follows:
> library(tidyverse)
-- Attaching packages ------------------------ tidyverse 1.3.0 --
√ ggplot2 3.3.3 √ purrr 0.3.4
√ tibble 3.0.5 √ dplyr 1.0.3
√ tidyr 1.1.2 √ stringr 1.4.0
√ readr 1.4.0 √ forcats 0.5.1
-- Conflicts --------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
References
https://dplyr.tidyverse.org/reference/mutate_all.html
Textbook: R for Data Science
The mutate function
mutate() mainly adds columns to a data frame. By default it appends new columns at the end of the data set, and a new column can be used as soon as it has been created.
A simple example:
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
# Add a new column at the end
> mutate(iris, new_col = Petal.Length + Petal.Width) %>% head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
1 5.1 3.5 1.4 0.2 setosa 1.6
2 4.9 3.0 1.4 0.2 setosa 1.6
3 4.7 3.2 1.3 0.2 setosa 1.5
4 4.6 3.1 1.5 0.2 setosa 1.7
5 5.0 3.6 1.4 0.2 setosa 1.6
6 5.4 3.9 1.7 0.4 setosa 2.1
PS: %>% is the pipe operator. It passes the data on its left to the function on its right, which avoids nested function calls and makes the code easier to read.
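The claim that a new column can be used right away is not shown in the example above, so here is a minimal sketch of it. The column names petal_sum and petal_half are made up for illustration.

```r
library(dplyr)

# A column defined earlier in the same mutate() call is visible
# to the expressions that follow it.
res <- iris %>%
  mutate(
    petal_sum  = Petal.Length + Petal.Width,  # new column (illustrative name)
    petal_half = petal_sum / 2                # uses petal_sum immediately
  ) %>%
  head()

res
```

Both new columns end up at the right-hand side of the data frame, in the order they were defined.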
mutate also has three scoped variants:
mutate_at(); mutate_if(); mutate_all()
The official documentation explains the three suffixes as follows:
_all: affects every variable
_at: affects variables selected with a character vector or vars()
_if: affects variables selected with a predicate function
In other words, _all targets every column, _at targets the columns you name, and _if targets the columns that satisfy a condition.
Their signatures are:
mutate_all(.tbl, .funs, ...)
mutate_if(.tbl, .predicate, .funs, ...)
mutate_at(.tbl, .vars, .funs, ..., .cols = NULL)
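To make the difference between the three suffixes concrete, here is a small side-by-side sketch. The tibble df and the helper times2 are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical toy data and helper, for illustration only
df <- tibble(id = c("a", "b"), x = c(1, 2), y = c(10, 20))
times2 <- function(v) v * 2

# _at: only the columns named explicitly
at_res <- df %>% mutate_at(c("x"), times2)

# _if: only the columns for which the predicate returns TRUE
if_res <- df %>% mutate_if(is.numeric, times2)

# _all: every column, so the character column is dropped first
all_res <- df %>% select(-id) %>% mutate_all(times2)
```

Note that mutate_all would fail on the id column here, since times2 cannot multiply a character vector. That is why it is removed beforehand.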
Let's walk through the examples given in the official documentation.
mutate_at
scale2 <- function(x, na.rm = FALSE)(x - mean(x, na.rm = na.rm)) / sd(x, na.rm)
starwars %>% mutate_at(c("height", "mass"), scale2)
# A tibble: 87 x 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke S~ NA NA blond fair blue 19 male mascu~
2 C-3PO NA NA NA gold yellow 112 none mascu~
3 R2-D2 NA NA NA white, bl~ red 33 none mascu~
4 Darth ~ NA NA none white yellow 41.9 male mascu~
5 Leia O~ NA NA brown light brown 19 fema~ femin~
6 Owen L~ NA NA brown, gr~ light blue 52 male mascu~
7 Beru W~ NA NA brown light blue 47 fema~ femin~
8 R5-D4 NA NA NA white, red red NA none mascu~
9 Biggs ~ NA NA black light brown 24 male mascu~
10 Obi-Wa~ NA NA auburn, w~ fair blue-gray 57 male mascu~
# ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
# films <list>, vehicles <list>, starships <list>
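This applies scale2 to the height and mass columns. Every value comes back as NA because both columns contain missing values and na.rm defaults to FALSE.
The following two commands are equivalent:
starwars %>% mutate_at(c("height", "mass"), scale2)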
starwars %>% mutate(across(c("height", "mass"), scale2))
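PS: across() lets a function "cut across" the selected columns, i.e. it applies one or more functions to several selected columns at once. Combined with mutate here, it does the job of mutate_at.
Inside mutate_at, vars() selects the columns and funs() wraps the function to apply. Note that funs() has been deprecated since dplyr 0.8.0; a lambda such as ~log(.) or a list() of functions does the same job.
For example: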
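One way to check the equivalence is to run both forms and compare the results. The sketch below passes na.rm = TRUE through a lambda so that the comparison is not just NA against NA; the names old_style and new_style are illustrative.

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

# Scoped variant
old_style <- starwars %>%
  mutate_at(c("height", "mass"), ~ scale2(., na.rm = TRUE))

# mutate() + across()
new_style <- starwars %>%
  mutate(across(c("height", "mass"), ~ scale2(.x, na.rm = TRUE)))

# Compare the two transformed columns
all.equal(old_style$height, new_style$height)
all.equal(old_style$mass, new_style$mass)
```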
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> mutate_at(iris, vars(-Species), funs(log(.))) %>% head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1.629241 1.252763 0.3364722 -1.6094379 setosa
2 1.589235 1.098612 0.3364722 -1.6094379 setosa
3 1.547563 1.163151 0.2623643 -1.6094379 setosa
4 1.526056 1.131402 0.4054651 -1.6094379 setosa
5 1.609438 1.280934 0.3364722 -1.6094379 setosa
6 1.686399 1.360977 0.5306283 -0.9162907 setosa
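Since funs() is deprecated, the same log transform can presumably be written without it. The following is a sketch of two alternatives, one passing the bare function to mutate_at and one using across(); the variable names are illustrative.

```r
library(dplyr)

# Bare function instead of funs(log(.))
with_mutate_at <- mutate_at(iris, vars(-Species), log)

# The across() form: every column except Species
with_across <- mutate(iris, across(-Species, log))

# Both should give the same data frame
all.equal(with_mutate_at, with_across)
head(with_across)
```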
mutate_if
starwars %>% mutate_if(is.numeric, scale2, na.rm = TRUE)
# A tibble: 87 x 14
name height mass hair_color skin_color eye_color birth_year sex
<chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr>
1 Luke Skyw~ -0.0678 -0.120 blond fair blue -0.443 male
2 C-3PO -0.212 -0.132 NA gold yellow 0.158 none
3 R2-D2 -2.25 -0.385 NA white, bl~ red -0.353 none
4 Darth Vad~ 0.795 0.228 none white yellow -0.295 male
5 Leia Orga~ -0.701 -0.285 brown light brown -0.443 fema~
6 Owen Lars 0.105 0.134 brown, grey light blue -0.230 male
7 Beru Whit~ -0.269 -0.132 brown light blue -0.262 fema~
8 R5-D4 -2.22 -0.385 NA white, red red NA none
9 Biggs Dar~ 0.249 -0.0786 black light brown -0.411 male
10 Obi-Wan K~ 0.220 -0.120 auburn, wh~ fair blue-gray -0.198 male
# ... with 77 more rows, and 6 more variables: gender <chr>, homeworld <chr>,
# species <chr>, films <list>, vehicles <list>, starships <list>
Likewise, these two lines are equivalent:
starwars %>% mutate_if(is.numeric, scale2, na.rm = TRUE)
starwars %>% mutate(across(where(is.numeric), scale2, na.rm = TRUE))
where() picks out the numeric columns and across() applies the function to exactly those columns, which reproduces the behavior of mutate_if.
To apply several functions to the columns at once, pass them in a list(). When several functions are supplied, new columns are created for each function instead of the original columns being modified in place.
For example:
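As of dplyr 1.1.0, passing extra arguments through the ... of across() is deprecated, so on newer versions the na.rm = TRUE argument is better supplied inside a lambda. A minimal sketch, with illustrative result names:

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

# Scoped variant, extra argument passed through ...
res_if <- starwars %>% mutate_if(is.numeric, scale2, na.rm = TRUE)

# across() with the extra argument set inside a lambda
res_across <- starwars %>%
  mutate(across(where(is.numeric), ~ scale2(.x, na.rm = TRUE)))

all.equal(res_if$height, res_across$height)
all.equal(res_if$birth_year, res_across$birth_year)
```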
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> iris %>% mutate_if(is.numeric, list(scale2, log)) %>% head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_fn1
1 5.1 3.5 1.4 0.2 setosa -0.8976739
2 4.9 3.0 1.4 0.2 setosa -1.1392005
3 4.7 3.2 1.3 0.2 setosa -1.3807271
4 4.6 3.1 1.5 0.2 setosa -1.5014904
5 5.0 3.6 1.4 0.2 setosa -1.0184372
6 5.4 3.9 1.7 0.4 setosa -0.5353840
Sepal.Width_fn1 Petal.Length_fn1 Petal.Width_fn1 Sepal.Length_fn2
1 1.01560199 -1.335752 -1.311052 1.629241
2 -0.13153881 -1.335752 -1.311052 1.589235
3 0.32731751 -1.392399 -1.311052 1.547563
4 0.09788935 -1.279104 -1.311052 1.526056
5 1.24503015 -1.335752 -1.311052 1.609438
6 1.93331463 -1.165809 -1.048667 1.686399
Sepal.Width_fn2 Petal.Length_fn2 Petal.Width_fn2
1 1.252763 0.3364722 -1.6094379
2 1.098612 0.3364722 -1.6094379
3 1.163151 0.2623643 -1.6094379
4 1.131402 0.4054651 -1.6094379
5 1.280934 0.3364722 -1.6094379
6 1.360977 0.5306283 -0.9162907
You can also name the functions. Note that the column names below differ from those above: they now carry the function names as suffixes.
> iris %>% mutate_if(is.numeric, list(scale = scale2, log = log)) %>% head()
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_scale
1 5.1 3.5 1.4 0.2 setosa -0.8976739
2 4.9 3.0 1.4 0.2 setosa -1.1392005
3 4.7 3.2 1.3 0.2 setosa -1.3807271
4 4.6 3.1 1.5 0.2 setosa -1.5014904
5 5.0 3.6 1.4 0.2 setosa -1.0184372
6 5.4 3.9 1.7 0.4 setosa -0.5353840
Sepal.Width_scale Petal.Length_scale Petal.Width_scale Sepal.Length_log
1 1.01560199 -1.335752 -1.311052 1.629241
2 -0.13153881 -1.335752 -1.311052 1.589235
3 0.32731751 -1.392399 -1.311052 1.547563
4 0.09788935 -1.279104 -1.311052 1.526056
5 1.24503015 -1.335752 -1.311052 1.609438
6 1.93331463 -1.165809 -1.048667 1.686399
Sepal.Width_log Petal.Length_log Petal.Width_log
1 1.252763 0.3364722 -1.6094379
2 1.098612 0.3364722 -1.6094379
3 1.163151 0.2623643 -1.6094379
4 1.131402 0.4054651 -1.6094379
5 1.280934 0.3364722 -1.6094379
6 1.360977 0.5306283 -0.9162907
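The named-list approach carries over to across() as well. The sketch below relies on the default naming scheme of across() for a list of functions, "{.col}_{.fn}"; the column order of the result may differ from the mutate_if output above, so only the set of names is compared.

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

res <- iris %>%
  mutate(across(where(is.numeric), list(scale = scale2, log = log))) %>%
  head()

# Original columns plus one new column per (column, function) pair
names(res)
```

If a different naming pattern is wanted, across() also takes a .names argument, for example .names = "{.fn}_{.col}".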
mutate_all
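The mutate_all documentation page has few examples, but as its description says, it operates on every variable. In the example below, sum(.) returns a single value per column, and that value is recycled down all 10 rows.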
> a = matrix(rep(1:5,each =10),10) %>% as.data.frame()
> a
V1 V2 V3 V4 V5
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
6 1 2 3 4 5
7 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
10 1 2 3 4 5
> mutate_all(a,funs(sum(.)))
V1 V2 V3 V4 V5
1 10 20 30 40 50
2 10 20 30 40 50
3 10 20 30 40 50
4 10 20 30 40 50
5 10 20 30 40 50
6 10 20 30 40 50
7 10 20 30 40 50
8 10 20 30 40 50
9 10 20 30 40 50
10 10 20 30 40 50
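For comparison, the same result can be sketched with across(everything(), ...), which selects all columns the way mutate_all does. The name res_across is illustrative.

```r
library(dplyr)

a <- matrix(rep(1:5, each = 10), 10) %>% as.data.frame()

# Scoped variant with a bare function instead of funs()
res_all <- mutate_all(a, sum)

# across() over every column
res_across <- a %>% mutate(across(everything(), sum))

all.equal(res_all, res_across)
```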
One more note:
For the .funs argument you can write your own function as in the examples above, combine several functions with list(), or use a lambda of the form ~fun(.).
starwars %>% mutate_at(c("height", "mass"), ~scale2(., na.rm = TRUE))
Summary
Unlike mutate, which adds new variables, the scoped variants of mutate mainly apply functions column by column. For row-wise or group-wise operations, add rowwise() or group_by().
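A minimal sketch of both options, assuming dplyr 1.0.0 or later (c_across() was introduced there). The column names row_sum and Sepal.Length_centered are made up for illustration.

```r
library(dplyr)

# Row-wise: sum the numeric columns within each row
row_res <- head(iris) %>%
  rowwise() %>%
  mutate(row_sum = sum(c_across(where(is.numeric)))) %>%
  ungroup()

# Group-wise: center Sepal.Length within each species
grp_res <- iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length_centered = Sepal.Length - mean(Sepal.Length)) %>%
  ungroup()

row_res
head(grp_res)
```

Calling ungroup() at the end drops the row-wise or grouped structure so that later steps behave like ordinary column-wise operations again.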