Sharing some lesser-known features of {dplyr}. If you like this post, you can follow my WeChat official account, R語言數據分析指南.
library(tidyverse)
data("penguins", package = "palmerpenguins")
penguins <- na.omit(penguins)
1. rename() inside select()

# Two steps: select, then rename
penguins %>%
  select(species, island) %>%
  rename(penguin_species = species)

# One step: rename while selecting
penguins %>%
  select(penguin_species = species,
         island)
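2. rename() inside count()

# Two steps: count, then rename the count column
penguins %>%
  count(species) %>%
  rename(total = n)

# One step: name the count column inside count()
penguins %>%
  count(species, name = "total")

You can also combine this with the select() trick above to rename the counted column at the same time:

penguins %>%
  count(species) %>%
  rename(total = n, penguin_species = species)

penguins %>%
  count(penguin_species = species, name = "total")

Note that the new name passed to the name argument must be quoted, whereas the new name for the selected column does not need quotes.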
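The equivalence of the two forms can be checked on a small table. The sketch below is only an illustration: `toy` is a hypothetical three-row data frame made up for this example, not part of the penguins data.

```r
library(dplyr)

# Hypothetical toy data standing in for the penguins table
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo"),
  island  = c("Dream", "Biscoe", "Biscoe")
)

# Two steps: count, then rename both columns
long_form <- toy %>%
  count(species) %>%
  rename(total = n, penguin_species = species)

# One step: rename the counted column and name the count column inside count()
short_form <- toy %>%
  count(penguin_species = species, name = "total")

# Compare the two results
all.equal(long_form, short_form)
```

The unquoted `penguin_species = species` is evaluated like a column expression, while `name` is an ordinary string argument, which is why only the latter needs quotes.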
3. mutate() inside count()

This one is simple: whatever you would compute in mutate() before counting, you can compute directly inside count():

# Two steps: create the column, then count it
penguins %>%
  mutate(long_beak = bill_length_mm > 50) %>%
  count(long_beak)

# One step: create the column inside count()
penguins %>%
  count(long_beak = bill_length_mm > 50)

Of course, this also works when you specify several variables to count by:

penguins %>%
  mutate(long_beak = bill_length_mm > 50,
         is_adelie = species == "Adelie") %>%
  count(is_adelie, long_beak)

penguins %>%
  count(long_beak = bill_length_mm > 50,
        is_adelie = species == "Adelie")
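One way to see why this works is that count() behaves roughly like group_by() followed by summarize(n = n()), and group_by() also accepts named expressions. The sketch below assumes a hypothetical four-row `toy` table invented for this illustration.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy <- tibble(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 51.2, 47.5, 55.9)
)

# Expression created directly inside count()
by_count <- toy %>%
  count(long_beak = bill_length_mm > 50)

# Roughly the same thing spelled out with group_by() + summarize()
by_summarize <- toy %>%
  group_by(long_beak = bill_length_mm > 50) %>%
  summarize(n = n())

# Compare the two results
all.equal(by_count, by_summarize)
```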
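4. transmute() + select()

# Two steps: create the column, then keep only that column
penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000) %>%
  select(body_mass_kg)

# One step
penguins %>%
  transmute(body_mass_kg = body_mass_g / 1000)

I rarely used transmute() in the past because I thought it could only return the modified columns, which would be very limiting (in the example above, for instance, what good is a lone column of penguin body mass in kilograms?).

In fact, you can simply name the columns you want to keep, and transmute() carries those unmodified columns over just as select() would. And of course you can rename them as you go:

penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000) %>%
  select(species, island, body_mass_kg) %>%
  rename(penguin_species = species)

penguins %>%
  transmute(penguin_species = species,
            island,
            body_mass_kg = body_mass_g / 1000)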
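As a side note, recent dplyr documentation marks transmute() as superseded in favor of mutate() with `.keep = "none"`. The sketch below compares the two on a hypothetical `toy` table (made up for this example) and assumes dplyr 1.0.0 or later, where `.keep` is available. The column order of the two results may differ, so only the set of column names is compared.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy <- tibble(
  species     = c("Adelie", "Gentoo"),
  island      = c("Dream", "Biscoe"),
  body_mass_g = c(3700, 5000),
  year        = c(2007L, 2008L)
)

# transmute() keeps only the columns that are named in the call
via_transmute <- toy %>%
  transmute(penguin_species = species,
            island,
            body_mass_kg = body_mass_g / 1000)

# mutate(.keep = "none") also keeps only the columns named in the call
via_mutate <- toy %>%
  mutate(penguin_species = species,
         island = island,
         body_mass_kg = body_mass_g / 1000,
         .keep = "none")

names(via_transmute)
setequal(names(via_transmute), names(via_mutate))
```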
5. ungroup() inside summarize()

penguins %>%
  group_by(island, species) %>%
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ungroup()

By default, summarize() drops only the last grouping variable, so if ungroup() is not called the output is still grouped by island:

# Without ungroup(): still grouped by island
penguins %>%
  group_by(island, species) %>%
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  group_vars()

# With ungroup(): no grouping variables left
penguins %>%
  group_by(island, species) %>%
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ungroup() %>%
  group_vars()

You can also simply set the .groups argument of summarize() to 'drop' to get the same result:

penguins %>%
  group_by(island, species) %>%
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE), .groups = 'drop')
# A tibble: 5 x 3
  island    species   mean_mass
  <fct>     <fct>         <dbl>
1 Biscoe    Adelie        3710.
2 Biscoe    Gentoo        5092.
3 Dream     Adelie        3701.
4 Dream     Chinstrap     3733.
5 Torgersen Adelie        3709.
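To make the effect of .groups concrete, the sketch below inspects the grouping variables left over under three settings. It assumes dplyr 1.0.0 or later (where .groups is available) and uses a hypothetical `toy` table invented for this example.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy <- tibble(
  island      = c("Biscoe", "Biscoe", "Dream", "Dream"),
  species     = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  body_mass_g = c(3700, 5100, 3650, 3750)
)

grouped <- toy %>% group_by(island, species)

# "drop_last" drops only the last grouping variable (species)
vars_drop_last <- grouped %>%
  summarize(mean_mass = mean(body_mass_g), .groups = "drop_last") %>%
  group_vars()

# "drop" removes all grouping, like a trailing ungroup()
vars_drop <- grouped %>%
  summarize(mean_mass = mean(body_mass_g), .groups = "drop") %>%
  group_vars()

# "keep" retains every grouping variable
vars_keep <- grouped %>%
  summarize(mean_mass = mean(body_mass_g), .groups = "keep") %>%
  group_vars()

vars_drop_last  # "island"
vars_drop       # character(0)
vars_keep       # "island" "species"
```

Setting .groups explicitly also silences the informational message that summarize() prints about the grouping of its output.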
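6. arrange() + other features inside slice()

If you want the top n rows ordered by a column, you can use top_n(), which is a simpler alternative to slice() + arrange():

penguins %>%
  arrange(desc(body_mass_g)) %>%
  slice(1:5)

penguins %>%
  top_n(5, wt = body_mass_g)

Since dplyr 1.0.0, top_n() has been superseded by slice_max() and slice_min():

penguins %>%
  slice_max(order_by = body_mass_g, n = 5)

The most significant change in the new slice_*() functions is that they behave properly on grouped data frames.

For example, the code below returns the heaviest 5% of penguins within each species:

penguins %>%
  group_by(species) %>%
  slice_max(body_mass_g, prop = .05)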
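One difference worth knowing when swapping arrange() + slice() for slice_max() is how ties are handled: slice_max() keeps tied rows by default, so it can return more than n rows. The sketch below uses a hypothetical `toy` table with a deliberate tie, invented for this illustration.

```r
library(dplyr)

# Hypothetical toy data with a tie at 6000 g
toy <- tibble(
  id          = 1:6,
  body_mass_g = c(6300, 6050, 6000, 6000, 5950, 5900)
)

# arrange() + slice() always returns exactly 3 rows
top_arrange <- toy %>%
  arrange(desc(body_mass_g)) %>%
  slice(1:3)

# slice_max() keeps ties by default, so both 6000 g rows are returned
top_ties <- toy %>%
  slice_max(body_mass_g, n = 3)

# with_ties = FALSE restores the "exactly n rows" behavior
top_exact <- toy %>%
  slice_max(body_mass_g, n = 3, with_ties = FALSE)

nrow(top_arrange)  # 3
nrow(top_ties)     # 4
nrow(top_exact)    # 3
```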
7. Counting and summing by group with add_count()

add_count() adds a column containing the count for each group (or combination of groups).

##### Long Form #####
# penguins %>%
#   group_by(species) %>%
#   mutate(count_by_species = n()) %>%
#   ungroup()

penguins %>%
  add_count(species, name = "count_by_species") %>%
  select(-contains("mm"))
# A tibble: 333 x 6
   species island    body_mass_g sex     year count_by_species
   <fct>   <fct>           <int> <fct>  <int>            <int>
 1 Adelie  Torgersen        3750 male    2007              146
 2 Adelie  Torgersen        3800 female  2007              146
 3 Adelie  Torgersen        3250 female  2007              146
 4 Adelie  Torgersen        3450 female  2007              146
 5 Adelie  Torgersen        3650 male    2007              146
 6 Adelie  Torgersen        3625 female  2007              146
 7 Adelie  Torgersen        4675 male    2007              146
 8 Adelie  Torgersen        3200 female  2007              146
 9 Adelie  Torgersen        3800 male    2007              146
10 Adelie  Torgersen        4400 male    2007              146
# ... with 323 more rows
You can use the wt argument to get sums by group efficiently (perhaps a bit hacky, but very useful):

##### Long Form #####
# penguins %>%
#   group_by(species) %>%
#   mutate(total_weight_by_species = sum(body_mass_g)) %>%
#   ungroup()

penguins %>%
  add_count(species, wt = body_mass_g,
            name = "total_weight_by_species") %>%
  select(-contains("mm"))
# A tibble: 333 x 6
   species island    body_mass_g sex     year total_weight_by_species
   <fct>   <fct>           <int> <fct>  <int>                   <int>
 1 Adelie  Torgersen        3750 male    2007                  541100
 2 Adelie  Torgersen        3800 female  2007                  541100
 3 Adelie  Torgersen        3250 female  2007                  541100
 4 Adelie  Torgersen        3450 female  2007                  541100
 5 Adelie  Torgersen        3650 male    2007                  541100
 6 Adelie  Torgersen        3625 female  2007                  541100
 7 Adelie  Torgersen        4675 male    2007                  541100
 8 Adelie  Torgersen        3200 female  2007                  541100
 9 Adelie  Torgersen        3800 male    2007                  541100
10 Adelie  Torgersen        4400 male    2007                  541100
# ... with 323 more rows
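The commented-out long form and the add_count() call can be compared directly on a small table. This is a minimal sketch using a hypothetical `toy` data frame made up for the example.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo"),
  body_mass_g = c(3700L, 3800L, 5000L)
)

# Long form: group, sum within each group, ungroup
long_form <- toy %>%
  group_by(species) %>%
  mutate(total_weight_by_species = sum(body_mass_g)) %>%
  ungroup()

# Short form: a weighted count is a sum of the weights within each group
short_form <- toy %>%
  add_count(species, wt = body_mass_g,
            name = "total_weight_by_species")

short_form$total_weight_by_species  # 7500 7500 5000
```

With wt supplied, each row contributes its weight instead of 1 to the count, which is why the "count" turns into a per-group sum.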
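By default, add_tally() adds the total number of rows, which you could already do with mutate(n = n()). It becomes more useful with the wt argument, as in the example below, which adds the total weight across all species next to the total weight by species:

penguins %>%
  add_count(species, wt = body_mass_g,
            name = "total_weight_by_species") %>%
  add_tally(wt = body_mass_g,
            name = "total_weight_of_all_species") %>%
  select(1:2, last_col(0):last_col(1))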
# A tibble: 333 x 4
   species island    total_weight_of_all_species total_weight_by_species
   <fct>   <fct>                           <int>                   <int>
 1 Adelie  Torgersen                     1400950                  541100
 2 Adelie  Torgersen                     1400950                  541100
 3 Adelie  Torgersen                     1400950                  541100
 4 Adelie  Torgersen                     1400950                  541100
 5 Adelie  Torgersen                     1400950                  541100
 6 Adelie  Torgersen                     1400950                  541100
 7 Adelie  Torgersen                     1400950                  541100
 8 Adelie  Torgersen                     1400950                  541100
 9 Adelie  Torgersen                     1400950                  541100
10 Adelie  Torgersen                     1400950                  541100
# ... with 323 more rows
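One possible use of having both totals side by side is computing each group's share of the overall total. The sketch below is only an illustration of that idea: the `toy` table and the column names `by_species`, `overall`, and `share` are hypothetical choices made for this example.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy <- tibble(
  species     = c("Adelie", "Adelie", "Gentoo"),
  body_mass_g = c(3700, 3800, 5000)
)

shares <- toy %>%
  # Per-species total weight
  add_count(species, wt = body_mass_g, name = "by_species") %>%
  # Overall total weight across all rows
  add_tally(wt = body_mass_g, name = "overall") %>%
  # Share of the overall weight contributed by each row's species
  mutate(share = by_species / overall)

shares
```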