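The previous two sections covered reshaping data from wide to long format with pivot_longer() and filtering rows with filter(). This section continues with select(), the column-selection function.
Selecting columns: the basics
To select a few columns, just add their names to select(). The order in which you add them determines the order in which they appear in the output.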
msleep %>%
select(name, genus, sleep_total, awake)
# A tibble: 83 x 4
name genus sleep_total awake
<chr> <chr> <dbl> <dbl>
1 Cheetah Acinonyx 12.1 11.9
2 Owl monkey Aotus 17 7
3 Mountain beaver Aplodontia 14.4 9.6
To add a large number of columns, use the start_col:end_col syntax:
msleep %>%
select(name:order,sleep_cycle:brainwt)
# A tibble: 83 x 7
name genus vore order sleep_cycle awake brainwt
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Cheetah Acinonyx carni Carnivora NA 11.9 NA
2 Owl monkey Aotus omni Primates NA 7 0.0155
3 Mountain beaver Aplodontia herbi Rodentia NA 9.6 NA
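You can also deselect columns by putting a minus sign in front of the column name. A range has to be wrapped in parentheses before it is negated: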
msleep %>%
select(-conservation, -(sleep_total:awake))
# A tibble: 83 x 6
name genus vore order brainwt bodywt
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Cheetah Acinonyx carni Carnivora NA 50
2 Owl monkey Aotus omni Primates 0.0155 0.48
3 Mountain beaver Aplodontia herbi Rodentia NA 1.35
Selecting columns by partial column names
If many columns share a similar naming structure, you can select them with starts_with(), ends_with(), or contains().
msleep %>% select(name, starts_with("sleep"))
# A tibble: 83 x 4
name sleep_total sleep_rem sleep_cycle
<chr> <dbl> <dbl> <dbl>
1 Cheetah 12.1 NA NA
2 Owl monkey 17 1.8 NA
3 Mountain beaver 14.4 2.4 NA
msleep %>%
select(contains("eep"), ends_with("wt"))
# A tibble: 83 x 5
sleep_total sleep_rem sleep_cycle brainwt bodywt
<dbl> <dbl> <dbl> <dbl> <dbl>
1 12.1 NA NA NA 50
2 17 1.8 NA 0.0155 0.48
3 14.4 2.4 NA NA 1.35
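Selecting columns by regular expression
If the column names do not share a simple prefix, suffix, or substring, you can use matches() with a regular expression.
The following example selects the columns whose names contain an "o", followed by one or more other characters, and then "er".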
msleep %>% select(matches("o.+er"))
# A tibble: 83 x 2
order conservation
<chr> <chr>
1 Carnivora lc
2 Primates NA
3 Rodentia nt
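Before running matches() on a wide table, it can help to preview which names a pattern would hit. The sketch below uses base R's grepl() on a plain character vector of the msleep column names (typed out by hand here, so treat the vector as an assumption). Note that matches() ignores case by default while grepl() does not, which makes no difference for these all-lowercase names.

```r
# Column names of msleep, written out by hand for illustration.
col_names <- c("name", "genus", "vore", "order", "conservation",
               "sleep_total", "sleep_rem", "sleep_cycle", "awake",
               "brainwt", "bodywt")

# grepl() applies the regular expression to each name and returns TRUE/FALSE.
matched <- col_names[grepl("o.+er", col_names)]
print(matched)

# Anchoring with ^ restricts the match to names that start with "o".
anchored <- col_names[grepl("^o.+er", col_names)]
print(anchored)
```

The first pattern keeps order and conservation, the same two columns as the matches() call above, while the anchored variant keeps only order.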
Selecting columns from a predefined vector of names
class <- c("name", "genus", "vore", "order", "conservation")
msleep %>% select(!!class)
# A tibble: 83 x 5
name genus vore order conservation
<chr> <chr> <chr> <chr> <chr>
1 Cheetah Acinonyx carni Carnivora lc
2 Owl monkey Aotus omni Primates NA
3 Mountain beaver Aplodontia herbi Rodentia nt
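In dplyr 1.0.0 and later, a character vector of names can also be passed through the all_of() or any_of() helpers instead of !!. The sketch below uses a small made-up tibble (animals) and a made-up vector (wanted) to show the difference between the two: any_of() skips names that are missing, while all_of() raises an error.

```r
library(dplyr)

# Hypothetical toy data standing in for msleep.
animals <- tibble(
  name  = c("Cheetah", "Owl monkey"),
  genus = c("Acinonyx", "Aotus"),
  awake = c(11.9, 7)
)

# "conservation" is deliberately absent from the toy data.
wanted <- c("name", "genus", "conservation")

# any_of() silently ignores names that are not in the data.
lenient <- animals %>% select(any_of(wanted))

# all_of() is strict: a missing name is an error, caught here for display.
strict <- tryCatch(
  animals %>% select(all_of(wanted)),
  error = function(e) "error: missing column"
)

print(names(lenient))
print(strict)
```

all_of() is usually the safer default in scripts, because a misspelled column name fails loudly instead of being dropped.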
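Selecting columns by data type (key point)
select_if() selects columns by testing each one with a function. For example, select_if(is.character) selects all character columns. You can use is.numeric, is.integer, is.double, is.logical, or is.factor in the same way. If you have date columns, you can load the lubridate package and use is.POSIXt or is.Date.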
msleep %>% select_if(is.numeric)
# A tibble: 83 x 6
sleep_total sleep_rem sleep_cycle awake brainwt bodywt
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12.1 NA NA 11.9 NA 50
2 17 1.8 NA 7 0.0155 0.48
3 14.4 2.4 NA 9.6 NA 1.35
You can also negate the test to select the columns that are not of a given data type:
msleep %>% select_if(~!is.numeric(.))
# A tibble: 83 x 5
name genus vore order conservation
<chr> <chr> <chr> <chr> <chr>
1 Cheetah Acinonyx carni Carnivora lc
2 Owl monkey Aotus omni Primates NA
3 Mountain beaver Aplodontia herbi Rodentia nt
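In dplyr 1.0.0 and later, the same type-based selection can be written with select() and the where() helper, and select_if() is marked as superseded. A minimal sketch on a made-up tibble (animals is hypothetical, not part of the original data):

```r
library(dplyr)

# Hypothetical toy data standing in for msleep.
animals <- tibble(
  name        = c("Cheetah", "Owl monkey"),
  vore        = c("carni", "omni"),
  sleep_total = c(12.1, 17),
  awake       = c(11.9, 7)
)

# where() wraps a predicate that is applied to each column.
numeric_cols <- animals %>% select(where(is.numeric))

# ! negates the selection, keeping the non-numeric columns.
other_cols <- animals %>% select(!where(is.numeric))

print(names(numeric_cols))
print(names(other_cols))
```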
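Selecting columns by logical expressions (key point)
select_if() is not limited to data types. You can also, for example, select all columns whose mean is greater than 10.
mean > 10 is not a function by itself, so you need to put a tilde in front of it, or wrap it in funs() (deprecated in recent dplyr versions), to turn the statement into a function.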
msleep %>% select_if(is.numeric) %>%
select_if(~mean(., na.rm=TRUE) > 10)
# A tibble: 83 x 3
sleep_total awake bodywt
<dbl> <dbl> <dbl>
1 12.1 11.9 50
2 17 7 0.48
3 14.4 9.6 1.35
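This can also be written in a single step. Using && short-circuits the test, so mean() is only evaluated for numeric columns:
msleep %>%
select_if(~is.numeric(.) && mean(., na.rm=TRUE) > 10)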
Another useful function to combine with select_if() is n_distinct(), which counts the number of distinct values in a column:
msleep %>% select_if(~n_distinct(.) < 10)
vore conservation
<chr> <chr>
1 carni lc
2 omni NA
3 herbi nt
4 omni lc
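The same logical conditions can be expressed with select(where(...)) in dplyr 1.0.0 and later. The sketch below uses a made-up tibble (animals is hypothetical) and assumes the predicate returns a single TRUE or FALSE per column, which is why && is used rather than &.

```r
library(dplyr)

# Hypothetical toy data standing in for msleep.
animals <- tibble(
  name        = c("Cheetah", "Owl monkey", "Mountain beaver"),
  vore        = c("carni", "omni", "omni"),
  sleep_total = c(12.1, 17, 14.4),
  sleep_rem   = c(NA, 1.8, 2.4),
  awake       = c(11.9, 7, 9.6)
)

# && short-circuits, so mean() is never called on a character column.
high_mean <- animals %>%
  select(where(~ is.numeric(.x) && mean(.x, na.rm = TRUE) > 10))

# Columns with fewer than three distinct values (NA counts as a value).
few_values <- animals %>%
  select(where(~ n_distinct(.x) < 3))

print(names(high_mean))
print(names(few_values))
```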
Reordering columns
everything() stands for all remaining columns, so listing a few columns before it moves them to the front of the table:
msleep %>%
select(conservation, sleep_total, everything())
conservation sleep_total name genus vore order sleep_rem sleep_cycle awake brainwt
<chr> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 lc 12.1 Cheetah Acin… carni Carni… NA NA 11.9 NA
2 NA 17 Owl monk… Aotus omni Prima… 1.8 NA 7 0.0155
3 nt 14.4 Mountain… Aplo… herbi Roden… 2.4 NA 9.6 NA
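dplyr 1.0.0 and later also provide relocate(), which moves columns without having to spell out everything(). A short sketch on a made-up tibble (animals is hypothetical), comparing it with the select() approach above:

```r
library(dplyr)

# Hypothetical toy data standing in for msleep.
animals <- tibble(
  name         = c("Cheetah", "Owl monkey"),
  genus        = c("Acinonyx", "Aotus"),
  conservation = c("lc", NA),
  sleep_total  = c(12.1, 17)
)

# By default relocate() moves the listed columns to the front.
front <- animals %>% relocate(conservation, sleep_total)

# Equivalent result using select() with everything().
same <- animals %>% select(conservation, sleep_total, everything())

# .after (or .before) places the columns relative to another column.
back <- animals %>% relocate(name, .after = last_col())

print(names(front))
print(names(back))
```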
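Renaming columns
Columns can be renamed inside select() with new_name = old_name. Only the columns listed are kept: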
msleep %>%
select(animal = name, sleep_total, extinction_threat = conservation)
# A tibble: 83 x 3
animal sleep_total extinction_threat
<chr> <dbl> <chr>
1 Cheetah 12.1 lc
2 Owl monkey 17 NA
3 Mountain beaver 14.4 nt
You can also rename columns with rename(), which keeps all the other columns:
msleep %>%
rename(animal = name, extinction_threat = conservation)
# A tibble: 83 x 11
animal genus vore order extinction_thre… sleep_total sleep_rem sleep_cycle awake brainwt
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Cheet… Acin… carni Carn… lc 12.1 NA NA 11.9 NA
2 Owl m… Aotus omni Prim… NA 17 1.8 NA 7 0.0155
3 Mount… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6 NA
Reformatting all column names
select_all() takes a function as its argument and applies it to every column name.
To get all column names in uppercase, use toupper(). To convert them all to lowercase, use tolower().
msleep %>% select_all(toupper)
NAME GENUS VORE ORDER CONSERVATION SLEEP_TOTAL SLEEP_REM SLEEP_CYCLE AWAKE BRAINWT
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Cheetah Acin… carni Carni… lc 12.1 NA NA 11.9 NA
2 Owl monk… Aotus omni Prima… NA 17 1.8 NA 7 0.0155
3 Mountain… Aplo… herbi Roden… nt 14.4 2.4 NA 9.6 NA
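In dplyr 1.0.0 and later, rename_with() covers the same use case and can optionally be limited to a subset of columns. A minimal sketch on a made-up tibble (animals is hypothetical):

```r
library(dplyr)

# Hypothetical toy data standing in for msleep.
animals <- tibble(
  name        = c("Cheetah", "Owl monkey"),
  sleep_total = c(12.1, 17),
  brainwt     = c(NA, 0.0155)
)

# Apply toupper() to every column name.
upper <- animals %>% rename_with(toupper)

# Restrict the renaming to columns picked by a selection helper.
partly <- animals %>% rename_with(toupper, .cols = starts_with("sleep"))

print(names(upper))
print(names(partly))
```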
Writing your own functions (key point)
Replace the spaces in column names with underscores:
msleep2 <- select(msleep, name, sleep_total, brainwt)
colnames(msleep2) <- c("name", "sleep total", "brain weight")
msleep2 %>%
select_all(~str_replace(., " ", "_"))
name sleep_total brain_weight
<chr> <dbl> <dbl>
1 Cheetah 12.1 NA
2 Owl monkey 17 0.0155
3 Mountain beaver 14.4 NA
4 Greater short-tailed shrew 14.9 0.00029
You can also combine select_all() with str_replace() to strip unwanted characters:
msleep2 <- select(msleep, name, sleep_total, brainwt)
colnames(msleep2) <- c("Q1 name", "Q2 sleep total", "Q3 brain weight")
msleep2[1:3,]
`Q1 name` `Q2 sleep total` `Q3 brain weight`
<chr> <dbl> <dbl>
1 Cheetah 12.1 NA
2 Owl monkey 17 0.0155
3 Mountain beaver 14.4 NA
msleep2 %>%
select_all(~str_replace(., "Q[0-9]+", "")) %>%
select_all(~str_replace(., " ", "_"))
`_name` `_sleep total` `_brain weight`
<chr> <dbl> <dbl>
1 Cheetah 12.1 NA
2 Owl monkey 17 0.0155
3 Mountain beaver 14.4 NA
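In the output above the names still contain spaces and start with an underscore. The reason is that str_replace() only replaces the first match, and the space that followed the "Q" prefix was left in place. One possible way to get fully clean names, sketched on a made-up tibble (survey is hypothetical), is to remove the prefix together with its trailing whitespace and then use str_replace_all():

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with survey-style column names.
survey <- tibble(
  `Q1 name`         = c("Cheetah", "Owl monkey"),
  `Q2 sleep total`  = c(12.1, 17),
  `Q3 brain weight` = c(NA, 0.0155)
)

cleaned <- survey %>%
  # Drop the "Q<number>" prefix together with the whitespace after it.
  select_all(~ str_replace(.x, "^Q[0-9]+\\s+", "")) %>%
  # str_replace_all() replaces every remaining space, not just the first.
  select_all(~ str_replace_all(.x, " ", "_"))

print(names(cleaned))
```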
Row names to a column
In some data frames the row names are not actually a column, for example in the mtcars dataset:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If you want the row names to become an actual column, use rownames_to_column() and specify the new column name:
mtcars %>%
tibble::rownames_to_column("car_model") %>% head
car_model mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
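The conversion can also be reversed. As a small sketch, tibble::column_to_rownames() turns a column back into row names, which can be handy when a downstream function expects a classic data frame with row names:

```r
library(tibble)

# Move the row names of mtcars into a regular column.
with_col <- rownames_to_column(mtcars, "car_model")

# Reverse the step: the column becomes the row names again.
restored <- column_to_rownames(with_col, "car_model")

print(head(with_col$car_model, 3))
print(head(rownames(restored), 3))
```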
This section covered most of the ways to use select(). Referring back to it during future data processing should make your work considerably more efficient. The next section will introduce mutate(), so stay tuned.