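tidyr: the successor to the reshape2 package
tidyr was written by Hadley Wickham. It is often used together with dplyr.
This article demonstrates the following four tidyr functions:
gather: converts wide data to long data. Similar to the melt function in reshape2.
spread: converts long data to wide data. Similar to the dcast function in reshape2.
unite: combines several columns into one.
separate: splits one column into several.
The examples below use the mtcars dataset from the datasets package.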
library(tidyr)
library(dplyr)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
To make the data easier to work with, add a car column to the dataset and move it to the front:
mtcars$car <- rownames(mtcars)
mtcars <- mtcars[, c(12, 1:11)]
gather
gather is called as follows:
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
Here, ... specifies the columns to gather.
As with the melt function in reshape2, this gives the following result:
mtcarsNew <- mtcars %>% gather(attribute, value, -car)
head(mtcarsNew)
                car attribute value
1         Mazda RX4       mpg  21.0
2     Mazda RX4 Wag       mpg  21.0
3        Datsun 710       mpg  22.8
4    Hornet 4 Drive       mpg  21.4
5 Hornet Sportabout       mpg  18.7
6           Valiant       mpg  18.1
tail(mtcarsNew)
               car attribute value
347  Porsche 914-2      carb     2
348   Lotus Europa      carb     2
349 Ford Pantera L      carb     4
350   Ferrari Dino      carb     6
351  Maserati Bora      carb     8
352     Volvo 142E      carb     2
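As you can see, every column except car has been gathered into two columns, named attribute and value.
A nice feature of tidyr is that you can gather only some columns and leave the rest untouched. To gather all columns from mpg through gear while keeping the carb and car columns unchanged, do the following: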
mtcarsNew <- mtcars %>% gather(attribute, value, mpg:gear)
head(mtcarsNew)
                car carb attribute value
1         Mazda RX4    4       mpg  21.0
2     Mazda RX4 Wag    4       mpg  21.0
3        Datsun 710    1       mpg  22.8
4    Hornet 4 Drive    1       mpg  21.4
5 Hornet Sportabout    2       mpg  18.7
6           Valiant    1       mpg  18.1
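The two ways of writing the ... selection, and the na.rm argument from the call signature, can be tried on a small made-up data frame. This is only a minimal sketch: the data frame scores and its column names are hypothetical and not part of mtcars.

```r
library(tidyr)

# Hypothetical toy data: one row per student, one column per test.
scores <- data.frame(
  student = c("a", "b"),
  test1 = c(90, NA),
  test2 = c(85, 70),
  stringsAsFactors = FALSE
)

# Gather every column except student; rows with a missing value are kept.
long_all <- gather(scores, test, score, -student)

# The same selection written as a range, dropping rows whose value is NA.
long_complete <- gather(scores, test, score, test1:test2, na.rm = TRUE)

nrow(long_all)       # 4
nrow(long_complete)  # 3
```

Both calls select the same columns; only the na.rm setting changes how many rows come back.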
spread
spread is called as follows:
spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE)
As with the dcast function in reshape2, this gives the following result:
mtcarsSpread <- mtcarsNew %>% spread(attribute, value)
head(mtcarsSpread)
                 car carb  mpg cyl disp  hp drat    wt  qsec vs am gear
1        AMC Javelin    2 15.2   8  304 150 3.15 3.435 17.30  0  0    3
2 Cadillac Fleetwood    4 10.4   8  472 205 2.93 5.250 17.98  0  0    3
3         Camaro Z28    4 13.3   8  350 245 3.73 3.840 15.41  0  0    3
4  Chrysler Imperial    4 14.7   8  440 230 3.23 5.345 17.42  0  0    3
5         Datsun 710    1 22.8   4  108  93 3.85 2.320 18.61  1  1    4
6   Dodge Challenger    2 15.5   8  318 150 2.76 3.520 16.87  0  0    3
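To see that spread undoes gather, and what the fill argument does, a round trip on a tiny made-up data frame may help. This is a sketch under assumptions: the data frame wide and its columns are hypothetical.

```r
library(tidyr)

# Hypothetical toy data in wide form.
wide <- data.frame(
  id = c("x", "y"),
  height = c(1.5, 1.8),
  weight = c(50, 70),
  stringsAsFactors = FALSE
)

# Wide to long, then back to wide again.
long <- gather(wide, attribute, value, -id)
wide_again <- spread(long, attribute, value)
wide_again

# Drop the (y, weight) row, which is row 4 of long, so that one
# id/attribute combination is missing; fill = 0 supplies its value.
wide_filled <- spread(long[-4, ], attribute, value, fill = 0)
wide_filled
```

With the default fill = NA, the missing cell would show NA instead of 0.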
unite
unite is called as follows:
unite(data, col, ..., sep = "_", remove = TRUE)
Here, ... specifies the columns to unite, and col is the name of the resulting combined column.
First, let's make up some data:
set.seed(1)
date <- as.Date('2016-01-01') + 0:14
hour <- sample(1:24, 15)
min <- sample(1:60, 15)
second <- sample(1:60, 15)
event <- sample(letters, 15)
data <- data.frame(date, hour, min, second, event)
data
         date hour min second event
1  2016-01-01    7  30     29     u
2  2016-01-02    9  43     36     a
3  2016-01-03   13  58     60     l
4  2016-01-04   20  22     11     q
5  2016-01-05    5  44     47     p
6  2016-01-06   18  52     37     k
7  2016-01-07   19  12     43     r
8  2016-01-08   12  35      6     i
9  2016-01-09   11   7     38     e
10 2016-01-10    1  14     21     b
11 2016-01-11    3  20     42     w
12 2016-01-12   14   1     32     t
13 2016-01-13   23  19     52     h
14 2016-01-14   21  41     26     s
15 2016-01-15    8  16     25     o
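Now we want to combine the date, hour, min and second columns into a new column, datetime. In R, a date-time is usually written as "Year-Month-Day Hour:Min:Second", so the date and hour are joined with a space and the remaining parts with colons.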
dataNew <- data %>%
  unite(datehour, date, hour, sep = ' ') %>%
  unite(datetime, datehour, min, second, sep = ':')
dataNew
              datetime event
1   2016-01-01 7:30:29     u
2   2016-01-02 9:43:36     a
3  2016-01-03 13:58:60     l
4  2016-01-04 20:22:11     q
5   2016-01-05 5:44:47     p
6  2016-01-06 18:52:37     k
7  2016-01-07 19:12:43     r
8   2016-01-08 12:35:6     i
9   2016-01-09 11:7:38     e
10  2016-01-10 1:14:21     b
11  2016-01-11 3:20:42     w
12  2016-01-12 14:1:32     t
13 2016-01-13 23:19:52     h
14 2016-01-14 21:41:26     s
15  2016-01-15 8:16:25     o
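Note that unite pastes the values together as they are, so single-digit parts such as the 6 in 12:35:6 are not zero-padded. One possible way to get two-digit parts is to format them with base R's sprintf before uniting. This is a sketch, not part of the original walkthrough: the data frame events is a hypothetical two-row sample in the same shape as the data above.

```r
library(tidyr)

# Hypothetical two-row sample with the same columns as the data above.
events <- data.frame(
  date = as.Date("2016-01-01") + 0:1,
  hour = c(7L, 12L),
  min = c(30L, 35L),
  second = c(29L, 6L)
)

# Zero-pad each time component to two digits before uniting.
events$hour <- sprintf("%02d", events$hour)
events$min <- sprintf("%02d", events$min)
events$second <- sprintf("%02d", events$second)

# Same two unite steps as above, written without the pipe.
date_hour <- unite(events, datehour, date, hour, sep = " ")
padded <- unite(date_hour, datetime, datehour, min, second, sep = ":")

padded$datetime
# "2016-01-01 07:30:29" "2016-01-02 12:35:06"
```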
separate
separate is called as follows:
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
         convert = FALSE, extra = "warn", fill = "warn", ...)
We can use separate to restore the data to the shape it had when it was first created, as shown below:
data1 <- dataNew %>%
  separate(datetime, c('date', 'time'), sep = ' ') %>%
  separate(time, c('hour', 'min', 'second'), sep = ':')
data1
         date hour min second event
1  2016-01-01    7  30     29     u
2  2016-01-02    9  43     36     a
3  2016-01-03   13  58     60     l
4  2016-01-04   20  22     11     q
5  2016-01-05    5  44     47     p
6  2016-01-06   18  52     37     k
7  2016-01-07   19  12     43     r
8  2016-01-08   12  35      6     i
9  2016-01-09   11   7     38     e
10 2016-01-10    1  14     21     b
11 2016-01-11    3  20     42     w
12 2016-01-12   14   1     32     t
13 2016-01-13   23  19     52     h
14 2016-01-14   21  41     26     s
15 2016-01-15    8  16     25     o
First, datetime is split into a date column and a time column. Then the time column is split into hour, min and second columns.
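By default (convert = FALSE) the columns produced by separate are character vectors, even when they hold digits. The convert argument in the call signature asks separate to convert the new columns to a more suitable type. This is a small sketch on a hypothetical two-row data frame, stamps, rather than on dataNew itself.

```r
library(tidyr)

# Hypothetical sample holding two of the datetime strings shown above.
stamps <- data.frame(
  datetime = c("2016-01-01 7:30:29", "2016-01-08 12:35:6"),
  stringsAsFactors = FALSE
)

# Split into date and time first, as in the example above.
parts <- separate(stamps, datetime, c("date", "time"), sep = " ")

# Default behaviour: the new columns stay as text.
as_text <- separate(parts, time, c("hour", "min", "second"), sep = ":")

# With convert = TRUE the new columns are converted to integers.
as_int <- separate(parts, time, c("hour", "min", "second"), sep = ":",
                   convert = TRUE)

class(as_text$hour)  # "character"
class(as_int$hour)   # "integer"
```

Converting matters if the restored columns are later used in arithmetic or compared with the original numeric columns.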