Read online:
R for Data Science
GitHub repository: https://github.com/hadley/r4ds
16. Dates and Times
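library(tidyverse) ## dplyr, ggplot2 and the pipe used in later examples
library(lubridate)
library(nycflights13) ## provides the flights data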
16.2 Creating date/times
There are three types of date/time data:
- date. Tibbles print this as <date>.
- time. Tibbles print this as <time>.
- date-time. A date plus a time (typically to the nearest second). Tibbles print this as <dttm>.
You should always use the simplest possible data type that works for your needs.
today() ## date of today
now() ## date-time of now
There are three ways to create a date/time:
- From a string.
- From individual date-time components.
- From an existing date/time object.
16.2.1 From strings
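Name the parsing function by arranging y (year), m (month), and d (day) in the order in which they appear in the input:
ymd("2017-01-31") ## year, month, day
#> [1] "2017-01-31"
mdy("January 31st, 2017") ## month, day, year
#> [1] "2017-01-31"
dmy("31-Jan-2017") ## day, month, year
#> [1] "2017-01-31"
ymd(20170131) ## unquoted numbers are accepted too
To create a date-time, add an underscore and one or more of h, m, and s to the function name: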
ymd_hms("2017-01-31 20:11:59")
#> [1] "2017-01-31 20:11:59 UTC"
mdy_hm("01/31/2017 08:01")
#> [1] "2017-01-31 08:01:00 UTC"
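As a small sketch of how these parsers behave on less tidy input (the example strings are made up for illustration): the parsers are vectorised, supplying tz turns a date parser into a date-time parser, and input that cannot be parsed comes back as NA with a warning.

```r
library(lubridate)

# Parsers are vectorised: one call handles a whole character vector.
parsed <- ymd(c("2017-01-31", "2017-02-28"))
class(parsed)
#> [1] "Date"

# Supplying a time zone makes a date parser return a date-time instead.
with_tz_arg <- ymd(20170131, tz = "UTC")
class(with_tz_arg)
#> [1] "POSIXct" "POSIXt"

# Unparseable input yields NA (lubridate also emits a warning, silenced here).
bad <- suppressWarnings(ymd("not a date"))
is.na(bad)
#> [1] TRUE
```

Checking for NA after parsing is a cheap way to catch rows whose format does not match the parser you picked.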
16.2.2 From individual components
When the components are spread across separate columns, use make_date() for dates, or make_datetime() for date-times:
flights %>%
select(year, month, day, hour, minute) %>%
mutate(departure = make_datetime(year, month, day, hour, minute))
#> # A tibble: 336,776 x 6
#> year month day hour minute departure
#> <int> <int> <int> <dbl> <dbl> <dttm>
#> 1 2013 1 1 5 15 2013-01-01 05:15:00
#> 2 2013 1 1 5 29 2013-01-01 05:29:00
#> 3 2013 1 1 5 40 2013-01-01 05:40:00
#> 4 2013 1 1 5 45 2013-01-01 05:45:00
#> 5 2013 1 1 6 0 2013-01-01 06:00:00
#> 6 2013 1 1 5 58 2013-01-01 05:58:00
#> # … with 3.368e+05 more rows
- When you use date-times in a numeric context (like in a histogram), 1 means 1 second, so a binwidth of 86400 means one day. For dates, 1 means 1 day.
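The difference in units can be checked on standalone values. The sketch below builds one date and one date-time from literal components (chosen only for illustration) and moves each forward by one day.

```r
library(lubridate)

d  <- make_date(2013, 1, 1)             # a Date
dt <- make_datetime(2013, 1, 1, 5, 15)  # a date-time in UTC (the default)

# For dates, adding 1 moves forward one day.
d + 1
#> [1] "2013-01-02"

# For date-times, adding 1 moves forward one second,
# so a full day is 86400 seconds.
dt + 86400
#> [1] "2013-01-02 05:15:00 UTC"
```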
16.2.3 From other types
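- as_datetime() converts to a date-time.
- as_date() converts to a date.
- Date/times are sometimes stored as numeric offsets from the "Unix Epoch", 1970-01-01 00:00:00 UTC. Unix time counts the seconds elapsed since the epoch and ignores leap seconds. Many Unix systems have traditionally stored the timestamp as a 32-bit value, which gives rise to the year 2038 problem (Y2038). If the offset is in seconds, use as_datetime(); if it's in days, use as_date().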
as_datetime(today())
#> [1] "2019-01-08 UTC"
as_date(now())
#> [1] "2019-01-08"
as_datetime(60 * 60 * 10)
#> [1] "1970-01-01 10:00:00 UTC"
as_date(365 * 10 + 2)
#> [1] "1980-01-01"
16.3 Date-time components
- Pull out individual parts with year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second().
- month() and wday() accept label = TRUE to return the abbreviated name of the month or day of the week. Set abbr = FALSE to return the full name.
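A quick sketch of the accessors on a single date-time. The label examples assume an English locale, so the names may print differently on other systems.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

year(datetime)   # 2016
month(datetime)  # 7
mday(datetime)   # 8
yday(datetime)   # 190
wday(datetime)   # 6 (with the default week start, Sunday is 1)

# Labels are returned as ordered factors; the text depends on the locale.
month(datetime, label = TRUE)               # "Jul" in an English locale
wday(datetime, label = TRUE, abbr = FALSE)  # "Friday" in an English locale
```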
16.3.2 Rounding
- floor_date() # round down (floor), to the earlier boundary
- round_date() # round to the nearest boundary
- ceiling_date() # round up (ceiling), to the later boundary
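The three rounding functions take the value to adjust and the name of a unit. A minimal sketch on one date-time, rounding to the hour:

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

floor_date(datetime, "hour")
#> [1] "2016-07-08 12:00:00 UTC"
round_date(datetime, "hour")    # 12:34:56 is past the half hour
#> [1] "2016-07-08 13:00:00 UTC"
ceiling_date(datetime, "hour")
#> [1] "2016-07-08 13:00:00 UTC"
```

Flooring to a coarser unit such as "week" or "month" is a common way to group observations before counting them.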
16.3.3 Setting components
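- Use an accessor function together with <- to modify one component of a date/time, for example year(datetime) <- 2020.
- update() sets several components at once: update(datetime, year = 2020, month = 2, mday = 2, hour = 2)
- If values are too big, they will roll over.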
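The sketch below walks through both ways of setting components and shows the roll-over behaviour on 2015-02-01, a date in a 28-day February.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

# Modify one component in place with an accessor and <-.
year(datetime) <- 2020
datetime
#> [1] "2020-07-08 12:34:56 UTC"

# update() returns a modified copy and can set several components at once.
update(datetime, year = 2020, month = 2, mday = 2, hour = 2)
#> [1] "2020-02-02 02:34:56 UTC"

# Values that are too big roll over into the next unit.
rolled_day  <- update(ymd("2015-02-01"), mday = 30)   # 30 February does not exist
rolled_day
#> [1] "2015-03-02"
rolled_hour <- update(ymd("2015-02-01"), hour = 400)  # 400 hours is 16 days and 16 hours
rolled_hour
#> [1] "2015-02-17 16:00:00 UTC"
```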
16.4 Time spans
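- durations, which represent an exact number of seconds.
- periods, which represent human units like weeks and months.
- intervals, which represent a starting and ending point.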
16.4.1 Durations
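- Subtracting two date/times yields a difftime object.
- as.duration() converts it to a duration. Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds at the standard rate (60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 7 days in a week, 365 days in a year). Note that more recent lubridate releases define a duration year as 365.25 days, so dyears(1) may differ from this rate depending on the installed version.
- Durations can also be built with constructor functions whose names start with d: dseconds(), dminutes(), dhours(), ddays(), dweeks(), dyears().
- Durations can be added and subtracted. Because a duration is an exact number of seconds, the result can be surprising around leap years and daylight saving time changes.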
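A short sketch of the points above, using two made-up timestamps: the subtraction gives a difftime, as.duration() turns it into seconds, and the d-prefixed constructors build durations that support arithmetic.

```r
library(lubridate)

start <- ymd_hms("2019-01-08 08:00:00")
end   <- ymd_hms("2019-01-08 09:30:00")

gap <- end - start
class(gap)                 # "difftime"; its unit (hours here) is picked automatically

gap_dur <- as.duration(gap)
as.numeric(gap_dur)        # 5400 seconds, whatever unit the difftime used

# Constructors always store seconds.
dseconds(15)
#> [1] "15s"
dminutes(10)
#> [1] "600s (~10 minutes)"

# Durations can be added and multiplied.
total <- dhours(12) + dminutes(15)
as.numeric(total)          # 44100 seconds
as.numeric(2 * ddays(1))   # 172800 seconds
```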
16.4.2 Periods
The following constructor functions create periods. Periods can be added to and subtracted from one another:
days(1) ## 1 day
seconds(15) ## 15 seconds
minutes(10) ## 10 minutes
hours(12) ## 12 hours
months(1) ## 1 month
weeks(1) ## 1 week
years(1) ## 1 year
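The difference between a period and a duration shows up around a daylight saving time change. The sketch below uses 12 March 2016 in New York, the day before clocks moved forward there, and assumes the system time zone database includes "America/New_York".

```r
library(lubridate)

one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

# A duration adds exactly 86400 seconds, so the clock time shifts by an hour.
plus_duration <- one_pm + ddays(1)
plus_duration
#> [1] "2016-03-13 14:00:00 EDT"

# A period adds one calendar day, so the clock time is preserved.
plus_period <- one_pm + days(1)
plus_period
#> [1] "2016-03-13 13:00:00 EDT"

# Periods also respect leap years: 2016 has 366 days.
ymd("2016-01-01") + years(1)
#> [1] "2017-01-01"
```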
16.4.3 Intervals
- An interval is written as two time points joined by %--%, for example today() %--% next_year, where next_year is a later date such as today() + years(1).
- Pick the simplest data structure that solves your problem.
- If you only care about physical time, use a duration;
- if you need to add human times, use a period;
- if you need to figure out how long a span is in human units, use an interval.
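As a sketch of the last point, an interval can be divided by a duration or a period to find out how long it is. Fixed dates are used here instead of today() so that the results are reproducible.

```r
library(lubridate)

start     <- ymd("2019-01-08")
next_year <- start + years(1)
span      <- start %--% next_year

# Length of the interval in days.
span / ddays(1)    # 365
span %/% days(1)   # 365 (integer division by a period)

# An interval that covers a leap day is one day longer.
leap_span <- ymd("2016-01-01") %--% ymd("2017-01-01")
leap_span / ddays(1)   # 366
```

A bare period such as years(1) cannot answer this question exactly, because its length in days depends on which year it is anchored to. The interval supplies that anchor.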
Figure 16.1 Permitted arithmetic operations between the different date/time classes.
16.5 Time zones
- Sys.timezone() returns the system time zone.
- Unless otherwise specified, lubridate always uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and roughly equivalent to its predecessor GMT (Greenwich Mean Time). It is a time scale based on the atomic second that is kept as close as practical to Universal Time.
- GMT stands for Greenwich Mean Time.
- UTC does not have DST, which makes it a convenient representation for computation. DST stands for Daylight Saving Time (known as summer time in China), a scheme in which local clocks are adjusted in order to save energy.
- Operations that combine date-times, like c(), will often drop the time zone.
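There are two ways to change the time zone of a date-time, and they mean different things. The sketch below contrasts them on one made-up timestamp and assumes the system time zone database includes "America/New_York".

```r
library(lubridate)

x <- ymd_hms("2015-06-01 12:00:00", tz = "America/New_York")

# with_tz() keeps the instant and changes only how it is displayed.
same_instant <- with_tz(x, "UTC")
same_instant
#> [1] "2015-06-01 16:00:00 UTC"

# force_tz() keeps the clock time and changes the underlying instant.
# Use it when the data was labelled with the wrong time zone.
relabelled <- force_tz(x, "UTC")
relabelled
#> [1] "2015-06-01 12:00:00 UTC"

same_instant - x   # zero seconds apart
relabelled - x     # four hours earlier (New York is on EDT in June)
```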
16.3.4 Exercises
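The exercise code uses a data frame called flights_dt that these notes never define. The sketch below is modelled on how the book builds it from nycflights13::flights. The helper name make_datetime_100 follows the book, while keeping tailnum in the final select() is an assumption made here because one of the snippets selects that column.

```r
library(dplyr)
library(lubridate)
library(nycflights13)

# Times in flights are stored as integers such as 517 (meaning 05:17),
# so split them into hours and minutes with integer division and modulo.
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

flights_dt <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_time)) %>%
  mutate(
    dep_time       = make_datetime_100(year, month, day, dep_time),
    arr_time       = make_datetime_100(year, month, day, arr_time),
    sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
    sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
  ) %>%
  # tailnum is kept as an assumption; see the note above.
  select(origin, dest, tailnum, ends_with("delay"), ends_with("time"))
```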
- How does the distribution of flight times within a day change over the course of the year?
flights_dt %>%
mutate(dep_time=update(dep_time,year = 2020, month = 2, mday = 2)) %>%
ggplot(aes(dep_time))+geom_freqpoly(binwidth = 3600)
- Compare dep_time, sched_dep_time and dep_delay. Are they consistent? Explain your findings.
flights_dt %>%
mutate(delay=(dep_time-sched_dep_time)) %>%
select(tailnum,dep_time,sched_dep_time,dep_delay,delay)
- Compare air_time with the duration between the departure and arrival. Explain your findings. (Hint: consider the location of the airport.)
flights %>% select(air_time,distance) %>%
ggplot(aes(distance,air_time))+geom_point()
- How does the average delay time change over the course of a day? Should you use dep_time or sched_dep_time? Why?
## First grouped by actual departure time, then by scheduled departure time:
flights_dt %>% mutate(dep_time=update(dep_time,year=2013,month=1,mday=1))%>%
group_by(dep_time) %>%
summarise(mean=mean(dep_delay)) %>%
ggplot(aes(dep_time,mean))+geom_line()
flights_dt %>% mutate(sched_dep_time=update(sched_dep_time,year=2013,month=1,mday=1))%>%
group_by(sched_dep_time) %>%
summarise(mean_delay=mean(dep_delay)) %>%
ggplot(aes(sched_dep_time,mean_delay))+geom_point()+geom_smooth()
- On what day of the week should you leave if you want to minimise the chance of a delay?
flights_dt %>% mutate(weekday=wday(sched_dep_time)) %>%
group_by(weekday) %>%
summarise(mean=mean(dep_delay)) %>%
ggplot(aes(weekday,mean))+geom_line()
- What makes the distribution of diamonds$carat and flights$sched_dep_time similar?
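Both are shaped by human judgement: people favour round values, so carat weights and scheduled departure minutes both cluster at round numbers.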
ggplot(diamonds,aes(carat))+geom_freqpoly()
sched_dep <- flights_dt %>%
mutate(minute = minute(sched_dep_time)) %>%
group_by(minute) %>%
summarise(
avg_delay = mean(arr_delay, na.rm = TRUE),
n = n())
ggplot(sched_dep, aes(minute, n)) +
geom_line()
- Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. Hint: create a binary variable that tells you whether or not a flight was delayed.
## Share of flights that left early, by minute of actual departure
flights_dt %>%
mutate(minute = minute(dep_time), early = dep_delay < 0) %>%
group_by(minute) %>%
summarise(prop_early = mean(early, na.rm = TRUE)) %>%
ggplot(aes(minute, prop_early)) + geom_line()