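1. Working with time series:
# The sessions below assume: import numpy as np; import pandas as pd; from pandas import Series, DataFrame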
In [193]: from datetime import datetime
In [194]: dates = [
...: datetime(2000, 1, 1),
...: datetime(2000, 1, 2),
...: datetime(2000, 1, 3)
...: ]
In [195]: dates # list of datetime objects
Out[195]:
[datetime.datetime(2000, 1, 1, 0, 0),
datetime.datetime(2000, 1, 2, 0, 0),
datetime.datetime(2000, 1, 3, 0, 0)]
In [196]: s = Series(np.random.randn(3), index=dates)
In [197]: s # one-dimensional array indexed by timestamps
Out[197]:
2000-01-01 0.536546
2000-01-02 0.226604
2000-01-03 0.487324
dtype: float64
In [198]: s.index # timestamp index
Out[198]: DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'],
dtype='datetime64[ns]', freq=None)
In [199]: s.index[0] # Timestamp (datetime values used as an index are converted to Timestamp automatically)
Out[199]: Timestamp('2000-01-01 00:00:00')
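Once the index holds timestamps, rows can be looked up by date string. The following is a minimal sketch of that idea, not part of the original session; the variable names are illustrative, and the values are fixed rather than random so the result is predictable.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2000, 1, 1), datetime(2000, 1, 2), datetime(2000, 1, 3)]
s = pd.Series(np.arange(3.0), index=dates)

# The plain datetime objects were converted to a DatetimeIndex of Timestamps.
index_is_datetime = isinstance(s.index, pd.DatetimeIndex)
first_label = s.index[0]

# A date string can be used as a label; pandas parses it for the lookup.
value_on_jan_2 = s.loc["2000-01-02"]

# Slicing by date strings includes both endpoints.
first_two = s.loc["2000-01-01":"2000-01-02"]
```

Unlike positional slicing, label slicing on a timestamp index keeps the end label, so `first_two` contains two rows.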
2. Creating timestamps with pd.Timestamp:
In [64]: pd.Timestamp('2011/1/1')
Out[64]: Timestamp('2011-01-01 00:00:00')
In [65]: pd.Timestamp('2011-1-1')
Out[65]: Timestamp('2011-01-01 00:00:00')
In [66]: pd.Timestamp(2012,1,2)
Out[66]: Timestamp('2012-01-02 00:00:00')
In [67]: pd.Timestamp('1999-2-2 11:22:33')
Out[67]: Timestamp('1999-02-02 11:22:33')
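A Timestamp is more than a parsed string: it exposes its components and supports arithmetic. A short sketch, using one of the values from the session above (the variable names are illustrative):

```python
import pandas as pd

ts = pd.Timestamp("1999-2-2 11:22:33")

# Component fields are exposed as attributes.
parts = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)

# Adding a Timedelta yields a new Timestamp.
later = ts + pd.Timedelta(days=1, hours=1)

# normalize() resets the time of day to midnight.
midnight = ts.normalize()
```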
3. Creating a timestamp index with pd.to_datetime:
# The accepted input formats are flexible
In [74]: pd.to_datetime(['1999/1/1', '1999-2-2', '2222-3-4 12:34:56'])
Out[74]:
DatetimeIndex(['1999-01-01 00:00:00', '1999-02-02 00:00:00',
'2222-03-04 12:34:56'],
dtype='datetime64[ns]', freq=None)
In [75]: pd.to_datetime(['Dec 23, 2011', '1-2-1999', None])
Out[75]: DatetimeIndex(['2011-12-23', '1999-01-02', 'NaT'],
dtype='datetime64[ns]', freq=None)
# European style: dayfirst=True reads the first number as the day and the second as the month
In [76]: pd.to_datetime(['1-2-1999'], dayfirst=True)
Out[76]: DatetimeIndex(['1999-02-01'], dtype='datetime64[ns]', freq=None)
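When the day/month order matters, an explicit format string may be safer than relying on inference. The sketch below is an illustration rather than part of the original notes; it uses the `format` and `errors` parameters of `pd.to_datetime`.

```python
import pandas as pd

# An explicit format removes the day/month ambiguity of '1-2-1999'.
day_first = pd.to_datetime("1-2-1999", format="%d-%m-%Y")
month_first = pd.to_datetime("1-2-1999", format="%m-%d-%Y")

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(
    ["1999-01-02", "not a date"], format="%Y-%m-%d", errors="coerce"
)
```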
# pd.to_datetime(Series/DataFrame) returns a Series, not a timestamp index
# Note that each element of the returned Series is still a Timestamp
In [96]: s = Series(['2011', '2012-3-4'])
In [97]: pd.to_datetime(s)
Out[97]:
0 2011-01-01
1 2012-03-04
dtype: datetime64[ns]
# The year, month, and day columns are all required
In [98]: df = DataFrame({
...: 'year': [2011, 1987],
...: 'month': [1, 2],
...: 'day': [3, 4],
...: 'hour': [5, 6]
...: })
In [99]: pd.to_datetime(df)
Out[99]:
0 2011-01-03 05:00:00
1 1987-02-04 06:00:00
dtype: datetime64[ns]
In [100]: type(pd.to_datetime(df))
Out[100]: pandas.core.series.Series
In [125]: type(pd.to_datetime(df)[0])
Out[125]: pandas._libs.tslibs.timestamps.Timestamp
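Because the result is a Series of timestamps, its fields can be read element-wise through the `.dt` accessor. The sketch below also checks the "year, month, day are required" rule by dropping one of the columns; the flag name `missing_day_ok` is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"year": [2011, 1987], "month": [1, 2], "day": [3, 4], "hour": [5, 6]}
)
stamps = pd.to_datetime(df)

# The .dt accessor exposes datetime fields element-wise on a Series.
years = stamps.dt.year.tolist()
hours = stamps.dt.hour.tolist()

# Dropping a required column (here 'day') makes the assembly fail.
try:
    pd.to_datetime(df.drop(columns="day"))
    missing_day_ok = True
except ValueError:
    missing_day_ok = False
```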
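4. Creating a timestamp index with pd.date_range:
# Three arguments in order: start, end, frequency. With freq='M', each timestamp falls at 00:00:00 on the last day of the month
# Q = quarterly, M = monthly, D = daily, H = hourly, T/MIN = every minute, S = every second
# MS = first day of each month (month start), BM = last business day of each month
# 5M = every 5 months, 1h30min = every 1 hour 30 minutes
# The third argument, freq, is optional; the default frequency is D
# Note: pandas 2.2 and later rename several of these aliases (for example M to ME, BM to BME, H to h)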
In [201]: pd.date_range('1999-1-1', '2000', freq='M')
Out[201]:
DatetimeIndex(['1999-01-31', '1999-02-28', '1999-03-31', '1999-04-30',
'1999-05-31', '1999-06-30', '1999-07-31', '1999-08-31',
'1999-09-30', '1999-10-31', '1999-11-30', '1999-12-31'],
dtype='datetime64[ns]', freq='M')
In [202]: pd.date_range('1999-1-1', '2000', freq='MS')
Out[202]:
DatetimeIndex(['1999-01-01', '1999-02-01', '1999-03-01', '1999-04-01',
'1999-05-01', '1999-06-01', '1999-07-01', '1999-08-01',
'1999-09-01', '1999-10-01', '1999-11-01', '1999-12-01',
'2000-01-01'],
dtype='datetime64[ns]', freq='MS')
# Two arguments: start and count
# periods is the number of timestamps to generate; the default frequency is D
In [226]: pd.date_range('1999.11.1', periods=3)
Out[226]: DatetimeIndex(['1999-11-01', '1999-11-02', '1999-11-03'],
dtype='datetime64[ns]', freq='D')
# Three arguments in order: start, count, frequency
In [227]: pd.date_range('1999.11.1', periods=3, freq='M')
Out[227]: DatetimeIndex(['1999-11-30', '1999-12-31', '2000-01-31'],
dtype='datetime64[ns]', freq='M')
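The frequency aliases listed above can be combined and mixed with either an end date or a count. A small sketch follows, restricted to aliases that should behave the same across pandas versions (`1h30min`, `MS`, and `B` for business days; `B` is not mentioned in the notes above and is added here as an illustration).

```python
import pandas as pd

# Compound frequency: one timestamp every 1 hour 30 minutes.
every_90_min = pd.date_range("1999-11-01", periods=4, freq="1h30min")

# Month-start frequency between two endpoints; both ends are included
# because each falls on the first day of a month.
month_starts = pd.date_range("1999-01-01", "1999-04-01", freq="MS")

# Business days only: 1999-11-05 is a Friday, so the weekend is skipped.
business_days = pd.date_range("1999-11-05", periods=3, freq="B")
business_day_labels = list(business_days.strftime("%Y-%m-%d"))
```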
5. Aggregating a timestamp-indexed Series with resample:
resample regroups the data into time bins at a new frequency; an aggregation such as sum or mean is then applied to each bin
In [229]: dates = pd.date_range('1999.11.1', periods=9, freq='10D')
In [230]: s = Series(np.arange(1, len(dates)+1), index=dates)
In [231]: s
Out[231]:
1999-11-01 1
1999-11-11 2
1999-11-21 3
1999-12-01 4
1999-12-11 5
1999-12-21 6
1999-12-31 7
2000-01-10 8
2000-01-20 9
Freq: 10D, dtype: int64
In [232]: s.resample('M').sum() # resample by month and sum
Out[232]:
1999-11-30 6
1999-12-31 22
2000-01-31 17
Freq: M, dtype: int64
In [234]: s.resample('M').mean() # resample by month and take the mean
Out[234]:
1999-11-30 2.0
1999-12-31 5.5
2000-01-31 8.5
Freq: M, dtype: float64
# Resample to monthly means, then resample that result to daily frequency; days with no data become NaN
In [235]: s.resample('M').mean().resample('D').mean()
Out[235]:
1999-11-30 2.0
1999-12-01 NaN
1999-12-02 NaN
1999-12-03 NaN
...
2000-01-29 NaN
2000-01-30 NaN
2000-01-31 8.5
Freq: D, Length: 63, dtype: float64
# Same operation, with ffill filling the missing values (NaN) forward
In [236]: s.resample('M').mean().resample('D').mean().ffill()
Out[236]:
1999-11-30 2.0
1999-12-01 2.0
1999-12-02 2.0
...
2000-01-29 5.5
2000-01-30 5.5
2000-01-31 8.5
Freq: D, Length: 63, dtype: float64
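Forward filling is only one way to handle the gaps that upsampling creates. The sketch below, which assumes a small three-point series invented for illustration, compares `ffill` with linear interpolation after upsampling from 10-day to daily frequency.

```python
import pandas as pd

dates = pd.date_range("1999-11-01", periods=3, freq="10D")
s = pd.Series([1.0, 2.0, 3.0], index=dates)

# Upsample to daily frequency; days without an observation become NaN.
daily = s.resample("D").asfreq()

# Forward fill repeats the last known value.
filled = daily.ffill()

# Linear interpolation draws a straight line between known values.
interpolated = daily.interpolate()
```

Forward fill suits values that hold until the next observation (a price list, a status), while interpolation suits quantities that change gradually between observations.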
In [269]: s
Out[269]:
2011-01-01 0
2011-01-11 1
2011-01-21 2
2011-01-31 3
2011-02-10 4
2011-02-20 5
2011-03-02 6
2011-03-12 7
2011-03-22 8
2011-04-01 9
Freq: 10D, dtype: int64
# Downsample by month and list each month's first, highest, lowest, and last values
# ohlc: open, high, low, close
In [270]: s.resample('M').ohlc()
Out[270]:
open high low close
2011-01-31 0 3 0 3
2011-02-28 4 5 4 5
2011-03-31 6 8 6 8
2011-04-30 9 9 9 9
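The four ohlc columns correspond to the first, max, min, and last aggregations of each bin. The sketch below rebuilds the series from the session and checks that equivalence; it uses the `MS` alias (bins labeled by month start) as an assumption to sidestep alias differences between pandas versions, so the row labels differ from the month-end labels shown above while the values are the same.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2011-01-01", periods=10, freq="10D")
s = pd.Series(np.arange(10), index=dates)

# "MS" labels each monthly bin by its first day; the bin contents are
# the same calendar months as in a month-end resample.
ohlc = s.resample("MS").ohlc()

# The same four columns built from named aggregations.
manual = s.resample("MS").agg(["first", "max", "min", "last"])
manual.columns = ["open", "high", "low", "close"]

january = ohlc.iloc[0].tolist()
```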
6. Periods and period indexes, the counterparts of timestamps and timestamp indexes. A period is a span of time; create them with pd.Period and pd.period_range:
# Create a period
In [45]: pd.Period('2011')
Out[45]: Period('2011', 'A-DEC')
In [46]: pd.Period('2011-1')
Out[46]: Period('2011-01', 'M')
# Create a period index; the arguments are similar to those of date_range
In [47]: pd.period_range('2011', '2012', freq='M')
Out[47]:
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05',
'2011-06', '2011-07', '2011-08', '2011-09', '2011-10',
'2011-11', '2011-12', '2012-01'],
dtype='period[M]', freq='M')
In [65]: pd.period_range('2011-11', periods=10, freq='M')
Out[65]:
PeriodIndex(['2011-11', '2011-12', '2012-01', '2012-02', '2012-03',
'2012-04', '2012-05', '2012-06', '2012-07', '2012-08'],
dtype='period[M]', freq='M')
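Since a period is a span rather than an instant, it has a start and an end, and it can be converted to and from timestamps. A brief sketch of those operations (the variable names are illustrative):

```python
import pandas as pd

p = pd.Period("2011-01", freq="M")

# A period covers a span; its boundaries are available as Timestamps.
start, end = p.start_time, p.end_time

# Integer arithmetic shifts by whole periods at the period's own frequency.
three_months_later = p + 3

# Convert a timestamp index to monthly periods and back.
stamps = pd.date_range("2011-01-15", periods=3, freq="D")
as_periods = stamps.to_period("M")
back_to_stamps = as_periods.to_timestamp()
```

Converting back with `to_timestamp()` returns the start of each period by default, so the original day of the month is not recovered.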