原創(chuàng)况鸣,轉(zhuǎn)載注明出處牢贸!
本教程目的:
提供一個(gè)ARMA方法預(yù)測時(shí)間序列的demo,可直接運(yùn)行懒闷,為初學(xué)者提供一個(gè)直觀的認(rèn)識十减。通過本教程你可以學(xué)會:
1、時(shí)間序列建姆吖溃基本步驟
2、時(shí)間序列相關(guān)畫圖操作
3速址、對時(shí)間序列預(yù)測有一個(gè)感性的認(rèn)識
4玩焰、ARMA預(yù)測是dynamic參數(shù)的影響通過本教程你還不能掌握:
1、ARMA模型詳細(xì)優(yōu)化
2芍锚、ARMA模型預(yù)測的內(nèi)部實(shí)現(xiàn)細(xì)節(jié)
話不多說昔园,直接上代碼
導(dǎo)入數(shù)據(jù)
from __future__ import print_function
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
# 數(shù)據(jù)
dta=[-612,
277,
377,
-3266,
-950,
2336,
1459,
-829,
1191,
238,
-2965,
-1764,
-85,
5312,
3,
-1342,
1733,
-4106,
-1461,
3186,
-92,
411,
-650,
118,
-2676,
-469,
3051,
1122,
-329,
247,
866,
-2548,
-1414,
3125,
371,
274,
533,
-175,
-2332,
-1388,
3060,
1369,
676,
-806,
522,
-2199,
-2511,
3901,
-36,
920,
-1108,
2175,
-2333,
-1105,
3029,
-31,
2305,
1302,
2761,
-4775,
-3201,
7769,
-1214,
1817,
-5271,
971,
-2446,
-3705,
3329,
229,
1952,
-2434,
1130,
-3521,
-503,
5004,
-2211,
2046,
521,
-363,
-2723,
-2609,
4091,
1314,
1050,
-574,
-585,
-3632,
-659,
]
dta=pd.Series(dta)
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('2002','2090'))
dta.plot(figsize=(12,8))
plt.show()
畫ACF(自相關(guān)函數(shù))和PACF(偏自相關(guān)函數(shù))
這兩個(gè)函數(shù)用來經(jīng)驗(yàn)性的確定ARMA(p, q)中的p和q
# plot acf and pacf
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(dta.values,lags=40,ax=ax1) # lags 表示滯后的階數(shù)
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(dta.values,lags=40,ax=ax2)
plt.show()
建模
# 模型1
arma_mod70 = sm.tsa.ARMA(dta, (7, 0)).fit(disp=False)
print(arma_mod70.aic, arma_mod70.bic, arma_mod70.hqic)
# 模型2
arma_mod01 = sm.tsa.ARMA(dta, (0, 1)).fit(disp=False)
print(arma_mod01.aic, arma_mod01.bic, arma_mod01.hqic)
# 模型3
arma_mod71 = sm.tsa.ARMA(dta, (7, 1)).fit(disp=False)
print(arma_mod71.aic, arma_mod71.bic, arma_mod71.hqic)
# 模型4
arma_mod80 = sm.tsa.ARMA(dta, (8, 0)).fit(disp=False)
print(arma_mod80.aic, arma_mod80.bic, arma_mod80.hqic)
# 模型5(只是為了示范,p和q隨便取的)
arma_mod32 = sm.tsa.ARMA(dta, (3, 2)).fit(disp=False)
print(arma_mod32.aic, arma_mod32.bic, arma_mod32.hqic)
# 最佳模型可以取aic最小的一個(gè)
輸出為:
1579.7025547891606 1602.10028211675 1588.7304359212178
1632.3203733166176 1639.786282425814 1635.3296670273035
1581.0916055987627 1605.9779692960842 1591.122584634382
1581.3957835886288 1606.2821472859503 1591.426762624248
模型檢驗(yàn)
# 模型檢驗(yàn)
# 在指數(shù)平滑模型下并炮,觀察ARIMA模型的殘差是否是平均值為0且方差為常數(shù)的正態(tài)分布(服從零均值默刚、方差不變的正態(tài)分布),
# 同時(shí)也要觀察連續(xù)殘差是否(自)相關(guān)逃魄。
# 對ARMA(7,0)模型所產(chǎn)生的殘差做自相關(guān)圖
resid = arma_mod70.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=40, ax=ax2)
plt.show()
觀察殘差是否符合正態(tài)分布
# 觀察殘差是否符合正態(tài)分布
# 這里使用QQ圖荤西,它用于直觀驗(yàn)證一組數(shù)據(jù)是否來自某個(gè)分布,或者驗(yàn)證某兩組數(shù)據(jù)是否來自同一(族)分布。
# 在教學(xué)和軟件中常用的是檢驗(yàn)數(shù)據(jù)是否來自于正態(tài)分布邪锌。
resid = arma_mod70.resid # 殘差
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid, line='q', ax=ax, fit=True)
plt.show()
# Ljung-Box檢驗(yàn)
# 略
模型預(yù)測-arma_mod70模型畫圖
模型確定之后勉躺,就可以開始進(jìn)行預(yù)測了。
包含訓(xùn)練數(shù)據(jù)一起預(yù)測觅丰,dynamic=True:
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=True, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
print(arma_mod70.summary())
ARMA Model Results
==============================================================================
Dep. Variable: y No. Observations: 89
Model: ARMA(7, 0) Log Likelihood -780.851
Method: css-mle S.D. of innovations 1524.852
Date: Tue, 27 Aug 2019 AIC 1579.703
Time: 16:10:20 BIC 1602.100
Sample: 12-31-2002 HQIC 1588.730
- 12-31-2090
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 42.5366 72.269 0.589 0.558 -99.108 184.181
ar.L1.y -0.2975 0.093 -3.213 0.002 -0.479 -0.116
ar.L2.y -0.3757 0.094 -3.976 0.000 -0.561 -0.191
ar.L3.y -0.2901 0.100 -2.889 0.005 -0.487 -0.093
ar.L4.y -0.3143 0.099 -3.169 0.002 -0.509 -0.120
ar.L5.y -0.2210 0.101 -2.192 0.031 -0.419 -0.023
ar.L6.y -0.2347 0.096 -2.451 0.016 -0.422 -0.047
ar.L7.y 0.4670 0.093 5.024 0.000 0.285 0.649
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 -0.9414 -0.4833j 1.0583 -0.4245
AR.2 -0.9414 +0.4833j 1.0583 0.4245
AR.3 -0.2297 -1.0284j 1.0537 -0.2850
AR.4 -0.2297 +1.0284j 1.0537 0.2850
AR.5 0.6358 -0.8308j 1.0461 -0.1460
AR.6 0.6358 +0.8308j 1.0461 0.1460
AR.7 1.5733 -0.0000j 1.5733 -0.0000
-----------------------------------------------------------------------------
包含訓(xùn)練數(shù)據(jù)一起預(yù)測饵溅,dynamic=False:
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=False, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
注意:
dynamic
參數(shù)只影響in-sample(可以理解為訓(xùn)練數(shù)據(jù),或內(nèi)插)的預(yù)測妇萄,不影響out-of-sample(可以理解為測試數(shù)據(jù)蜕企,或外插)的預(yù)測。如果預(yù)測的數(shù)據(jù)全部為out-of-sample類型冠句,則dynamic
參數(shù)不會影響預(yù)測結(jié)果糖赔。如果預(yù)測數(shù)據(jù)含有in-sample類型(如上面兩圖,預(yù)測是從2009開始的)轩端,則dynamic
參數(shù)影響預(yù)測結(jié)果放典。
在 arima_model.py
中的解釋為
"""
...
dynamic : bool, optional
The `dynamic` keyword affects in-sample prediction. If dynamic
is False, then the in-sample lagged values are used for
prediction. If `dynamic` is True, then in-sample forecasts are
used in place of lagged dependent variables. The first forecasted
value is `start`.
...
"""
另外從上面兩圖可以看出,當(dāng)時(shí)間向后推移時(shí)基茵,預(yù)測的值將趨于收斂(一個(gè)常數(shù))奋构,所以ARMA或ARIMA等方法只適用于短時(shí)間的預(yù)測,不適用于長時(shí)間的預(yù)測拱层!
arma_mod32模型畫圖
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2091', '2100', dynamic=True, ax=ax, plot_insample=False)
plt.show()
print(arma_mod32.summary())
ARMA Model Results
==============================================================================
Dep. Variable: y No. Observations: 89
Model: ARMA(3, 2) Log Likelihood -804.123
Method: css-mle S.D. of innovations 2016.787
Date: Tue, 27 Aug 2019 AIC 1622.245
Time: 15:02:58 BIC 1639.666
Sample: 12-31-2002 HQIC 1629.267
- 12-31-2090
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 44.5054 46.003 0.967 0.336 -45.660 134.670
ar.L1.y -0.3571 0.162 -2.209 0.030 -0.674 -0.040
ar.L2.y 0.2441 0.181 1.349 0.181 -0.111 0.599
ar.L3.y -0.0450 0.135 -0.334 0.739 -0.309 0.219
ma.L1.y 0.0228 0.130 0.175 0.862 -0.233 0.278
ma.L2.y -0.8108 0.109 -7.439 0.000 -1.024 -0.597
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 -1.3198 -0.0000j 1.3198 -0.5000
AR.2 3.3714 -2.3381j 4.1028 -0.0965
AR.3 3.3714 +2.3381j 4.1028 0.0965
MA.1 -1.0966 +0.0000j 1.0966 0.5000
MA.2 1.1247 +0.0000j 1.1247 0.0000
-----------------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2002', '2100', dynamic=False, ax=ax, plot_insample=False)
plt.show()
完整代碼如下
from __future__ import print_function
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
# 數(shù)據(jù)
dta=[-612,
277,
377,
-3266,
-950,
2336,
1459,
-829,
1191,
238,
-2965,
-1764,
-85,
5312,
3,
-1342,
1733,
-4106,
-1461,
3186,
-92,
411,
-650,
118,
-2676,
-469,
3051,
1122,
-329,
247,
866,
-2548,
-1414,
3125,
371,
274,
533,
-175,
-2332,
-1388,
3060,
1369,
676,
-806,
522,
-2199,
-2511,
3901,
-36,
920,
-1108,
2175,
-2333,
-1105,
3029,
-31,
2305,
1302,
2761,
-4775,
-3201,
7769,
-1214,
1817,
-5271,
971,
-2446,
-3705,
3329,
229,
1952,
-2434,
1130,
-3521,
-503,
5004,
-2211,
2046,
521,
-363,
-2723,
-2609,
4091,
1314,
1050,
-574,
-585,
-3632,
-659,
]
dta=pd.Series(dta)
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('2002','2090'))
dta.plot(figsize=(12,8))
plt.show()
# plot acf and pacf
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(dta.values,lags=40,ax=ax1) # lags 表示滯后的階數(shù)
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(dta.values,lags=40,ax=ax2)
plt.show()
# 模型1
arma_mod70 = sm.tsa.ARMA(dta, (7, 0)).fit(disp=False)
print(arma_mod70.aic, arma_mod70.bic, arma_mod70.hqic)
# 模型2
arma_mod01 = sm.tsa.ARMA(dta, (0, 1)).fit(disp=False)
print(arma_mod01.aic, arma_mod01.bic, arma_mod01.hqic)
# 模型3
arma_mod71 = sm.tsa.ARMA(dta, (7, 1)).fit(disp=False)
print(arma_mod71.aic, arma_mod71.bic, arma_mod71.hqic)
# 模型4
arma_mod80 = sm.tsa.ARMA(dta, (8, 0)).fit(disp=False)
print(arma_mod80.aic, arma_mod80.bic, arma_mod80.hqic)
# 模型5(只是為了示范弥臼,p和q隨便取的)
arma_mod32 = sm.tsa.ARMA(dta, (3, 2)).fit(disp=False)
print(arma_mod32.aic, arma_mod32.bic, arma_mod32.hqic)
# 最佳模型可以取aic最小的一個(gè)
# 模型檢驗(yàn)
# 在指數(shù)平滑模型下,觀察ARIMA模型的殘差是否是平均值為0且方差為常數(shù)的正態(tài)分布(服從零均值根灯、方差不變的正態(tài)分布)径缅,
# 同時(shí)也要觀察連續(xù)殘差是否(自)相關(guān)。
# 對ARMA(7,0)模型所產(chǎn)生的殘差做自相關(guān)圖
resid = arma_mod70.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=40, ax=ax2)
plt.show()
# 觀察殘差是否符合正態(tài)分布
# 這里使用QQ圖烙肺,它用于直觀驗(yàn)證一組數(shù)據(jù)是否來自某個(gè)分布纳猪,或者驗(yàn)證某兩組數(shù)據(jù)是否來自同一(族)分布。
# 在教學(xué)和軟件中常用的是檢驗(yàn)數(shù)據(jù)是否來自于正態(tài)分布桃笙。
resid = arma_mod70.resid # 殘差
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid, line='q', ax=ax, fit=True)
plt.show()
# Ljung-Box檢驗(yàn)
# 略
# 模型預(yù)測
# 模型確定之后氏堤,就可以開始進(jìn)行預(yù)測了,我們對未來十年的數(shù)據(jù)進(jìn)行預(yù)測搏明。
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=True, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
print(arma_mod70.summary())
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=False, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2091', '2100', dynamic=True, ax=ax, plot_insample=False)
print(arma_mod32.summary())
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2091', '2100', dynamic=False, ax=ax, plot_insample=False)
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2002', '2100', dynamic=False, ax=ax, plot_insample=False)
plt.show()
歡迎留言鼠锈、指正、吐槽星著。购笆。。