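Many factors affect a stock. How do we tell whether a given factor is effective, that is, whether it actually influences returns? A small example illustrates the idea. A securities firm analyzes the number of accounts opened per day at its branch offices in 5 regions. Data are collected from 10 branches in each region, giving the following: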
D1 [126,124,120,92,125,142,29,26,123,29]
D2 [40,45,66,22,41,30,23,70,90,111]
D3 [10,11,13,11,9,8,13,11,6,7]
D4 [8,11,6,7,7,9,12,15,10,13]
D5 [7,6,8,8,13,6,10,7,5,9]
We want to test whether the daily account-opening counts differ significantly across the 5 regions, at a significance level of 0.05.
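Before running a formal test, it can help to look at the per-region averages. The following is a minimal sketch, assuming the data are loaded into a pandas DataFrame with one column per region; the names `counts`, `wide`, and `summary` are illustrative and not part of the original code.

```python
import pandas as pd

# Daily account-opening counts per region, copied from the data above.
counts = {
    "D1": [126, 124, 120, 92, 125, 142, 29, 26, 123, 29],
    "D2": [40, 45, 66, 22, 41, 30, 23, 70, 90, 111],
    "D3": [10, 11, 13, 11, 9, 8, 13, 11, 6, 7],
    "D4": [8, 11, 6, 7, 7, 9, 12, 15, 10, 13],
    "D5": [7, 6, 8, 8, 13, 6, 10, 7, 5, 9],
}

# Wide table: one column per region, one row per branch.
wide = pd.DataFrame(counts)

# Per-region mean and sample standard deviation (regions as rows).
summary = wide.agg(["mean", "std"]).T
print(summary)
```

The means of D1 and D2 come out much larger than those of D3, D4, and D5, which suggests a regional difference. The analysis of variance then checks whether that gap is larger than branch-to-branch variation alone would explain.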
import pandas as pd
from statsmodels.formula.api import ols
import statsmodels.stats.anova as anova
# Build one small table per region: the count and its region label.
d1 = pd.DataFrame({"num":[126,124,120,92,125,142,29,26,123,29],'locate':['D1']*10})
d2 = pd.DataFrame({"num":[40,45,66,22,41,30,23,70,90,111],'locate':['D2']*10})
d3 = pd.DataFrame({"num":[10,11,13,11,9,8,13,11,6,7],'locate':['D3']*10})
d4 = pd.DataFrame({"num":[8,11,6,7,7,9,12,15,10,13],'locate':['D4']*10})
d5 = pd.DataFrame({"num":[7,6,8,8,13,6,10,7,5,9],'locate':['D5']*10})
# Stack them into one long-format table (DataFrame.append was removed in pandas 2.0).
d = pd.concat([d1, d2, d3, d4, d5], ignore_index=True)
# Fit a linear model with region as a categorical factor and print the one-way ANOVA table.
model = ols('num ~ C(locate)', data=d).fit()
table = anova.anova_lm(model)
print(table)
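The ANOVA table reports a p-value of 1.729214e-10, far below the 0.05 significance level. We therefore reject the null hypothesis that all regions have the same mean number of accounts opened, and conclude that the counts differ across regions. This confirms our intuition: the location of the branch is an important factor affecting the number of accounts opened.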
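To make clear what the ANOVA table is measuring, the sketch below recomputes the F statistic by hand from the between-group and within-group sums of squares, and cross-checks it against `scipy.stats.f_oneway`. It is an illustration under the same data, not the original author's code; the variable names (`groups`, `ss_between`, `ss_within`, and so on) are assumptions introduced here.

```python
import numpy as np
from scipy import stats

# Same five regions as above, one array per region.
groups = [
    np.array([126, 124, 120, 92, 125, 142, 29, 26, 123, 29]),
    np.array([40, 45, 66, 22, 41, 30, 23, 70, 90, 111]),
    np.array([10, 11, 13, 11, 9, 8, 13, 11, 6, 7]),
    np.array([8, 11, 6, 7, 7, 9, 12, 15, 10, 13]),
    np.array([7, 6, 8, 8, 13, 6, 10, 7, 5, 9]),
]

k = len(groups)                          # number of regions
n_total = sum(len(g) for g in groups)    # total number of branches
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each region mean is from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of branches around their own region mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares; the p-value is the upper tail of F(k-1, N-k).
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

# Cross-check with SciPy's built-in one-way ANOVA.
f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"manual: F = {f_manual:.4f}, p = {p_manual:.6e}")
print(f"scipy : F = {f_scipy:.4f}, p = {p_scipy:.6e}")
```

Both routes should agree with the `C(locate)` row of the statsmodels table. A large F means the variation between region means is large relative to the variation among branches within a region.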