Predicting Stock Price Movements with a Support Vector Chicken

The vector chicken angrily pecks Navi

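At the 2018 CS:GO London Major, the organizers had the novel idea of inviting a professional to predict the match results: the chick in the picture. It made its predictions by choosing among rice boxes marked with team logos.
The chick lived up to expectations: it predicted every match of the playoff stage correctly, including the final.
The only catch is that every team it picked lost, faithfully upholding the principle of "whoever gets pecked dies". A perfect run of reverse predictions.
Some netizens said it had been picking the losing side all along, and the humans simply read it backwards.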

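Using a chicken to make predictions is one of today's most advanced artificial intelligence techniques, namely the support vector chicken (in Chinese, "chicken" 雞 and "machine" 機 are both pronounced jī, hence the pun on "support vector machine").
Support vector chickens excel at N-class classification problems. The method: place N identical rice boxes in front of the chicken, fill them with equal amounts of rice, and observe which one it pecks first. That gives the class of the problem to be predicted.
In the same way, a support vector chicken can predict stock price movements: use 2 rice boxes for "up" and "down", and watch which box the vector chicken pecks first.
(This section is pure nonsense.)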

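Now, seriously: a simple application of predicting index rises and falls with a support vector machine (SVM) in Python. The data source is again tushare.
First, the concept of a support vector machine: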

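In machine learning, a support vector machine (SVM, also called a support vector network) is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model represents the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a category based on which side of the gap they fall on.
In addition to linear classification, SVMs can efficiently perform non-linear classification using the so-called kernel trick, implicitly mapping their inputs into a high-dimensional feature space. (Source: Wikipedia)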

The idea behind SVM classification

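Skipping the complicated mathematics, in one sentence: maximize the distance between the separating hyperplane and the nearest sample points.
The first functions used to describe the hyperplane were linear. In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vladimir Vapnik proposed a way to create non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al.) to maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function.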
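For readers who do want one formula, here is the standard textbook form of the two ideas above. It is not taken from the original post; the symbols (weights w, bias b, samples x_i with labels y_i in {-1, +1}, dual coefficients alpha_i, kernel k) are the usual conventions and are assumed here. The first line is the hard-margin problem, whose margin width is 2/||w||; the second is the kernelized decision rule, where k(x_i, x) takes the place of the dot product.

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b)\ \ge\ 1,\qquad i = 1,\dots,n,\\[4pt]
&f(x) = \operatorname{sign}\!\Big(\sum_{i=1}^{n}\alpha_i\,y_i\,k(x_i, x) + b\Big).
\end{aligned}
```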

Common kernel functions
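As a rough companion to this heading, here is a minimal sketch of four kernels that are commonly listed, written with the same parameter names scikit-learn's SVC uses (gamma, coef0, degree). The function names and the sample vectors are illustrative assumptions, not part of the original post.

```python
import numpy as np

def linear_kernel(a, b):
    # Plain dot product: <a, b>
    return float(np.dot(a, b))

def poly_kernel(a, b, gamma=1.0, coef0=0.0, degree=3):
    # Polynomial kernel: (gamma * <a, b> + coef0) ** degree
    return float((gamma * np.dot(a, b) + coef0) ** degree)

def rbf_kernel(a, b, gamma=1.0):
    # Radial basis function kernel: exp(-gamma * ||a - b||^2)
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(a, b, gamma=1.0, coef0=0.0):
    # Sigmoid kernel: tanh(gamma * <a, b> + coef0)
    return float(np.tanh(gamma * np.dot(a, b) + coef0))

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
print(linear_kernel(u, v))   # 11.0
print(rbf_kernel(u, u))      # 1.0
```

The grid search later in this post only tries the linear and rbf kernels.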

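As for solving it: in short, computing an SVM classifier reduces to a constrained optimization problem with a differentiable objective function.
The traditional approach simplifies the problem further through the Lagrangian dual. Because the dual greatly reduces the amount of computation, most existing algorithmic improvements revolve around the dual problem.
Advanced algorithms today include sub-gradient descent and coordinate descent. On large, sparse data sets both have been shown to offer significant advantages: sub-gradient methods are especially efficient when there are many training examples, and coordinate descent is especially efficient when the feature space is high-dimensional. Coordinate descent, too, is essentially based on the dual problem.
The exact formulas are easy to find online and are not repeated here.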
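To make the sub-gradient idea a little more concrete, here is a minimal sketch of batch sub-gradient descent on the regularized hinge loss for a linear SVM. This is not what scikit-learn's SVC runs internally; the function name, the step size, the regularization strength lam and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).

    y must contain the labels -1 and +1.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        # Sub-gradient of the objective with respect to w and b
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two well-separated synthetic clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
accuracy = (pred == y).mean()
print('training accuracy:', accuracy)
```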

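The approach here has 3 steps:
(1) Fetch and process the data: the inputs are the N-day moving averages (MA) of the stock's percentage change (pct_chg) and of the previous day's close (pre_close); the output is a 0-1 classification of whether the stock rose or fell that day, where 1 means up and 0 means down.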

def get_data(ma):
    fig, ax = plt.subplots()
    # Fetch daily bars from tushare (forward-adjusted prices)
    df = ts.pro_bar(pro_api=api, ts_code='600887.SH', adj='qfq', start_date='20180101', end_date='20190318')
    df.index = pd.to_datetime(df.trade_date)
    # Sort by date, oldest first
    df = df.sort_values(by='trade_date')
    # Moving average of the numeric input columns (string columns are left out)
    dfMA = df[['pre_close', 'pct_chg', 'vol', 'amount']].rolling(window=ma, center=False).mean()
    # Keep only the column the label is derived from
    df = df[['pct_chg']]
    #print(df.head())
    #print(dfMA.tail())
    # Plot the daily percentage change and its moving average
    plt.plot(df)
    plt.plot(dfMA.pct_chg)
    # Drop the leading rows left empty by the moving average
    y = df[ma:].values
    x = dfMA[ma:].values
    # Label: 1 if the day closed up, 0 otherwise
    y = (y > 0).astype(int)
    return x, y
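One thing to watch in get_data: rolling(...).mean() on day t includes day t itself, so the MA of pct_chg contains the very value the label is derived from. Below is a minimal sketch of one way to keep the inputs strictly in the past, run on synthetic data so it works without a tushare token. The helper make_features and the random price series are illustrative assumptions, not the code used for the results in this post.

```python
import numpy as np
import pandas as pd

def make_features(df, ma):
    """Build (x, y) from a frame with 'pre_close' and 'pct_chg' columns.

    pre_close on day t is already yesterday's close, so its MA is safe.
    The MA of pct_chg is shifted by one day so that the label of day t
    is never part of its own input.
    """
    feats = df[['pre_close', 'pct_chg']].rolling(window=ma).mean()
    feats['pct_chg'] = feats['pct_chg'].shift(1)
    label = (df['pct_chg'] > 0).astype(int)
    valid = feats.notna().all(axis=1)  # drop rows without a full window
    return feats[valid].to_numpy(), label[valid].to_numpy()

# Synthetic daily returns (in percent) and the matching price series
rng = np.random.default_rng(0)
pct = rng.normal(0, 1.5, 300)
close = 20 * np.cumprod(1 + pct / 100)
demo = pd.DataFrame({
    'pct_chg': pct,
    'pre_close': np.concatenate(([20.0], close[:-1])),
})

x, y = make_features(demo, ma=40)
print(x.shape, y.shape)   # (260, 2) (260,)
```

With the shift, the first usable row is day 40, whose pct_chg input is the mean of days 0 to 39.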

(2) Split the data with train_test_split, pick the parameters by hand, train the SVM and predict; or, as done here, use GridSearchCV as a convenient way to tune the parameters.

# GridSearchCV is recommended for tuning; otherwise tune by hand
#x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.8)
#clf = svm.SVC(C=0.5, kernel='linear', decision_function_shape='ovr')  # linear
#clf = svm.SVC(C=0.4, kernel='rbf', gamma=20, decision_function_shape='ovr')  # non-linear kernel

# Test set: hold out the last day only
x_train = x[:-1]
x_test = x[-1:]  # slicing keeps the 2-D shape that predict() expects
y_train = y[:-1]
y_test = y[-1]

# Automatic tuning with GridSearchCV; the linear kernel and the radial basis function (rbf) kernel are searched here
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 2, 4], 'gamma':[0.125, 0.25, 0.5 ,1, 2, 4]}
svr = svm.SVC()
print(x, y.ravel())
clf = GridSearchCV(svr, parameters)
print('GridSearchCVing...')
# Note: the search is fit on all of x, so x_test below is not held out
clf.fit(x, y.ravel())
cv_result = pd.DataFrame.from_dict(clf.cv_results_)
print(cv_result)
print('best', clf.best_params_)

# Fit with hand-picked parameters
#clf.fit(x_train, y_train.ravel())

# Show accuracy
print(clf.score(x_train, y_train))
y_hat = clf.predict(x_train)
show_accuracy(y_hat, y_train, 'training set')
print(clf.score(x_test, y_test))
y_hat = clf.predict(x_test)
show_accuracy(y_hat, y_test, 'test set')

# Print the decision function values and the predictions
print('decision_function:\n', clf.decision_function(x_train))
print('\npredict:\n', clf.predict(x_train))

x1_min, x1_max = x[:, 0].min(), x[:, 0].max()  # range of column 0
x2_min, x2_max = x[:, 1].min(), x[:, 1].max()  # range of column 1
x1, x2  = np.mgrid[x1_min:x1_max:200j, x2_min:x2_max:200j]  # generate the grid sample points
grid_test = np.stack((x1.flat, x2.flat), axis=1)  # test points
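For comparison, here is a hedged sketch of the same grid search with the tuning kept away from the evaluation data: the series is split chronologically, GridSearchCV only sees the earlier part, and its internal folds come from TimeSeriesSplit. The synthetic data, the 240/60 split and the smaller grid are assumptions for illustration. The grid is written as two dictionaries because the linear kernel ignores gamma, so crossing the two only repeats identical fits.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for (x, y): the label depends weakly on feature 0
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(300, 2))
y_demo = (x_demo[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

# Chronological split: the last 60 rows are never seen during tuning
split = 240
x_tr, x_te = x_demo[:split], x_demo[split:]
y_tr, y_te = y_demo[:split], y_demo[split:]

param_grid = [
    {'kernel': ['linear'], 'C': [1, 2, 4]},
    {'kernel': ['rbf'], 'C': [1, 2, 4], 'gamma': [0.125, 0.5, 2]},
]
search = GridSearchCV(svm.SVC(), param_grid, cv=TimeSeriesSplit(n_splits=4))
search.fit(x_tr, y_tr)

train_acc = search.score(x_tr, y_tr)
test_acc = search.score(x_te, y_te)
print('best params:', search.best_params_)
print('train accuracy:', train_acc)
print('held-out accuracy:', test_acc)
```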

(3) Plotting: Yili (600887) is used as the example here.

Closing price and MA40 of the previous day's closing price

Classifier result

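Data range: start_date='20080101', end_date='20190318'. GridSearchCV settled on a linear classifier, which says that when the MA40 of the stock's previous-day percentage change is above 0.13%, the stock is expected to rise that day, with little dependence on the share price.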
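A threshold like this can be read off a fitted linear SVM, whose boundary is w_price * price + w_pct * pct + b = 0. The sketch below shows the arithmetic on synthetic data generated with a known cut at 0.13, so the recovered value can be compared against it. The data and variable names are assumptions. With the GridSearchCV object from this post, the same attributes would be reached through clf.best_estimator_, and coef_ is only available when the linear kernel is selected.

```python
import numpy as np
from sklearn import svm

# Synthetic features in the same order as x in this post:
# column 0 = MA of pre_close, column 1 = MA of pct_chg
rng = np.random.default_rng(2)
price_ma = rng.uniform(10, 30, 400)
pct_ma = rng.normal(0, 0.5, 400)
x_demo = np.column_stack([price_ma, pct_ma])
y_demo = (pct_ma > 0.13).astype(int)  # synthetic rule, price plays no role

model = svm.SVC(kernel='linear', C=1).fit(x_demo, y_demo)
w_price, w_pct = model.coef_[0]
b = model.intercept_[0]

# Solve the boundary equation for pct at the average price level
threshold = -(b + w_price * price_ma.mean()) / w_pct
accuracy = model.score(x_demo, y_demo)
print('weights:', w_price, w_pct)
print('threshold on pct MA:', threshold)
```

A weight on price that is small next to the weight on the percentage change is what "little dependence on the share price" means in terms of the fitted equation.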

Prediction accuracy

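The classifier reaches 53.08% accuracy on the training set, and the test set actually comes out slightly higher at 54.06%. Note that the grid search in the listing is fit on the full data set, so the test split is not strictly out-of-sample. Overall the accuracy is not high, and there is room for further tuning.
The run takes about 22 s.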
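An accuracy a few points above 50% is easier to judge next to a naive baseline: the hit rate of always predicting the more frequent class. Here is a minimal sketch, with a made-up label vector standing in for the real y.

```python
import numpy as np

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    y_true = np.ravel(y_true).astype(int)
    return np.bincount(y_true, minlength=2).max() / y_true.size

# Hypothetical labels: 6 up days and 4 down days
y_demo = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
print(majority_baseline(y_demo))   # 0.6
```

If the classifier does not beat this number on held-out data, it has not learned more than the base rate.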

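Closing remarks: an SVM can also be used to anticipate reversals in the price trend. When the SVM fails several times in a row, that may be a sign that a reversal is coming. In addition, in the two-dimensional case a classifier equation with a non-zero slope has an interesting interpretation that can be read off the plot. Higher-dimensional cases deserve further exploration, but computing power is a constraint.
The complete code follows: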
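The "fails several times in a row" idea can be tracked with a small helper that counts the wrong predictions at the end of the series. This is only a sketch: the function name and the sample vectors are made up, and how long a streak should count as a signal is left open, as it is in the text.

```python
import numpy as np

def current_miss_streak(y_true, y_pred):
    """Number of consecutive wrong predictions at the end of the series."""
    wrong = np.ravel(y_true) != np.ravel(y_pred)
    streak = 0
    for miss in wrong[::-1]:  # walk backwards from the latest day
        if not miss:
            break
        streak += 1
    return streak

y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0]
print(current_miss_streak(y_true, y_pred))   # 3
```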

# -*- coding: utf-8 -*-
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn.model_selection import train_test_split
import tushare as ts
import pandas as pd
from sklearn.model_selection import GridSearchCV
import time

api = ts.pro_api('your token')  # replace with your own tushare token
time_start=time.time()

# SimHei lets matplotlib render Chinese text; keep the minus sign displayable
mpl.rcParams['font.sans-serif']=['SimHei']
mpl.rcParams['axes.unicode_minus']=False

def get_data(ma):
    fig, ax = plt.subplots()
    # Fetch daily bars from tushare (forward-adjusted prices)
    df = ts.pro_bar(pro_api=api, ts_code='600887.SH', adj='qfq', start_date='20080101', end_date='20190318')
    df.index = pd.to_datetime(df.trade_date)
    # Sort by date, oldest first
    df = df.sort_values(by='trade_date')
    # Moving average of the numeric input columns (string columns are left out)
    dfMA = df[['pre_close', 'pct_chg', 'vol', 'amount']].rolling(window=ma, center=False).mean()
    # Plot the close and the MA of the previous day's close
    plt.plot(df.close)
    plt.plot(dfMA.pre_close)
    # Keep only the column the label is derived from
    df = df[['pct_chg']]
    #print(df.head())
    #print(dfMA.tail())
    # Drop the leading rows left empty by the moving average
    y = df[ma:].values
    x = dfMA[ma:].values
    # Label: 1 if the day closed up, 0 otherwise
    y = (y > 0).astype(int)
    return x, y

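# Show the prediction accuracy; one simple choice, adjust as you like
def show_accuracy(y_hat, y_test, param):
    acc = np.mean(np.ravel(y_hat) == np.ravel(y_test))
    print('%s accuracy: %.2f%%' % (param, 100 * acc))

ma=40
x, y = get_data(ma)
# Select the input features: MA of pre_close and MA of pct_chg
x = x[:,:2]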

# GridSearchCV is recommended for tuning; otherwise tune by hand
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.8)
#clf = svm.SVC(C=0.5, kernel='linear', decision_function_shape='ovr')  # linear
#clf = svm.SVC(C=0.4, kernel='rbf', gamma=20, decision_function_shape='ovr')  # non-linear kernel

# Alternative test set: hold out the last day only (to predict 1 day)
'''
x_train = x[:-1]
x_test = x[-1:]
y_train = y[:-1]
y_test = y[-1]
'''

# Automatic tuning with GridSearchCV
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 2, 4], 'gamma':[0.125, 0.25, 0.5 ,1, 2, 4]}
svr = svm.SVC()
print(x, y.ravel())
clf = GridSearchCV(svr, parameters)
print('GridSearchCVing...')
# Note: the search is fit on all of x, so x_test below is not held out
clf.fit(x, y.ravel())
cv_result = pd.DataFrame.from_dict(clf.cv_results_)
print(cv_result)
print('best', clf.best_params_)

# Fit with hand-picked parameters
#clf.fit(x_train, y_train.ravel())

# Show accuracy
print(clf.score(x_train, y_train))
y_hat = clf.predict(x_train)
show_accuracy(y_hat, y_train, 'training set')
print(clf.score(x_test, y_test))
y_hat = clf.predict(x_test)
show_accuracy(y_hat, y_test, 'test set')

# Print the decision function values and the predictions
print('decision_function:\n', clf.decision_function(x_train))
print('\npredict:\n', clf.predict(x_train))

x1_min, x1_max = x[:, 0].min(), x[:, 0].max()  # range of column 0
x2_min, x2_max = x[:, 1].min(), x[:, 1].max()  # range of column 1
x1, x2  = np.mgrid[x1_min:x1_max:200j, x2_min:x2_max:200j]  # generate the grid sample points
grid_test = np.stack((x1.flat, x2.flat), axis=1)  # test points

# Plotting
cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0'])
cm_dark = mpl.colors.ListedColormap(['g', 'r'])

print('grid_test = \n', grid_test)
grid_hat = clf.predict(grid_test)  # predicted class at each grid point
grid_hat = grid_hat.reshape(x1.shape)  # reshape to match the grid

alpha = 0.5
fig, ax = plt.subplots()
plt.pcolormesh(x2, x1, grid_hat, cmap=cm_light)     # show the predicted regions
plt.title('Index rise/fall', fontsize=15)
plt.xlabel('MA%s of previous-day change'%ma, fontsize=13)
plt.ylabel('MA%s of previous-day close'%ma, fontsize=13)
plt.scatter(x[:, 1], x[:, 0], c=np.squeeze(y), edgecolors='k', s=15, cmap=cm_dark) # samples
#plt.plot(x[:, 0], x[:, 1], 'o', color='blue',alpha=alpha, markeredgecolor='k')
plt.ylim(x1_min, x1_max)
plt.xlim(x2_min, x2_max)
# Show the grid lines
plt.grid()

# Timer
time_end=time.time()
print('totally cost %s secs.'%round(time_end-time_start, 2))

plt.show()

References:
https://www.cnblogs.com/luyaoblog/p/6775342.html
http://wenda.chinahadoop.cn/question/4787