1. Introduction
Bayesian optimization is used for hyperparameter tuning in machine learning. The main idea: given an objective function to optimize (a function in the broad sense, where only the inputs and outputs need to be specified, not its internal structure or mathematical properties), keep adding sample points to update the posterior distribution of the objective function (a Gaussian process) until the posterior closely matches the true distribution. Simply put, it takes the information from previous evaluations into account in order to choose the current parameters more effectively.
The differences from conventional grid search or random search are:
- Bayesian tuning uses a Gaussian process, takes previous parameter information into account, and continually updates the prior; grid search ignores previous parameter information
- Bayesian tuning needs few iterations and is fast; grid search is slow, and with many parameters it suffers from a combinatorial explosion of the search space
- Bayesian tuning remains robust on non-convex problems; grid search easily ends up in a local optimum on non-convex problems
2. Theory
Explaining Bayesian optimization for hyperparameter tuning requires two building blocks:
- Gaussian processes, used to fit the objective function being optimized
- Bayesian optimization itself, which balances "exploitation" and "exploration" to find the optimum at minimal cost
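The exploitation/exploration trade-off is usually expressed through an acquisition function. One common choice, shown here purely as an illustration (not necessarily the exact function used by the packages below), is the upper confidence bound:

```latex
a_{\mathrm{UCB}}(x) = \mu(x) + \kappa\,\sigma(x)
```

where $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation of the Gaussian process at $x$, and $\kappa$ controls the trade-off: a large $\kappa$ favors exploring uncertain regions, a small $\kappa$ favors exploiting the current best region. The next sample point is the maximizer of $a_{\mathrm{UCB}}$.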
See the references for an introduction to Gaussian process regression.
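The loop of fitting a Gaussian process and picking the next sample point can be sketched in a few lines. This is a minimal illustration using scikit-learn's GaussianProcessRegressor on a made-up 1-D objective; the objective, bounds, and the UCB acquisition are assumptions for the sketch, not part of the packages discussed below.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Made-up 1-D objective with its maximum at x = 2 (for illustration only)
def objective(x):
    return -(x - 2.0) ** 2

bounds = (-2.0, 6.0)
rng = np.random.default_rng(0)

# A few initial random evaluations
X = rng.uniform(bounds[0], bounds[1], size=3).reshape(-1, 1)
y = np.array([objective(v) for v in X.ravel()])

# Surrogate model: a Gaussian process with a Matern kernel
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)

for _ in range(15):
    gp.fit(X, y)                      # update the posterior with all samples so far
    cand = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    ucb = mu + 2.0 * sigma            # upper confidence bound acquisition
    x_next = cand[np.argmax(ucb)]     # balances exploitation (mu) and exploration (sigma)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

best_x = float(X[np.argmax(y), 0])
print(best_x)  # close to the true maximizer x = 2
```

Each iteration refits the posterior on all points seen so far, so later samples concentrate near the optimum while the sigma term keeps occasionally probing uncertain regions.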
3. hyperopt
Install: pip install hyperopt
hyperopt's Bayesian optimization consists of four parts:
- Objective function: the quantity we want to minimize; here, the loss of the machine learning model on the validation set for a given set of hyperparameters.
- Domain space: the ranges of hyperparameter values to search.
- Optimization algorithm: the method for constructing the surrogate function and choosing the next hyperparameter values to evaluate.
- Result history: the stored results of the objective function evaluations, including the hyperparameters and the validation loss.
Example
This program classifies the iris dataset with a perceptron, first with the plain perceptron and then with Bayesian optimization.
#!/usr/bin/env python
# encoding: utf-8
'''
@author: Great
@file: hyper_opt.py
@desc: hyperopt
'''
from sklearn import datasets
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler
iris = datasets.load_iris()
x = iris.data
y = iris.target
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3,random_state=0)
std = StandardScaler()
std.fit(x_train)
std_x_train = std.transform(x_train)
std_x_test = std.transform(x_test)
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)  # n_iter was renamed max_iter in scikit-learn >= 0.19
ppn.fit(std_x_train,y_train)
y_pred = ppn.predict(std_x_test)
print(accuracy_score(y_test,y_pred))#0.82222
#hyperopt
#Define the objective function (fmin minimizes, so negate accuracy_score to maximize accuracy)
def percept(args):
    ppn = Perceptron(max_iter=args["n_iter"],  # max_iter replaces the older n_iter argument
                     eta0=args["eta0"],
                     random_state=0)
    ppn.fit(std_x_train, y_train)
    y_pred = ppn.predict(std_x_test)
    return -accuracy_score(y_test, y_pred)
#Define the domain space
from hyperopt import hp
"""
choice:類別變量
quniform:離散均勻(整數(shù)間隔均勻)
uniform:連續(xù)均勻(間隔為一個(gè)浮點(diǎn)數(shù))
loguniform:連續(xù)對(duì)數(shù)均勻(對(duì)數(shù)下均勻分布)
"""
space = {
    "n_iter": hp.choice("n_iter", range(30, 50)),
    "eta0": hp.uniform("eta0", 0.05, 0.5)
}
#Optimization algorithm
from functools import partial  # partial lives in functools, not hyperopt
from hyperopt import tpe
"""
tpe: the Tree-structured Parzen Estimator algorithm
partial fixes keyword arguments of tpe.suggest
"""
bayesopt = partial(tpe.suggest, n_startup_jobs=10)
#Result history
#from hyperopt import Trials
#bayes_trial = Trials()
#Minimize the objective function
from hyperopt import fmin, space_eval
best = fmin(percept, space, bayesopt, max_evals=100)  #, trials=bayes_trial)
print(best)
#best stores the index chosen by hp.choice, not the value itself;
#space_eval maps it back to the actual parameter values
print(percept(space_eval(space, best)))
#{'eta0': 0.23191782419000273, 'n_iter': 18}
#-0.9777777777777777
4. bayes_opt
- Install: pip3 install bayesian-optimization
- bayes_opt workflow:
  - define the function to optimize
  - define the parameter bounds
  - run the optimization
  - display the results
Example
This program classifies a synthetic binary dataset with a random forest, first with the plain random forest and then with Bayesian optimization.
#!/usr/bin/env python
# encoding: utf-8
'''
@author: Great
@desc: bayes_opt
'''
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from bayes_opt import BayesianOptimization
import numpy as np
#data
x, y = make_classification(n_samples=1000, n_features=10, n_classes=2)
rf = RandomForestClassifier()
#Result without tuning
print(np.mean(cross_val_score(rf,x,y,scoring="accuracy",cv=20)))
#Define the function to optimize (bayes_opt maximizes its return value)
def rf_cv(n_estimators, min_samples_split, max_depth, max_features):
    val = cross_val_score(
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_depth=int(max_depth),
                               max_features=min(max_features, 0.999),
                               random_state=2),
        x, y, scoring="accuracy", cv=5).mean()
    return val
#Bayesian optimization: pass the function and the bounds of each parameter
rf_bo = BayesianOptimization(
    rf_cv,
    {
        "n_estimators": (10, 250),
        "min_samples_split": (2, 25),
        "max_features": (0.1, 0.999),
        "max_depth": (5, 15)
    })
#Run the optimization
num_iter = 25
init_points = 5
rf_bo.maximize(init_points=init_points,n_iter=num_iter)
#Display the best result (rf_bo.res["max"] in old versions of bayes_opt; the .max property since 1.0)
print(rf_bo.max)
#Probe near known good parameter values (the old explore() method was replaced by probe())
rf_bo.probe(
    params={"n_estimators": 200, "min_samples_split": 2,
            "max_features": 0.5, "max_depth": 10},
    lazy=True)
rf_bo.maximize(init_points=0, n_iter=0)  # evaluates the queued point
#Verify the tuned parameters (values taken from one optimization run);
#use the same accuracy metric as the untuned baseline above for a fair comparison
rf = RandomForestClassifier(max_depth=5, max_features=0.432, min_samples_split=2, n_estimators=190)
print(np.mean(cross_val_score(rf, x, y, cv=20, scoring="accuracy")))
5. References
bayes_opt: https://www.cnblogs.com/yangruiGB2312/p/9374377.html
hyperopt: https://blog.csdn.net/linxid/article/details/81189154
Gaussian processes: http://www.360doc.com/content/17/0810/05/43535834_678049865.shtml