Parameter tuning with GridSearchCV: based on cross-validation results, we can select the model with the best parameters.
–Input the range of the parameters to tune (the grid); GridSearchCV evaluates a model for each parameter combination and returns the best model along with its parameters.
Model evaluation summary
?k-fold cross-validation is usually the gold standard for evaluating machine learning models (k = 3, 5, 10)
?When there are many classes, or the number of samples per class is imbalanced, use stratified cross-validation
?When the training set is very large (so the bias a train/test split introduces into the performance estimate is small), or when model training is slow, use a train/test split
?For a given problem, find a technique that is fast and still yields a reasonable performance estimate
?When in doubt, use 10-fold cross-validation for regression and stratified 10-fold cross-validation for classification
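The last recommendation above can be sketched as follows. This is a minimal illustration, not part of the original example: the dataset is synthetic and the classifier choice is arbitrary; only the `StratifiedKFold`/`cross_val_score` usage is the point.

```python
# Sketch: stratified 10-fold cross-validation for a classification problem.
# The synthetic data and LogisticRegression model are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Build a small synthetic binary classification dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# StratifiedKFold preserves each class's proportion within every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
print("Mean accuracy: %.2f%% (std %.2f%%)" % (scores.mean() * 100, scores.std() * 100))
```

For a regression problem, the same pattern applies with `KFold(n_splits=10)` and a regression scorer such as `neg_mean_squared_error`.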
# Run the example program shipped with the xgboost package
from xgboost import XGBClassifier
# Module for loading LibSVM-format data
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed; use model_selection
from sklearn.metrics import accuracy_score
# read in data昨寞,數(shù)據(jù)在xgboost安裝的路徑下的demo目錄,現(xiàn)在copy到代碼目錄下的data目錄
my_workpath = 'C:/Users/zdx/xgboost/demo/data/'
X_train,y_train = load_svmlight_file(my_workpath + 'agaricus.txt.train')
X_test,y_test = load_svmlight_file(my_workpath + 'agaricus.txt.test')
# Specify parameters directly on the sklearn wrapper (keyword arguments, not a param map)
bst = XGBClassifier(max_depth=2, learning_rate=0.1, silent=True, objective='binary:logistic')
# Set the range for the number of boosting iterations
param_test = {  # number of weak learners (trees) to search over
    'n_estimators': list(range(1, 51, 1))
}
clf = GridSearchCV(estimator = bst, param_grid = param_test, scoring='accuracy', cv=5)
clf.fit(X_train, y_train)
# Inspect the search results (grid_scores_ was removed; use cv_results_)
clf.cv_results_, clf.best_params_, clf.best_score_
# Make predictions on the test set; predict() already returns class labels,
# so no rounding of probabilities is needed
predictions = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy of gridsearchcv: %.2f%%" % (test_accuracy * 100.0))