scikit-learn, a powerful machine learning package, ships with many classic regression algorithms built in.
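All of the snippets below assume pre-split arrays X_train, X_test, y_train (and, for evaluation, y_test). As a minimal sketch, assuming no real dataset is at hand, they can be produced from a synthetic one:

# Sketch (not part of the original post): build a toy regression dataset
# so the snippets below have X_train / X_test / y_train / y_test to work with.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)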
Linear Regression
Linear regression fits a linear model with coefficients to minimize the residual sum of squares between the observed targets in the data and the targets predicted by the linear approximation.
# X_train and X_test are 2-D arrays; y_train is 1-D
# Load the linear model module
from sklearn import linear_model
# Create a linear regression model object
regr = linear_model.LinearRegression()
# Fit the model on the training set
regr.fit(X_train, y_train)
# Make predictions on the test set
y_pred = regr.predict(X_test)
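Since the model minimizes the residual sum of squares, the natural check is to score the predictions on the held-out set; a sketch, assuming the y_test split from the setup above:

# Sketch: quantify the fit with mean squared error and R^2 (assumes y_test).
from sklearn.metrics import mean_squared_error, r2_score
print("Coefficients:", regr.coef_)
print("MSE: %.2f" % mean_squared_error(y_test, y_pred))
print("R^2: %.2f" % r2_score(y_test, y_pred))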
KNN Regression
KNN regression can be used when the data labels are continuous rather than discrete variables. The label assigned to a query point is computed as the mean of the labels of its nearest neighbors.
from sklearn.neighbors import KNeighborsRegressor
neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit(X_train, y_train)
y_pred = neigh.predict(X_test)
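With the default weights='uniform', the prediction really is the plain mean of the neighbors' training targets, which can be verified directly; a sketch, assuming y_train is a NumPy array:

# Sketch: the prediction for a query point equals the mean of the targets
# of its n_neighbors=2 nearest training points (uniform weights).
import numpy as np
dist, idx = neigh.kneighbors(X_test[:1])   # distances and indices of the 2 nearest neighbors
assert np.allclose(neigh.predict(X_test[:1]), y_train[idx].mean())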
Decision Tree Regression
Decision trees can also be applied to regression problems.
from sklearn.tree import DecisionTreeRegressor
clf = DecisionTreeRegressor()
clf = clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
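An unconstrained tree can memorize the training set and overfit; limiting max_depth is the usual regularizer. A sketch comparing train and test scores, assuming the y_test split from the setup above:

# Sketch: cap tree depth to trade training fit for generalization.
shallow = DecisionTreeRegressor(max_depth=4).fit(X_train, y_train)
print("train R^2:", shallow.score(X_train, y_train))
print("test  R^2:", shallow.score(X_test, y_test))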
Random Forest Regression
from sklearn.ensemble import RandomForestRegressor
regr = RandomForestRegressor(max_depth=2, random_state=0,
                             n_estimators=100)
regr.fit(X_train, y_train)
pred = regr.predict(X_test)
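A fitted forest also exposes impurity-based feature importances, which give a quick view of which inputs drive the predictions; a sketch using the model fitted above:

# Sketch: rank features by impurity-based importance.
import numpy as np
order = np.argsort(regr.feature_importances_)[::-1]
for i in order[:5]:
    print("feature %d: %.3f" % (i, regr.feature_importances_[i]))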
XGBoost Regression
XGBoost has featured in a large share of winning solutions in machine learning competitions.
import xgboost as xgb
xgb_model = xgb.XGBRegressor(max_depth=3,
                             learning_rate=0.1,
                             n_estimators=100,
                             objective='reg:squarederror',  # 'reg:linear' is the deprecated alias
                             n_jobs=-1)
xgb_model.fit(X_train, y_train,
              eval_set=[(X_train, y_train)],
              eval_metric='rmse',  # a regression metric; 'logloss' is for classification
              verbose=100)
y_pred = xgb_model.predict(X_test)
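A common refinement is to evaluate on a held-out validation split and stop boosting when the validation RMSE stops improving. This is a version-dependent sketch: older xgboost releases took early_stopping_rounds in fit() (newer ones move it to the constructor), and the X_val / y_val split is an assumed extra split, not from the original post:

# Sketch (older xgboost fit() API): stop adding trees when validation RMSE
# has not improved for 10 rounds; X_val / y_val are assumed to exist.
xgb_model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1,
                             n_estimators=1000, objective='reg:squarederror')
xgb_model.fit(X_train, y_train,
              eval_set=[(X_val, y_val)],
              early_stopping_rounds=10,
              verbose=False)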
Support Vector Regression
from sklearn.svm import SVR
# Create an SVR model object
clf = SVR()
# Fit the SVR model on the training set
clf.fit(X_train, y_train)
"""
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
tol=0.001, verbose=False)
"""
y_pred = clf.predict(X_test)
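SVR results depend heavily on C, epsilon, and the kernel, so in practice they are usually chosen by cross-validated grid search; a sketch with arbitrary placeholder grid values:

# Sketch: tune SVR hyperparameters by cross-validated grid search
# (the grid values here are arbitrary, not from the original post).
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(SVR(),
                    param_grid={'C': [0.1, 1, 10], 'epsilon': [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)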
Neural Network Regression
from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor()
mlp.fit(X_train, y_train)
"""
MLPRegressor(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(100,), learning_rate='constant',
learning_rate_init=0.001, max_iter=200, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=None, shuffle=True, solver='adam', tol=0.0001,
validation_fraction=0.1, verbose=False, warm_start=False)
"""
y_pred = mlp.predict(X_test)
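Multi-layer perceptrons are sensitive to feature scaling, so the usual pattern is to wrap the regressor in a pipeline with a scaler; a sketch:

# Sketch: standardize inputs before the MLP, since it is scale-sensitive.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
pipe = make_pipeline(StandardScaler(), MLPRegressor(max_iter=500))
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)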
LightGBM Regression
LightGBM is another gradient boosting framework that uses tree-based learning algorithms. Compared with XGBoost, LightGBM trains faster and uses less memory.
import lightgbm as lgb
gbm = lgb.LGBMRegressor(num_leaves=31,
                        learning_rate=0.05,
                        n_estimators=20)
gbm.fit(X_train, y_train,
        eval_set=[(X_train, y_train)],
        eval_metric='l2',  # a regression metric; 'logloss' is for classification
        verbose=100)
y_pred = gbm.predict(X_test)
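As with XGBoost, the number of boosting rounds is usually picked by early stopping on a validation split. Recent lightgbm versions pass this via callbacks, so treat the following as a version-dependent sketch; X_val / y_val are an assumed extra split:

# Sketch (callback-style lightgbm API): early stopping on a held-out
# validation split; X_val / y_val are assumed to exist.
gbm = lgb.LGBMRegressor(num_leaves=31, learning_rate=0.05, n_estimators=1000)
gbm.fit(X_train, y_train,
        eval_set=[(X_val, y_val)],
        eval_metric='l2',
        callbacks=[lgb.early_stopping(stopping_rounds=10)])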
Ridge Regression
Ridge regression addresses some of the problems of ordinary least squares by penalizing the coefficients (an L2 penalty). For example, when features are perfectly collinear (the penalized problem still has a solution) or highly correlated, ridge regression is a good fit.
from sklearn.linear_model import Ridge
# Create a ridge regression model object
reg = Ridge(alpha=.5)
# Fit the ridge model on the training set
reg.fit(X_train, y_train)
pred = reg.predict(X_test)
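The penalty strength alpha is usually selected by cross-validation, and sklearn ships a dedicated estimator for that; a sketch with an arbitrary alpha grid:

# Sketch: pick alpha by built-in cross-validation (grid values are arbitrary).
from sklearn.linear_model import RidgeCV
reg = RidgeCV(alphas=[0.1, 0.5, 1.0, 10.0]).fit(X_train, y_train)
print(reg.alpha_)   # the alpha chosen by CV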
Lasso Regression
Lasso is a linear model that estimates sparse coefficients. It is useful in some settings because it tends to prefer solutions with fewer nonzero parameters, effectively reducing the number of variables a given solution depends on. The Lasso model adds an L1 penalty to the least-squares objective.
from sklearn.linear_model import Lasso
# Create a Lasso model object
reg = Lasso(alpha=0.1)
# Fit the Lasso model on the training set
reg.fit(X_train, y_train)
"""
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False)
"""
# Make predictions on the test set
pred = reg.predict(X_test)
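The sparsity claim is easy to verify on the fitted model: the L1 penalty drives many coefficients exactly to zero, and a larger alpha zeroes out more of them. A sketch:

# Sketch: count the coefficients the L1 penalty set exactly to zero.
import numpy as np
print("nonzero coefficients:", np.count_nonzero(reg.coef_))
print("zeroed-out features :", np.sum(reg.coef_ == 0))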
Elastic Net Regression
Elastic Net is a linear model that uses both the L1 and L2 norms as penalty terms. The combination can learn a sparse model while keeping the regularization properties of ridge regression.
from sklearn.linear_model import ElasticNet
# Create an ElasticNet model object
regr = ElasticNet(random_state=0)
# Fit the ElasticNet model on the training set
regr.fit(X_train, y_train)
pred = regr.predict(X_test)
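The balance between the two penalties is set by l1_ratio (1.0 is pure Lasso, 0.0 pure ridge), and it can be cross-validated together with alpha; a sketch with arbitrary candidate ratios:

# Sketch: cross-validate both alpha and the L1/L2 mix (l1_ratio).
from sklearn.linear_model import ElasticNetCV
regr = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=0)
regr.fit(X_train, y_train)
print(regr.alpha_, regr.l1_ratio_)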
SGD Regression
SGD regression is also a linear regression; it minimizes a regularized empirical loss via stochastic gradient descent.
from sklearn import linear_model
clf = linear_model.SGDRegressor(max_iter=1000, tol=1e-3)
clf.fit(X_train, y_train)
pred=clf.predict(X_test)
"""
SGDRegressor(alpha=0.0001, average=False, early_stopping=False,
epsilon=0.1, eta0=0.01, fit_intercept=True, l1_ratio=0.15,
learning_rate='invscaling', loss='squared_loss', max_iter=1000,
n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
random_state=None, shuffle=True, tol=0.001, validation_fraction=0.1,
verbose=0, warm_start=False)
"""
Regression Competitions and Solutions
Beginner-level competition:
Kaggle: House Prices Prediction
One of the most basic regression problems, this competition is a good entry point for machine learning beginners.
URL: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Classic solutions:
XGBoost solution: https://www.kaggle.com/dansbecker/xgboost
Lasso solution: https://www.kaggle.com/mymkyt/simple-lasso-public-score-0-12102
Intermediate competition:
Kaggle: Predict Future Sales
A classic time-series problem: the goal is to predict next month's total sales for every product in every store.
URL: https://www.kaggle.com/c/competitive-data-science-predict-future-sales
Classic solutions:
LightGBM: https://www.kaggle.com/sanket30/predicting-sales-using-lightgbm
XGBoost: https://www.kaggle.com/fabianaboldrin/eda-xgboost
1st place solution: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/discussion/74835#latest-503740
Top competition solutions:
Kaggle: Recruit Restaurant Visitor Forecasting
URL: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting
Solutions:
1st place solution: https://www.kaggle.com/plantsgo/solution-public-0-471-private-0-505
7th place solution: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/49259#latest-284437
8th place solution: https://github.com/MaxHalford/kaggle-recruit-restaurant
12th place solution: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/49251#latest-282765
Kaggle: Corporación Favorita Grocery Sales Forecasting
URL: https://www.kaggle.com/c/favorita-grocery-sales-forecasting
Solutions:
1st place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47582#latest-360306
2nd place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47568#latest-278474
3rd place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47560#latest-302253
4th place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47529#latest-271077
5th place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47556#latest-270515
6th place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47575#latest-269568