When fitting a linear regression with a very large number of features, the model becomes much more complex: it tends to overfit the training set, which hurts generalization, and the risk of multicollinearity among the predictors rises, making the coefficients hard to interpret. Regularization is a common way to guard against overfitting in linear regression; ridge regression and lasso regression are two of its most widely used forms.
1萄金、回歸正則化的簡(jiǎn)單理解
- With a very large number of features, the regression model becomes overly complex: many features end up with seemingly significant coefficients. This not only overfits the model but also greatly weakens its interpretability.
- Regularized regression assumes that only a subset of the features contribute substantially to the model. It therefore tries to highlight the valuable variables while suppressing the remaining, noisy ones.
- Common regularization methods include: (1) Ridge, (2) Lasso (or LASSO), (3) Elastic net (or ENET).
1.1 Ridge regression
- Tuning parameter: the larger λ is, the more strongly the coefficients are shrunk (pushed toward 0).
- Key property: all features are retained; coefficients only approach 0 but never become exactly 0 (a coefficient of 0 would mean the variable is dropped). A sketch of the penalized objective is given below.
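For reference (a standard textbook formulation, not spelled out in the original post), ridge regression minimizes the least-squares loss plus an L2 penalty on the coefficients, with λ controlling the penalty strength:

\min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2 \;+\; \lambda \sum_{j=1}^{p} \beta_j^2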
1.2 Lasso regression
- Tuning parameter: as with ridge, a larger λ shrinks the coefficients more strongly, here all the way down to 0.
- Key property: as λ grows, "noise" variables are dropped and only the meaningful variables are kept, so the lasso also performs feature selection. A sketch of the penalized objective is given below.
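Analogously (again a standard formulation, added for context), the lasso swaps the L2 penalty for an L1 penalty, which is what allows coefficients to be shrunk exactly to 0:

\min_{\beta}\; \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2 \;+\; \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert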
1.3 Elastic nets
- Essentially a combination of ridge and lasso regression.
- Tuning parameters: α sets the mixing ratio between the ridge and lasso penalties, and λ again controls the overall strength of the shrinkage. The combined penalty is sketched below.
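For context, glmnet's documentation describes the elastic-net penalty as a mix of the two, so α = 0 recovers ridge and α = 1 recovers lasso (the factor 1/2 on the ridge term is a glmnet convention):

\lambda \left[ \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \;+\; \alpha \sum_{j=1}^{p} \lvert\beta_j\rvert \right]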
2、Hands-on with R code
R package and key function
- As shown below, the main function is glmnet::glmnet(). Its alpha argument selects the model: alpha = 1 (the default) fits a lasso, alpha = 0 fits a ridge, and values strictly between 0 and 1 fit an elastic net.
- For the λ parameter, glmnet automatically fits a path over (by default) 100 values, from which the most suitable one can be chosen.
library(glmnet)

glmnet(
  x = X,
  y = Y,
  alpha = 1  # 1 = lasso (default); 0 = ridge; values in (0, 1) = elastic net
)
Example data
ames <- AmesHousing::make_ames()
dim(ames)

set.seed(123)
library(rsample)
split <- initial_split(ames, prop = 0.7,
                       strata = "Sale_Price")
ames_train <- training(split)
dim(ames_train)
# [1] 2049 81
ames_test <- testing(split)
dim(ames_test)
# [1] 881 81

# Create training feature matrices
# we use model.matrix(...)[, -1] to discard the intercept
X <- model.matrix(Sale_Price ~ ., ames_train)[, -1]

# transform y with log transformation
Y <- log(ames_train$Sale_Price)
- Note: parametric models such as regularized regression are sensitive to skewed response values, so transforming the response can often improve predictive performance.
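A quick way to see why the log transform is used here is to compare the raw and log-transformed response distributions (a small illustrative check, not part of the original code):

# Sale_Price is right-skewed; the log transform makes it roughly symmetric
hist(ames_train$Sale_Price, breaks = 50, main = "Sale_Price")
hist(log(ames_train$Sale_Price), breaks = 50, main = "log(Sale_Price)")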
2.1 Ridge regression
Step 1: Fit an initial model and inspect the coefficients under different λ values
ridge <- glmnet(x = X, y = Y,
                alpha = 0)
str(ridge$lambda)
# num [1:100] 286 260 237 216 197 ...

# the smaller lambda is (last column of the path), the weaker the shrinkage
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 100]
# Latitude Overall_QualVery_Excellent
# 0.60703722 0.09344684

# the larger lambda is (first column of the path), the stronger the shrinkage
coef(ridge)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent
# 6.115930e-36 9.233251e-37

plot(ridge, xvar = "lambda")
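If you want the coefficients at one specific λ rather than by column index, coef() also accepts an s argument (the λ value of interest; the value 10 below is an arbitrary illustration, not from the original post):

# coefficients at a single, arbitrarily chosen lambda value
coef(ridge, s = 10)[c("Latitude", "Overall_QualVery_Excellent"), ]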
Step 2: Confirm the best λ value with 10-fold cross-validation
# cv.glmnet() performs 10-fold cross-validation by default (nfolds = 10)
ridge <- cv.glmnet(x = X, y = Y,
                   alpha = 0)
plot(ridge, main = "Ridge penalty\n\n")
- The plot above shows the cross-validated MSE across different log(λ) values and marks two reference lines: the left one is the log(λ) with the minimum MSE, and the right one is the largest log(λ) whose MSE is still within one standard error of that minimum.
# lambda value with the minimum cross-validated MSE
ridge$lambda.min
# [1] 0.1525105
ridge$cvm[ridge$lambda == ridge$lambda.min]  # equivalent to min(ridge$cvm)
# [1] 0.0219778

# the largest lambda value within one standard error of the minimum MSE
ridge$lambda.1se
# [1] 0.6156877
ridge$cvm[ridge$lambda == ridge$lambda.1se]
# [1] 0.0245219
Step 3: Finally, combine the best λ values from cross-validation with a plot of the coefficient paths
ridge <- cv.glmnet(x = X, y = Y,
                   alpha = 0)
# refit the full regularization path (no CV) to plot the coefficient trajectories
ridge_min <- glmnet(x = X, y = Y,
                    alpha = 0)
plot(ridge_min, xvar = "lambda", main = "Ridge penalty\n\n")
abline(v = log(ridge$lambda.min), col = "red", lty = "dashed")
abline(v = log(ridge$lambda.1se), col = "blue", lty = "dashed")
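With the cross-validated λ values in hand, predictions can be made directly from the cv.glmnet object (a small sketch using the X and Y defined above; not part of the original walkthrough):

# predictions at the two CV-selected lambda values
pred_min <- predict(ridge, X, s = "lambda.min")
pred_1se <- predict(ridge, X, s = "lambda.1se")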
2.2 Lasso regression
Step 1: Fit an initial model and inspect the coefficients under different λ values
lasso <- glmnet(x = X, y = Y,
                alpha = 1)
str(lasso$lambda)
# num [1:96] 0.286 0.26 0.237 0.216 0.197 ...

# the smaller lambda is (last column of the path), the weaker the shrinkage
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 96]
# Latitude Overall_QualVery_Excellent
# 0.8126079 0.2222406

# the larger lambda is (first column of the path), the stronger the shrinkage
coef(lasso)[c("Latitude", "Overall_QualVery_Excellent"), 1]
# Latitude Overall_QualVery_Excellent
# 0 0

plot(lasso, xvar = "lambda")
Step 2: Confirm the best λ value with 10-fold cross-validation
lasso <- cv.glmnet(x = X, y = Y,
                   alpha = 1)
plot(lasso, main = "lasso penalty\n\n")
# lambda value with the minimum cross-validated MSE
lasso$lambda.min
# [1] 0.003957686
lasso$cvm[lasso$lambda == lasso$lambda.min]  # equivalent to min(lasso$cvm)
# [1] 0.0229088

# the largest lambda value within one standard error of the minimum MSE
lasso$lambda.1se
# [1] 0.0110125
lasso$cvm[lasso$lambda == lasso$lambda.1se]
# [1] 0.02566636
Step 3: Finally, combine the best λ values from cross-validation with a plot of the coefficient paths
lasso <- cv.glmnet(x = X, y = Y,
                   alpha = 1)
# refit the full regularization path (no CV) to plot the coefficient trajectories
lasso_min <- glmnet(x = X, y = Y,
                    alpha = 1)
plot(lasso_min, xvar = "lambda", main = "lasso penalty\n\n")
abline(v = log(lasso$lambda.min), col = "red", lty = "dashed")
abline(v = log(lasso$lambda.1se), col = "blue", lty = "dashed")
- Although this lasso model does not offer significant improvement over the ridge model, we get approximately the same accuracy by using only 64 features!
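The feature count can be verified directly from the fitted cv.glmnet object (a quick check, not shown in the original post):

# number of non-zero coefficients at lambda.1se, excluding the intercept
sum(coef(lasso, s = "lambda.1se")[-1, ] != 0)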
- One small detail: because the response Y was log-transformed during preprocessing, predictions must be back-transformed (exponentiated) if you want to compare RMSE values against other types of models.
library(caret)  # for RMSE()

# predict sale price on the training data (predict.cv.glmnet defaults to s = "lambda.1se")
pred <- predict(lasso, X)

# back-transform predictions and observations before computing RMSE
RMSE(exp(pred), exp(Y))
## [1] 34161.13
The elastic net fits a mixture of the ridge and lasso penalties by adjusting the α parameter; the caret package can be used to search for the most suitable mix. Rather than working through it in full, a rough sketch of such a tuning call is given below.
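As a sketch only (the 10-fold CV setting and tuneLength value below are arbitrary illustrative choices, not settings from the original post), tuning α and λ together with caret could look like this:

library(caret)

# let caret search a grid of alpha/lambda combinations with 10-fold CV
cv_glmnet <- train(
  x = X,
  y = Y,
  method = "glmnet",
  trControl = trainControl(method = "cv", number = 10),
  tuneLength = 10
)
cv_glmnet$bestTune  # the best alpha/lambda combination found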