Basic Concepts
- Training, validation, and test sets (cross-validation, the bootstrap, etc.; see the sketch after this list)
- Objective function
- Loss function
- Optimization methods (gradient descent, etc.)
- Fitting and overfitting
- Accuracy and generalization performance
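As a quick illustration of the train/test split and cross-validation mentioned above, here is a minimal sketch; the make_regression data is a synthetic stand-in, not from the original notes:
```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in data: 100 samples, 5 features
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=0)

# Hold out a test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))             # R^2 on the held-out test set

# 5-fold cross-validation as an alternative estimate of generalization
print(cross_val_score(LinearRegression(), X, y, cv=5))
```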
Model Training
No linear function can pass exactly through every data point in the training set, so the "offsets" are treated as errors, and the errors are assumed to follow a Gaussian distribution.
Common ways to solve for the parameters (both sketched below):
- Solve directly for the extremum (the closed-form solution)
- Gradient descent
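A minimal NumPy sketch of both approaches on synthetic data, assuming a squared-error objective:
```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)
X1 = np.hstack([X, np.ones((100, 1))])          # append a bias column

# Direct extremum (normal equation): w = (X^T X)^{-1} X^T y
w_direct = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Gradient descent on the mean squared error
w_gd = np.zeros(4)
for _ in range(5000):
    grad = 2 * X1.T @ (X1 @ w_gd - y) / len(y)  # gradient of the MSE
    w_gd -= 0.1 * grad                          # fixed step size

print(np.allclose(w_direct, w_gd, atol=1e-3))   # both reach the same solution
```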
Polynomial regression is nonlinear in the samples but still linear in the coefficients.
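For example, expanding a single feature x into [x, x², x³] yields a model that is nonlinear in x but still linear in the coefficients; a minimal sketch with scikit-learn's PolynomialFeatures:
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = np.sin(x).ravel() + 0.1 * np.random.RandomState(0).randn(50)

# x -> [x, x^2, x^3]: nonlinear in the sample, linear in the coefficients
X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)
print(model.coef_)  # one linear coefficient per polynomial term
```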
Regularization Techniques
- LASSO induces sparsity
- Ridge converges faster
The objective we ultimately care about is still the original, unregularized function; the version modified with a penalty term is called the loss function, and the optimization goal during training is to minimize this loss function.
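In symbols (a minimal formulation, assuming squared error as the data-fit term, with α weighting the penalty):

$$\min_{w,b}\;\sum_{i=1}^{n}\bigl(y_i-(w\cdot x_i+b)\bigr)^2+\alpha\,\Omega(w),\qquad \Omega(w)=\lVert w\rVert_1\ \text{(Lasso)}\quad\text{or}\quad \Omega(w)=\lVert w\rVert_2^2\ \text{(Ridge)}$$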
Model Evaluation
- Continuous targets (regression problems): typically evaluated with the mean squared error (or the related R² score)
- Discrete targets (classification problems): typically evaluated with **accuracy** and **precision/recall** (see the sketch below)
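A minimal sketch of these metrics with sklearn.metrics; the arrays are made-up examples:
```python
import numpy as np
from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_score, r2_score, recall_score)

# Regression: compare continuous predictions against the true values
y_true = np.array([3.0, 2.5, 4.0])
y_pred = np.array([2.8, 2.9, 4.2])
print(mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))

# Classification: compare discrete labels
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```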
There are many linear regression models; they differ in how the parameters w and b are learned from the training data and in how model complexity is controlled.
1. Ordinary Least Squares
Linear regression finds the parameters w and b that minimize the mean squared error between the predictions on the training set and the true regression targets y. (The model may underfit or overfit.)
2考传、嶺回歸
嶺回歸中吃型,對(duì)系數(shù)(w)的選則不僅要在訓(xùn)練數(shù)據(jù)上得到好的預(yù)測(cè)結(jié)果,而且還要擬合附加約束僚楞。希望系數(shù)盡量的小勤晚。即 L2 正則化。
alpha 的設(shè)置
復(fù)雜度更小的模型意味著在訓(xùn)練集上的性能更差镜硕,但泛化性能更好运翼。Ridge 模型在模型的簡(jiǎn)單性(系數(shù)都接近 0)與訓(xùn)練集性能之間做出權(quán)衡。簡(jiǎn)單性和訓(xùn)練集性能二者對(duì)于模型的重要程度可以由用戶通過(guò)設(shè)置 alpha 參數(shù)來(lái)指定兴枯。增大 alpha 會(huì)使得系數(shù)更加趨近于 0 血淌,從而降低訓(xùn)練集性能,但可能會(huì)提高泛化性能财剖。
大 alpha 對(duì)應(yīng)的 coef_ 元素比小 alpha 對(duì)應(yīng)的 coef_ 元素要小悠夯。
Effect of the Amount of Training Data
With enough training data, regularization becomes less important: given more data, it is harder for the model to overfit, i.e. to memorize all of the data.
3. Lasso
The result of L1 regularization is that with lasso some coefficients end up exactly 0, which can be seen as a form of automatic feature selection.
alpha controls how strongly the coefficients are pushed toward 0; the default is 1.0.
In practice, ridge regression is usually the first choice of the two. But if you have many features and expect only a few of them to matter, Lasso may be the better choice.
scikit-learn provides the ElasticNet class, which combines the penalties of Lasso and Ridge. In practice this combination often works best, at the cost of having two parameters to tune: one for the L1 regularization and one for the L2 regularization.
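A minimal ElasticNet sketch on the same extended Boston data used in the code below; the alpha and l1_ratio values here are illustrative, not tuned:
```python
import numpy as np
import mglearn
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

X, y = mglearn.datasets.load_extended_boston()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# l1_ratio blends the two penalties: 1.0 is pure Lasso, 0.0 is pure Ridge
enet = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=100000).fit(X_train, y_train)
print('Test set score:{:.2f}'.format(enet.score(X_test, y_test)))
print('Number of features used:{}'.format(np.sum(enet.coef_ != 0)))
```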
4. Linear Models for Classification
- Logistic regression
- Linear support vector machines (linear SVMs)
Both models use L2 regularization by default. The trade-off parameter that determines the strength of the regularization is C: the larger the value of C, the weaker the regularization.
A small C lets the algorithm adjust to the "majority" of the data points, while a larger C stresses that every individual data point be classified correctly.
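Concretely, the objective that scikit-learn's L2-penalized LogisticRegression minimizes is (with labels $y_i\in\{-1,+1\}$):

$$\min_{w,c}\;\frac{1}{2}w^{T}w+C\sum_{i=1}^{n}\log\bigl(1+\exp\bigl(-y_i(x_i^{T}w+c)\bigr)\bigr)$$

Here C scales the data-fit term rather than the penalty, which is why a larger C means weaker regularization.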
5幌陕、用于多分類的線性模型
將二分類算法推廣到多分類算法的一種常見(jiàn)方法“一對(duì)多余”方法。對(duì)每個(gè)類別都學(xué)習(xí)一個(gè)二分類模型汽煮,將這個(gè)類別與所有其他類別盡量分開(kāi)搏熄。
每個(gè)類別都對(duì)應(yīng)一個(gè)二類分類器。
圖像中間的三角形區(qū)域?qū)儆谀囊粋€(gè)類別呢暇赤?答案是分類方程結(jié)果最大的那個(gè)類別心例,即最接近的那條線對(duì)應(yīng)的類別。
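A small sketch of that decision rule: the predicted class is the argmax over the per-class scores returned by decision_function:
```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(random_state=42)
clf = LinearSVC().fit(X, y)

scores = clf.decision_function(X[:5])  # shape (5, 3): one column per class
print(np.argmax(scores, axis=1))       # the class with the largest score wins
print(clf.predict(X[:5]))              # identical to the argmax
```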
6鞋囊、代碼
import mglearn
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
from sklearn.datasets import make_blobs
# Ordinary least squares
def ols_test(X_train, X_test, y_train, y_test):
    mglearn.plots.plot_linear_regression_wave()
    plt.show()
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    print('lr.coef_:{}'.format(lr.coef_))
    print('lr.intercept_:{}'.format(lr.intercept_))
    # the model underfits on the one-feature wave dataset
    print('Training set score:{:.2f}'.format(lr.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(lr.score(X_test, y_test)))
    X, y = mglearn.datasets.load_extended_boston()
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    # overfits on the high-dimensional extended Boston dataset
    print('Training set score:{:.2f}'.format(lr.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(lr.score(X_test, y_test)))
def ridge_test():
    X, y = mglearn.datasets.load_extended_boston()
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    # plain linear regression overfits
    print('Training set score:{:.2f}'.format(lr.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(lr.score(X_test, y_test)))
    # ridge regression model
    ridge = Ridge()
    ridge.fit(X_train, y_train)
    print('Training set score:{:.2f}'.format(ridge.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(ridge.score(X_test, y_test)))
    ridge10 = Ridge(alpha=10)
    ridge10.fit(X_train, y_train)
    print('Training set score:{:.2f}'.format(ridge10.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(ridge10.score(X_test, y_test)))
    ridge01 = Ridge(alpha=0.1)
    ridge01.fit(X_train, y_train)
    print('Training set score:{:.2f}'.format(ridge01.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(ridge01.score(X_test, y_test)))
    plt.plot(ridge.coef_, 's', label='Ridge alpha=1')
    plt.plot(ridge10.coef_, '^', label='Ridge alpha=10')
    plt.plot(ridge01.coef_, 'v', label='Ridge alpha=0.1')
    plt.plot(lr.coef_, 'o', label='LinearRegression')
    plt.xlabel('Coefficient index')
    plt.hlines(0, 0, len(lr.coef_))
    plt.ylim(-25, 25)
    plt.legend()
    plt.show()
    # effect of the amount of training data on regularization
    mglearn.plots.plot_ridge_n_samples()
    plt.show()
# Lasso
def lasso_test():
    X, y = mglearn.datasets.load_extended_boston()
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    # underfits with the default alpha=1
    lasso = Lasso()
    lasso.fit(X_train, y_train)
    print('Training set score:{:.2f}'.format(lasso.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(lasso.score(X_test, y_test)))
    print('Number of features used :{}'.format(np.sum(lasso.coef_ != 0)))
    lasso001 = Lasso(alpha=0.01, max_iter=100000)
    lasso001.fit(X_train, y_train)
    print('Training set score:{:.2f}'.format(lasso001.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(lasso001.score(X_test, y_test)))
    print('Number of features used :{}'.format(np.sum(lasso001.coef_ != 0)))
    # a very small alpha removes the regularization effect and overfits
    lasso00001 = Lasso(alpha=0.0001, max_iter=100000)
    lasso00001.fit(X_train, y_train)
    print('Training set score:{:.2f}'.format(lasso00001.score(X_train, y_train)))
    print('Test set score:{:.2f}'.format(lasso00001.score(X_test, y_test)))
    print('Number of features used :{}'.format(np.sum(lasso00001.coef_ != 0)))
    plt.plot(lasso.coef_, 's', label='Lasso alpha=1')
    plt.plot(lasso001.coef_, '^', label='Lasso alpha=0.01')
    plt.plot(lasso00001.coef_, 'v', label='Lasso alpha=0.0001')
    ridge01 = Ridge(alpha=0.1)
    ridge01.fit(X_train, y_train)
    plt.plot(ridge01.coef_, 'o', label='Ridge alpha=0.1')
    plt.legend(ncol=2, loc=(0, 1.05))
    plt.ylim(-25, 25)
    plt.xlabel('Coefficient index')
    plt.ylabel('Coefficient magnitude')
    plt.show()
# Classification
def class_test():
    X, y = mglearn.datasets.make_forge()
    fig, axes = plt.subplots(1, 2, figsize=(10, 3))
    for model, ax in zip([LinearSVC(), LogisticRegression()], axes):
        clf = model.fit(X, y)
        mglearn.plots.plot_2d_separator(clf, X, fill=False, eps=0.5, ax=ax, alpha=0.7)
        mglearn.discrete_scatter(X[:, 0], X[:, 1], y, ax=ax)
        ax.set_title('{}'.format(clf.__class__.__name__))
        ax.set_xlabel('Feature 0')
        ax.set_ylabel('Feature 1')
    axes[0].legend()
    mglearn.plots.plot_linear_svc_regularization()
    plt.show()
def logistic_test():
    cancer = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    print('Training set score:{:.3f}'.format(logreg.score(X_train, y_train)))
    print('Test set score:{:.3f}'.format(logreg.score(X_test, y_test)))
    logreg100 = LogisticRegression(C=100)
    logreg100.fit(X_train, y_train)
    print('Training set score:{:.3f}'.format(logreg100.score(X_train, y_train)))
    print('Test set score:{:.3f}'.format(logreg100.score(X_test, y_test)))
    logreg001 = LogisticRegression(C=0.01)
    logreg001.fit(X_train, y_train)
    print('Training set score:{:.3f}'.format(logreg001.score(X_train, y_train)))
    print('Test set score:{:.3f}'.format(logreg001.score(X_test, y_test)))
    plt.plot(logreg.coef_.T, 's', label='C=1')
    plt.plot(logreg100.coef_.T, '^', label='C=100')
    plt.plot(logreg001.coef_.T, 'v', label='C=0.01')
    plt.xticks(range(cancer.data.shape[1]), cancer.feature_names, rotation=90)
    plt.hlines(0, 0, cancer.data.shape[1])
    plt.ylim(-5, 5)
    plt.xlabel('Coefficient index')
    plt.ylabel('Coefficient magnitude')
    plt.legend()
    plt.show()
    for C, marker in zip([0.001, 1, 100], ['o', '^', 'v']):
        # liblinear is needed here: the default lbfgs solver does not support the L1 penalty
        lr_l1 = LogisticRegression(penalty='l1', C=C, solver='liblinear').fit(X_train, y_train)
        print('Training accuracy of l1 logreg with C={:.3f}:{:.2f}'.format(C, lr_l1.score(X_train, y_train)))
        print('Test accuracy of l1 logreg with C={:.3f}:{:.2f}'.format(C, lr_l1.score(X_test, y_test)))
        plt.plot(lr_l1.coef_.T, marker, label='C={:.3f}'.format(C))
    plt.xticks(range(cancer.data.shape[1]), cancer.feature_names, rotation=90)
    plt.hlines(0, 0, cancer.data.shape[1])
    plt.ylim(-5, 5)
    plt.xlabel('Coefficient index')
    plt.ylabel('Coefficient magnitude')
    plt.legend(loc=3)
    plt.show()
def multi_classification():
    X, y = make_blobs(random_state=42)
    mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.legend(['class 0', 'class 1', 'class 2'])
    plt.show()
    plt.clf()
    linear_svm = LinearSVC().fit(X, y)
    print('Coefficient shape:', linear_svm.coef_.shape)
    print('Intercept shape:', linear_svm.intercept_.shape)
    mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
    line = np.linspace(-15, 15)
    # each class's decision boundary: coef[0] * x0 + coef[1] * x1 + intercept = 0
    for coef, intercept, color in zip(linear_svm.coef_, linear_svm.intercept_, ['b', 'r', 'g']):
        plt.plot(line, -(line * coef[0] + intercept) / coef[1], c=color)
    plt.ylim(-10, 15)
    plt.xlim(-10, 8)
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.legend(['Class 0', 'Class 1', 'Class 2', 'Line class 0', 'Line class 1', 'Line class 2'], loc=(1.01, 0.3))
    plt.show()
if __name__ == '__main__':
    # X, y = mglearn.datasets.make_wave(n_samples=60)
    # X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    # ols_test(X_train, X_test, y_train, y_test)
    # ridge_test()
    # lasso_test()
    # class_test()
    # logistic_test()
    multi_classification()
```