Linear Regression
Core Idea
- Solves regression problems.
- The algorithm is highly interpretable.
- In a typical plot, the horizontal axis is the feature (attribute) and the vertical axis is the output label to predict (a concrete value).
  In a classification problem, by contrast, both axes are sample features (e.g., tumor size and time since the tumor was found).
Problem Setup
(figure: sample points in the plane and the best-fit straight line)

- Goal: find the fitted line $\hat{y} = ax + b$.
- For each sample point $x^{(i)}$, the model gives the predicted value $\hat{y}^{(i)} = a x^{(i)} + b$.
- Make the gap between the true value $y^{(i)}$ and the predicted value $\hat{y}^{(i)}$ as small as possible. The gap is usually measured by the sum of squared differences, so the loss function is:

$$J(a, b) = \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 = \sum_{i=1}^{m} \left( y^{(i)} - a x^{(i)} - b \right)^2$$

Minimizing this loss function is exactly the least squares problem: find the $a$ and $b$ that make $J(a, b)$ smallest.
Solving by Least Squares
To solve the loss function, take the partial derivatives of $J(a, b)$ with respect to $a$ and $b$, set each derivative to 0, and solve.

- Differentiate with respect to $b$ first:

$$\frac{\partial J(a, b)}{\partial b} = -2 \sum_{i=1}^{m} \left( y^{(i)} - a x^{(i)} - b \right) = 0 \;\Rightarrow\; b = \bar{y} - a\bar{x}$$

- Then differentiate with respect to $a$ and substitute $b = \bar{y} - a\bar{x}$:

$$\frac{\partial J(a, b)}{\partial a} = -2 \sum_{i=1}^{m} \left( y^{(i)} - a x^{(i)} - b \right) x^{(i)} = 0 \;\Rightarrow\; a = \frac{\sum_{i=1}^{m} x^{(i)} y^{(i)} - m \bar{x}\bar{y}}{\sum_{i=1}^{m} \left( x^{(i)} \right)^2 - m \bar{x}^2}$$

An equivalent form of $a$, more convenient for vectorization, is:

$$a = \frac{\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)\left( y^{(i)} - \bar{y} \right)}{\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)^2}$$
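As a quick sanity check on these closed-form expressions (not part of the original notes; the toy numbers below are made up), the slope and intercept from the formulas should match what `np.polyfit` returns for a degree-1 least-squares fit:

```python
import numpy as np

# made-up toy data, roughly y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

x_mean, y_mean = x.mean(), y.mean()
a = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
b = y_mean - a * x_mean

slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 least-squares fit
print(a, b)              # closed-form result
print(slope, intercept)  # should agree up to floating-point error
```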
Vectorization
Vectorization targets the formula for $a$: view the numerator as the dot product of the vector $w$ with components $w^{(i)} = x^{(i)} - \bar{x}$ and the vector $v$ with components $v^{(i)} = y^{(i)} - \bar{y}$, and the denominator as the dot product of $w$ with itself:

$$a = \frac{w \cdot v}{w \cdot w} = \frac{(x_{train} - \bar{x}) \cdot (y_{train} - \bar{y})}{(x_{train} - \bar{x}) \cdot (x_{train} - \bar{x})}$$

Replacing the Python for-loop with these dot products makes `fit` run much faster on NumPy arrays.
```python
import numpy as np


class SimpleLinearRegression1(object):

    def __init__(self):
        # a_ and b_ are not supplied by the user; they are fitted attributes
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        # fit: learn the model from the training set
        assert x_train.ndim == 1, \
            "simple linear regression can only solve single feature training data"
        assert len(x_train) == len(y_train), \
            "the size of x_train must be equal to the size of y_train"

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)

        num = 0.0
        d = 0.0
        for x, y in zip(x_train, y_train):
            num += (x - x_mean) * (y - y_mean)
            d += (x - x_mean) ** 2

        self.a_ = num / d
        self.b_ = y_mean - self.a_ * x_mean
        # return self, following sklearn's convention for fit
        return self

    def predict(self, x_predict):
        # x_predict holds the feature values to predict for
        assert x_predict.ndim == 1, \
            "simple linear regression can only solve single feature training data"
        assert self.a_ is not None and self.b_ is not None, \
            "must fit before predict!"
        return np.array([self._predict(x) for x in x_predict])

    def _predict(self, x_single):
        # predict a single sample
        return self.a_ * x_single + self.b_

    def __repr__(self):
        # string representation
        return "SimpleLinearRegression1()"
```
The vectorized implementation:

```python
class SimpleLinearRegression2(object):

    def __init__(self):
        # a_ and b_ are not supplied by the user; they are fitted attributes
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        # fit: learn the model from the training set
        assert x_train.ndim == 1, \
            "simple linear regression can only solve single feature training data"
        assert len(x_train) == len(y_train), \
            "the size of x_train must be equal to the size of y_train"

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)

        # replace the for-loop with NumPy dot products
        # (see the vectorized formula above)
        num = (x_train - x_mean).dot(y_train - y_mean)
        d = (x_train - x_mean).dot(x_train - x_mean)

        self.a_ = num / d
        self.b_ = y_mean - self.a_ * x_mean
        # return self, following sklearn's convention for fit
        return self

    def predict(self, x_predict):
        # x_predict holds the feature values to predict for
        assert x_predict.ndim == 1, \
            "simple linear regression can only solve single feature training data"
        assert self.a_ is not None and self.b_ is not None, \
            "must fit before predict!"
        return np.array([self._predict(x) for x in x_predict])

    def _predict(self, x_single):
        # predict a single sample
        return self.a_ * x_single + self.b_

    def __repr__(self):
        # string representation, handy for inspection
        return "SimpleLinearRegression2()"
```
Evaluation Metrics
Evaluation procedure: split the data into a training set (train) and a test set (test); learn $a$ and $b$ from the training set, then measure performance on the test set with one of the following metrics:

- Mean squared error (MSE): $\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$. It has a dimension problem: its units are the square of the target's units.
- Root mean squared error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$, which restores the target's units.
- Mean absolute error (MAE): $\text{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|$.

sklearn ships MSE and MAE but no RMSE.
```python
import numpy as np
from math import sqrt


def accuracy_score(y_true, y_predict):
    """Accuracy: the fraction of y_predict entries that match y_true."""
    assert y_true.shape[0] == y_predict.shape[0], \
        "the size of y_true must be equal to the size of y_predict"
    return sum(y_true == y_predict) / len(y_true)


def mean_squared_error(y_true, y_predict):
    # MSE between y_true and y_predict
    assert len(y_true) == len(y_predict), \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum((y_true - y_predict) ** 2) / len(y_true)


def root_mean_squared_error(y_true, y_predict):
    # RMSE between y_true and y_predict
    return sqrt(mean_squared_error(y_true, y_predict))


def mean_absolute_error(y_true, y_predict):
    # MAE between y_true and y_predict
    assert len(y_true) == len(y_predict), \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum(np.absolute(y_true - y_predict)) / len(y_true)
```
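For comparison, a minimal sketch of the sklearn equivalents (the toy values are made up): `sklearn.metrics` provides `mean_squared_error` and `mean_absolute_error`, and RMSE can be taken as the square root of MSE.

```python
from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = sqrt(mse)  # no dedicated RMSE helper here; derive it from MSE
print(mse, rmse, mae)
```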
The R² Metric

R² (R squared) is defined as:

$$R^2 = 1 - \frac{SS_{residual}}{SS_{total}} = 1 - \frac{\sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2}{\sum_{i=1}^{m} \left( \bar{y} - y^{(i)} \right)^2}$$

The numerator is the error produced by our prediction model; the denominator is the error produced by always predicting the mean (the error of the baseline model). The expression therefore measures the share of the baseline error that our model avoids.

- $R^2 \le 1$, and larger is better.
- The maximum value is 1, reached when the model makes no errors at all; when the model is no better than the baseline model, $R^2 = 0$.
- When $R^2 < 0$, the learned model is worse than the baseline, which suggests the data may not have a linear relationship at all.
- Dividing numerator and denominator by $m$ gives another form of $R^2$, where Var denotes the variance:

$$R^2 = 1 - \frac{\text{MSE}(\hat{y}, y)}{\text{Var}(y)}$$
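The MSE/Var form translates directly into a few lines of NumPy; a minimal sketch (sklearn's `r2_score` computes the same quantity):

```python
import numpy as np


def r2_score(y_true, y_predict):
    """R^2 = 1 - MSE(y_hat, y) / Var(y)."""
    y_true = np.asarray(y_true, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    mse = np.mean((y_true - y_predict) ** 2)
    return 1 - mse / np.var(y_true)
```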
Multiple Linear Regression
Extend the number of features from 1 to $n$; the solution idea is the same as in simple linear regression.

$$\hat{y}^{(i)} = \theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)}$$

Objective Function

Find $\theta = \left( \theta_0, \theta_1, \ldots, \theta_n \right)^T$ that minimizes

$$\sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$$

Prepend a column of ones to the feature matrix $X$ to obtain $X_b$, so that the predictions can be written as $\hat{y} = X_b \cdot \theta$. Setting the gradient of the objective with respect to $\theta$ to zero yields the normal equation:

$$\theta = \left( X_b^T X_b \right)^{-1} X_b^T y$$
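A minimal sketch of a multiple-regression class in the same style as the simple-regression classes above (this class is my illustration of the normal equation, not code from the original notes):

```python
import numpy as np


class LinearRegression(object):
    """Multiple linear regression solved with the normal equation."""

    def __init__(self):
        self.coef_ = None        # theta_1 .. theta_n
        self.intercept_ = None   # theta_0
        self._theta = None

    def fit_normal(self, X_train, y_train):
        assert X_train.shape[0] == len(y_train), \
            "the size of X_train must be equal to the size of y_train"
        # prepend a column of ones so that y_hat = X_b . theta
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        # theta = (X_b^T X_b)^{-1} X_b^T y
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def predict(self, X_predict):
        assert self._theta is not None, "must fit before predict!"
        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return X_b.dot(self._theta)
```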