Linear regression is one of the simplest algorithms in machine learning. In this article we will explore the algorithm and implement it in Python.
Categories: simple linear regression & multiple linear regression
Simple Linear Regression
Model representation:
- One input variable X (the independent variable) and one output variable Y (the dependent variable).
- We want to establish a linear relationship between these variables, which we define as: Y = β0 + β1*X
- β1 is called the slope (scale) coefficient, and β0 is called the bias (intercept) coefficient. The bias coefficient gives the model an extra degree of freedom.
- Our goal is to estimate the values of these coefficients and plot the fitted line using Python's matplotlib library.
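As a quick sketch, the linear relationship above can be written as a plain Python function. The coefficient values here are arbitrary illustrative numbers, not fitted ones:

```python
def predict(x, b0, b1):
    """Simple linear model: Y = b0 + b1 * X."""
    return b0 + b1 * x

# With illustrative coefficients b0 = 2.0 and b1 = 0.5:
print(predict(10.0, 2.0, 0.5))  # 7.0
```

Fitting the model means choosing b0 and b1 so that this function's predictions lie as close as possible to the observed data.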
Code implementation of the model:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (20.0, 10.0)
data = pd.read_csv('headbrain.csv')
# print(data.shape)
# print(data.head())

# Collecting X and Y
X = data['Head Size(cm^3)'].values
Y = data['Brain Weight(grams)'].values

# Mean of X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)
m = len(X)

# Least-squares estimates of the coefficients
numer = 0
denom = 0
for i in range(m):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2
b1 = numer / denom
b0 = mean_y - (b1 * mean_x)
print(b1, b0)

# Plot the regression line against the data
max_x = np.max(X) + 100
min_x = np.min(X) - 100
x = np.linspace(min_x, max_x, 1000)
y = b0 + b1 * x
plt.plot(x, y, color='#58b907', label='Regression Line')
plt.scatter(X, Y, c='#ef5432', label='Scatter Plot')
plt.xlabel('Head Size in cm^3')
plt.ylabel('Brain Weight in grams')
plt.legend()
plt.show()
```
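The loop above computes the least-squares coefficients element by element. With NumPy the same estimates can be obtained in vectorized form; the sketch below uses synthetic data (since headbrain.csv may not be at hand), so the recovered slope should land near the 0.26 used to generate it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(2700, 4700, size=200)          # synthetic "head sizes"
Y = 0.26 * X + 325 + rng.normal(0, 50, 200)    # synthetic "brain weights" with noise

mean_x, mean_y = X.mean(), Y.mean()
# Closed-form least squares: b1 = cov(X, Y) / var(X)
b1 = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
b0 = mean_y - b1 * mean_x
print(b1, b0)  # b1 ≈ 0.26, b0 ≈ 325, up to noise
```

The vectorized version is both shorter and faster than the explicit loop, and it produces the same b0 and b1 for the same data.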
The code above has been tested and runs as-is.
Model Evaluation
Two common metrics: root mean squared error (RMSE) and the coefficient of determination (R² score).
Root mean squared error (RMSE):

RMSE = sqrt((1/m) * Σ (Yi - ?i)²), summing over i = 1, …, m

where ?i denotes the predicted output value for the i-th sample.
- Code implementation:

```python
# RMSE to evaluate the model:
rmse = 0
for i in range(m):
    y_pred = b0 + b1 * X[i]
    rmse += (Y[i] - y_pred) ** 2
rmse = np.sqrt(rmse / m)
print(rmse)
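The RMSE loop can equally be written as a single vectorized expression. Here is a self-contained sketch with small toy data and illustrative coefficients (not the headbrain.csv fit):

```python
import numpy as np

# Toy data and coefficients for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 4.1, 5.9, 8.0])
b0, b1 = 0.0, 2.0

# Vectorized equivalent of the RMSE loop
y_pred = b0 + b1 * X
rmse = np.sqrt(np.mean((Y - y_pred) ** 2))
print(rmse)  # about 0.0707
```

A lower RMSE means the predictions sit closer to the observed values; its units are the same as Y's.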
Coefficient of determination (R² score): R² = 1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares. The closer R² is to 1, the better the line explains the variance in the data.

```python
# Coefficient of Determination (R^2 Score):
ss_t = 0  # total sum of squares
ss_r = 0  # residual sum of squares
for i in range(m):
    y_pred = b0 + b1 * X[i]
    ss_t += (Y[i] - mean_y) ** 2
    ss_r += (Y[i] - y_pred) ** 2
r2 = 1 - (ss_r / ss_t)
print(r2)
```
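As with RMSE, the R² loop reduces to a few vectorized lines. A self-contained sketch on the same toy data and illustrative coefficients as before:

```python
import numpy as np

# Toy data and coefficients for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 4.1, 5.9, 8.0])
b0, b1 = 0.0, 2.0

y_pred = b0 + b1 * X
ss_r = np.sum((Y - y_pred) ** 2)     # residual sum of squares
ss_t = np.sum((Y - Y.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_r / ss_t
print(r2)  # close to 1, since the toy data is nearly linear
```

An R² near 1 here reflects that the toy points were generated to lie almost exactly on the line Y = 2X.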