Simple Linear Regression
Predict a dependent variable Y from a single independent variable X. We assume a linear relationship between X and Y and try to fit a straight line Y = b0 + b1*X1 that passes as close as possible to all the points, so that the predicted values Yp stay as close as possible to the observed values Yi; that is, we minimize sum((Yi - Yp)^2).
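For a single feature, the least-squares criterion above has a closed-form solution: b1 = cov(X, Y) / var(X) and b0 = mean(Y) - b1 * mean(X). A minimal NumPy sketch on toy data (the arrays here are illustrative, not the course dataset):

```python
import numpy as np

# Toy data (hypothetical): hours studied vs. score, perfectly linear
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Closed-form least-squares estimates:
#   b1 = sum((Xi - mean(X)) * (Yi - mean(Y))) / sum((Xi - mean(X))^2)
#   b0 = mean(Y) - b1 * mean(X)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

print(b0, b1)  # this perfectly linear toy set gives b0 = 0, b1 = 2
```

scikit-learn's LinearRegression, used below, computes the same coefficients (exposed as `intercept_` and `coef_`).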
Example: predicting exam scores from hours studied
一盈蛮、數(shù)據(jù)預(yù)處理
- 導(dǎo)入庫
- 導(dǎo)入數(shù)據(jù)
- 查找缺失值(本數(shù)據(jù)集無缺失值)
- 分割數(shù)據(jù)集
- 數(shù)據(jù)標(biāo)準(zhǔn)化(本數(shù)據(jù)集只有一個(gè)特征抖誉,無需標(biāo)準(zhǔn)化)
import numpy as np
import pandas as pd
df = pd.read_csv('D:\\data\\day2-studentscores.csv')
X = df.iloc[:,:1].values
Y = df.iloc[:,1].values
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/4, random_state = 0)
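The missing-value check listed in the preprocessing steps is not shown in the code above; a minimal sketch with pandas (the DataFrame here is a hypothetical stand-in for the CSV loaded with read_csv):

```python
import pandas as pd

# Stand-in for the student-scores data; in the tutorial, df comes from
# pd.read_csv on the CSV file instead
df = pd.DataFrame({'Hours': [2.5, 5.1, 3.2], 'Scores': [21, 47, 27]})

# Count missing values per column; all zeros means nothing to impute
print(df.isnull().sum())
```

If any count is nonzero, a step such as `df.dropna()` or an imputer would go here before splitting the data.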
二衰倦、在訓(xùn)練集上訓(xùn)練簡單線性回歸模型
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
Step 3: Predicting the Results
Apply the fitted model to the test set to obtain the predictions.
Y_pred = regressor.predict(X_test)
四淹接、可視化
通過散點(diǎn)圖查看實(shí)際值和預(yù)測值的偏差
import matplotlib.pyplot as plt
%matplotlib inline
1. Visualizing the training set results
plt.scatter(X_train, Y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
2. Visualizing the test set results
plt.scatter(X_test, Y_test, color = 'red')
plt.plot(X_test, Y_pred, color = 'blue')