Day 11: K-NN Case Study

(Figure: data.PNG, a preview of the dataset)

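This dataset contains user information from a social network: User ID, Gender, Age, and EstimatedSalary. A car company has launched a new luxury SUV, and we want to find out which users of the social network will buy it. The last column, Purchased, records whether each user bought the car. Our goal is to build a model that predicts whether a user will buy the car from two variables, Age and EstimatedSalary. The feature matrix therefore contains only these two columns, and we study how Age and EstimatedSalary relate to the purchase decision.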

1. Data Preprocessing

Import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Import the data

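df = pd.read_csv('D:\\data\\Social_Network_Ads.csv')
# Columns: User ID, Gender, Age, EstimatedSalary, Purchased
X = df.iloc[:, 2:4].values   # features: Age and EstimatedSalary
Y = df.iloc[:, -1].values    # target: Purchased (0/1)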
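The positional slice `iloc[:, 2:4]` depends on the column order User ID, Gender, Age, EstimatedSalary, Purchased (the end index 4 is excluded). A minimal sketch with a few made-up rows (hypothetical values, not taken from the real CSV) shows which columns each slice picks, and that selecting by column name gives the same result without depending on column order.

```python
import pandas as pd

# Hypothetical rows with the same column layout as Social_Network_Ads.csv
df = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 1, 0],
})

# Positional selection, as in the article: columns 2 and 3 only
X_pos = df.iloc[:, 2:4]
y_pos = df.iloc[:, -1]

# Equivalent selection by name, robust to reordered columns
X_named = df[["Age", "EstimatedSalary"]]
y_named = df["Purchased"]

print(list(X_pos.columns))                           # ['Age', 'EstimatedSalary']
print(X_pos.equals(X_named), y_pos.equals(y_named))  # True True
```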

Split the dataset

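from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
# Hold out 25% of the rows as the test set; no random_state is fixed, so the split differs on every run
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25)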
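`train_test_split` shuffles the rows before splitting, so without a fixed `random_state` each run yields a different split and a slightly different confusion matrix later on. The sketch below uses 400 synthetic rows (the row count and data are assumptions for illustration) to show the sizes produced by `test_size=0.25` and how `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 400 synthetic samples with 2 features (stand-ins for Age and EstimatedSalary)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
Y = rng.integers(0, 2, size=400)

# test_size=0.25 -> a quarter of the rows go to the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (300, 2) (100, 2)

# Calling again with the same random_state reproduces exactly the same split
X_train2, X_test2, _, _ = train_test_split(X, Y, test_size=0.25, random_state=0)
print(np.array_equal(X_test, X_test2))  # True
```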

Feature scaling (standardization)

from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
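ss = ss.fit(X_train)            # learn each column's mean and std from the training set only
X_train = ss.transform(X_train)
X_test = ss.transform(X_test)   # reuse the training statistics on the test set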
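`StandardScaler` maps each column to z = (x - mean) / std, using the mean and population standard deviation learned from the training set. Scaling matters for K-NN because distances are computed on the raw feature values: a salary difference of thousands would otherwise swamp an age difference of a few years. The sketch below uses made-up ages and salaries (hypothetical values) to check the formula and compare per-feature differences before and after scaling.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training rows: [Age, EstimatedSalary]
X_train = np.array([[19, 19000], [35, 20000], [26, 43000],
                    [27, 57000], [47, 25000]], dtype=float)
X_test = np.array([[30, 87000]], dtype=float)

ss = StandardScaler().fit(X_train)   # learns per-column mean_ and scale_
X_train_s = ss.transform(X_train)
X_test_s = ss.transform(X_test)      # test rows use the *training* statistics

# Same result by hand: StandardScaler uses the population std (ddof=0)
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
print(np.allclose(X_train_s, (X_train - mean) / std))  # True
print(np.allclose(X_test_s, (X_test - mean) / std))    # True

# Per-feature gap between the first two rows
diff_raw = np.abs(X_train[0] - X_train[1])        # [16, 1000]: salary dominates
diff_scaled = np.abs(X_train_s[0] - X_train_s[1])  # about [1.67, 0.07]: age now matters
print(diff_raw, diff_scaled)
```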

2. Train the K-NN model

from sklearn.neighbors import KNeighborsClassifier
# 5 neighbors; the Minkowski metric with p = 2 is the ordinary Euclidean distance
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn.fit(X_train, Y_train)
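With `metric='minkowski'`, the distance between two points is (Σ|x_i - y_i|^p)^(1/p); `p=2` gives the Euclidean distance and `p=1` would give the Manhattan distance. With its default uniform weights, `KNeighborsClassifier` then labels each point with the majority class among its `n_neighbors` nearest training points. The from-scratch sketch below (the helpers `minkowski` and `knn_predict` are hypothetical, written only for illustration) re-implements that rule with NumPy for 0/1 labels and checks it against scikit-learn on random data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def minkowski(a, b, p=2):
    """Minkowski distance between each row of a and the vector b."""
    return np.sum(np.abs(a - b) ** p, axis=-1) ** (1.0 / p)

def knn_predict(X_train, y_train, X_query, k=5, p=2):
    """Majority vote among the k nearest training points (integer labels >= 0)."""
    preds = []
    for x in X_query:
        d = minkowski(X_train, x, p)
        nearest = np.argsort(d)[:k]                  # indices of the k closest points
        votes = y_train[nearest]
        preds.append(np.bincount(votes).argmax())    # most frequent label wins
    return np.array(preds)

# Random 2-D data with a simple linear labeling rule
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_query = rng.normal(size=(50, 2))

knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2).fit(X_train, y_train)
ours = knn_predict(X_train, y_train, X_query, k=5, p=2)
print(np.array_equal(ours, knn.predict(X_query)))  # True
```

With k = 5 and two classes a vote can never tie, which is one practical reason to prefer an odd k for binary problems.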

3. Predict the test set results

Y_pred = knn.predict(X_test)
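`predict` returns only the majority label. With uniform weights, `predict_proba` exposes the vote itself (the fraction of the 5 neighbors in each class), and `kneighbors` returns which training points cast those votes. A new user must go through the same fitted scaler before prediction. A sketch on synthetic stand-in data (the data, the labeling rule, and the example user are all hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic [Age, EstimatedSalary] rows and a made-up purchase rule
X_raw = np.column_stack([rng.uniform(18, 60, 300), rng.uniform(15000, 150000, 300)])
y = ((X_raw[:, 0] > 40) | (X_raw[:, 1] > 100000)).astype(int)

ss = StandardScaler().fit(X_raw)
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2).fit(ss.transform(X_raw), y)

# Hypothetical new user (age 45, salary 60000), scaled with the fitted scaler
new_user = ss.transform([[45, 60000]])
proba = knn.predict_proba(new_user)      # [[P(class 0), P(class 1)]]
dist, idx = knn.kneighbors(new_user)     # distances and indices of the 5 neighbors
vote_share = y[idx[0]].mean()            # fraction of neighbors who purchased
print(knn.predict(new_user), proba, vote_share)
```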

4. Evaluate the model

Confusion matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test, Y_pred)

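Output:

array([[60,  6],
       [ 6, 28]], dtype=int64)

Rows are the true classes (0 = did not purchase, 1 = purchased) and columns are the predicted classes. The diagonal entries (60 and 28) are correct predictions; the two off-diagonal 6s are misclassifications.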
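The usual summary metrics can be read straight off this matrix. A short sketch using the numbers reported above, treating class 1 (purchased) as the positive class:

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
cm = np.array([[60, 6],
               [6, 28]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # (28 + 60) / 100 = 0.88
precision = tp / (tp + fp)        # 28 / 34, about 0.824
recall = tp / (tp + fn)           # 28 / 34, about 0.824
print(accuracy, round(precision, 3), round(recall, 3))
```

Because both error counts happen to be 6, precision and recall coincide in this particular run; with another random split they would generally differ.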

Visualization
Training set

from matplotlib.colors import ListedColormap
X_set, y_set = X_train, np.asarray(Y_train)
# Build a grid covering the (standardized) feature space with step 0.01
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Classify every grid point and color the decision regions
plt.contourf(X1, X2, knn.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the training points, one color per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Training set)')
plt.xlabel('Age (standardized)')
plt.ylabel('Estimated Salary (standardized)')
plt.legend()
plt.show()
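The decision-region plot works by classifying every point of a dense grid: `np.meshgrid` builds the grid, `ravel` flattens it into a list of (x, y) points for `predict`, and `reshape(X1.shape)` folds the predictions back into a grid that `contourf` can color. A tiny 3 x 4 grid with a stand-in rule in place of `knn.predict` (the function `fake_predict` is hypothetical) makes the shapes visible:

```python
import numpy as np

x = np.arange(0, 3)          # 3 values along the first feature
y = np.arange(0, 4)          # 4 values along the second feature
X1, X2 = np.meshgrid(x, y)   # both shaped (4, 3): rows follow y, columns follow x

points = np.array([X1.ravel(), X2.ravel()]).T   # shape (12, 2): one (x, y) pair per row

def fake_predict(pts):
    """Hypothetical stand-in for knn.predict: class 1 when x + y > 2."""
    return (pts[:, 0] + pts[:, 1] > 2).astype(int)

Z = fake_predict(points).reshape(X1.shape)   # back to (4, 3) for contourf
print(X1.shape, points.shape, Z.shape)       # (4, 3) (12, 2) (4, 3)
```

Since K-NN has to search for neighbors for every grid point, the real plot with `step = 0.01` over several standard deviations classifies hundreds of thousands of points and can take a while; a coarser step trades smoothness for speed.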

Test set

from matplotlib.colors import ListedColormap
X_set, y_set = X_test, np.asarray(Y_test)
# Same grid construction as above, now spanning the test points
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# Decision regions of the model trained on the training set
plt.contourf(X1, X2, knn.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the test points, one color per true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Test set)')
plt.xlabel('Age (standardized)')
plt.ylabel('Estimated Salary (standardized)')
plt.legend()
plt.show()
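The training-set and test-set plots differ only in their data and title, so the code can be wrapped in one helper. Below is a sketch of such a function (the name `plot_decision_regions` and its parameters are hypothetical, not part of scikit-learn or matplotlib), run on synthetic data with the non-interactive Agg backend and a coarser grid step for speed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier

def plot_decision_regions(model, X_set, y_set, title, step=0.01):
    """Color the plane by the model's predictions and overlay the labeled points."""
    y_set = np.asarray(y_set)
    X1, X2 = np.meshgrid(
        np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, step),
        np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, step),
    )
    Z = model.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)
    colors = ListedColormap(("red", "green"))
    fig, ax = plt.subplots()
    ax.contourf(X1, X2, Z, alpha=0.5, cmap=colors)
    for i, label in enumerate(np.unique(y_set)):
        ax.scatter(X_set[y_set == label, 0], X_set[y_set == label, 1],
                   color=colors(i), label=label)
    ax.set(title=title, xlabel="Age (standardized)",
           ylabel="Estimated Salary (standardized)",
           xlim=(X1.min(), X1.max()), ylim=(X2.min(), X2.max()))
    ax.legend()
    return fig, ax

# Usage on synthetic data (a stand-in for the scaled X_train / X_test)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > X[:, 1]).astype(int)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
fig, ax = plot_decision_regions(knn, X, y, "K-NN (synthetic data)", step=0.05)
```

In the article's setting the two plots would become `plot_decision_regions(knn, X_train, Y_train, 'K-NN (Training set)')` and the same call with the test data.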

Last edited: 2018.08.08 11:02:49

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.