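2.1 Basic idea
- Given a test image, compute its distance to every image in the training set (distance measures similarity: the smaller the distance, the more similar the two images).
- Find the training image with the smallest distance to the test image.
- Assign that training image's class to the test image.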
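2.2 Definition of image distance (similarity measure)
We use the L1 distance:
$$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$
where $I_1$ and $I_2$ are the two images and $p$ ranges over the pixels of the image.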
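import numpy as np

def l1(img1, img2):
    """Compute the L1 distance between two images.

    Args:
        img1 (np.ndarray): 2-D array (the images here are 28×28 pixels)
        img2 (np.ndarray): 2-D array of the same shape
    Returns:
        float: L1 distance between the images
    """
    # Cast to float64 first: subtracting uint8 pixel arrays wraps around modulo 256.
    a = np.asarray(img1, dtype=np.float64)
    b = np.asarray(img2, dtype=np.float64)
    return float(np.sum(np.abs(a - b)))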
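As a quick check of the L1 definition, the sketch below evaluates the distance on two tiny hand-made arrays. The 2×2 "images" and their values are hypothetical. It assumes pixels are stored as `uint8`, which is common for MNIST but not stated in the text. It also shows why casting before subtracting matters under that assumption.

```python
import numpy as np

def l1(img1, img2):
    """L1 distance between two images, computed in float64."""
    a = np.asarray(img1, dtype=np.float64)
    b = np.asarray(img2, dtype=np.float64)
    return float(np.sum(np.abs(a - b)))

# Two hypothetical 2x2 "images" stored as uint8.
img_a = np.array([[0, 10], [200, 255]], dtype=np.uint8)
img_b = np.array([[5, 0], [210, 250]], dtype=np.uint8)

# |0-5| + |10-0| + |200-210| + |255-250| = 5 + 10 + 10 + 5 = 30
distance = l1(img_a, img_b)
print(distance)  # 30.0

# Subtracting the uint8 arrays directly wraps around modulo 256
# (0 - 5 becomes 251, 200 - 210 becomes 246), so the sum is wrong.
naive = int(np.sum(np.abs(img_a - img_b).astype(np.int64)))
print(naive)  # 512
```

If the images have already been converted to floats (for example, scaled to [0, 1]), the cast is harmless and the result is unchanged.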
2.3 Implementing the nearest neighbor algorithm
2.3.1 Defining the model
The model has no parameters; it only stores the training images and labels.
class NearestNeighbor(object):
    def __init__(self):
        """Nearest neighbor model."""
        self.train_images = []
        self.train_labels = []
2.3.3 Training the model
Training simply means storing the training set.
def train(train_images, train_labels):
    """Train the model.

    Args:
        train_images (list[np.ndarray]): training images
        train_labels (list[int]): training labels
    Returns:
        NearestNeighbor: the trained model
    """
    model = NearestNeighbor()
    model.train_images = train_images
    model.train_labels = train_labels
    return model
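2.3.4 Prediction
At prediction time, the test image is compared with every training image in turn; the most similar one (the one with the smallest L1 distance) is found and its class is returned.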
def predict(model, test_image):
    """Predict the class of an image.

    Args:
        model (NearestNeighbor): the trained model
        test_image (np.ndarray): test image, a 28×28 matrix
    Returns:
        int: predicted class (0~9)
    """
    # L1 distance from the test image to every training image.
    all_l1 = [l1(test_image, image) for image in model.train_images]
    # Index of the closest training image.
    similar_image_index = np.argmin(all_l1)
    label = model.train_labels[similar_image_index]
    return label
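The per-image loop above calls the distance function once per training image. One possible alternative, sketched here as an assumption and not as the original implementation, stacks the training images into a single array and lets NumPy broadcasting compute all distances at once. The function name `predict_vectorized` and the toy data are hypothetical.

```python
import numpy as np

def predict_vectorized(train_images, train_labels, test_image):
    """Return the label of the training image closest to test_image in L1 distance."""
    # Stack into shape (n_train, H, W); float64 avoids uint8 wrap-around.
    train = np.asarray(train_images, dtype=np.float64)
    test = np.asarray(test_image, dtype=np.float64)
    # Broadcasting subtracts the test image from every training image at once,
    # then each image's absolute differences are summed into one distance.
    distances = np.abs(train - test).reshape(len(train), -1).sum(axis=1)
    return train_labels[int(np.argmin(distances))]

# Hypothetical toy data: three constant 2x2 "images" with labels 0, 1, 2.
train_images = [np.zeros((2, 2)), np.full((2, 2), 100.0), np.full((2, 2), 200.0)]
train_labels = [0, 1, 2]
test_image = np.full((2, 2), 90.0)

# Distances are 360, 40 and 440, so the second training image is nearest.
label = predict_vectorized(train_images, train_labels, test_image)
print(label)  # 1
```

The trade-off is memory: the intermediate difference array has the same shape as the whole stacked training set, so for large training sets it may be preferable to process the training images in chunks.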
2.3.5 Evaluating prediction accuracy
- Define a custom confusion matrix class:
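import numpy as np
from sklearn.metrics import confusion_matrix

class ConfMat(object):
    """Confusion matrix with per-class precision and recall.

    Rows are predicted classes and columns are true classes. The extra last
    column holds per-class precision, the extra last row holds per-class
    recall, and the bottom-right cell holds the overall accuracy.

    Args:
        true_y (np.ndarray or list[int]): true labels
        pred_y (np.ndarray or list[int]): predicted labels
    """
    def __init__(self, true_y, pred_y):
        assert len(true_y) == len(pred_y)
        # sklearn puts true classes on rows; transpose so rows are predictions.
        confmat = confusion_matrix(true_y, pred_y).transpose().astype(int)
        self.n_classes = confmat.shape[0]
        # One extra row and column for recall, precision and accuracy.
        self.confmat = np.zeros((self.n_classes + 1, self.n_classes + 1))
        self.confmat[:-1, :-1] = confmat
        # The extra row and column are still zero here, so the sums cover only the counts.
        self.accuracy = self.confmat.diagonal().sum() / self.confmat.sum()
        self.precisions = np.zeros(self.n_classes)
        self.recalls = np.zeros(self.n_classes)
        for i in range(self.n_classes):
            # Row i: everything predicted as class i.
            row_sum = self.confmat[i, :].sum()
            self.precisions[i] = self.confmat[i, i] / row_sum if row_sum > 0 else 0
            # Column i: everything whose true class is i.
            col_sum = self.confmat[:, i].sum()
            self.recalls[i] = self.confmat[i, i] / col_sum if col_sum > 0 else 0
        self.confmat[-1, :-1] = self.recalls
        self.confmat[:-1, -1] = self.precisions
        self.confmat[-1, -1] = self.accuracy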
- Evaluate the model on the test set and compute the confusion matrix:
def get_confmat(model, test_images, test_labels):
    """Evaluate the model on the test set and build the confusion matrix.

    Args:
        model (NearestNeighbor): the trained model
        test_images (list[np.ndarray]): test images
        test_labels (list[int]): test labels
    Returns:
        ConfMat: confusion matrix object (the array itself is in its confmat attribute)
    """
    pred_labels = [predict(model, image) for image in test_images]
    return ConfMat(test_labels, pred_labels)
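- Evaluation results (rows are predicted classes, columns are true classes; the bottom-right cell is the overall accuracy):

| Predicted \ True | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 973 | 0 | 9 | 0 | 1 | 2 | 5 | 0 | 9 | 1 | 0.973 |
| 1 | 2 | 1129 | 8 | 2 | 9 | 1 | 2 | 20 | 5 | 5 | 0.954 |
| 2 | 1 | 3 | 987 | 4 | 0 | 0 | 1 | 4 | 6 | 1 | 0.980 |
| 3 | 0 | 0 | 6 | 965 | 0 | 17 | 0 | 2 | 21 | 7 | 0.948 |
| 4 | 0 | 1 | 1 | 1 | 937 | 2 | 2 | 4 | 4 | 13 | 0.971 |
| 5 | 1 | 1 | 0 | 21 | 0 | 848 | 5 | 0 | 18 | 5 | 0.943 |
| 6 | 2 | 1 | 2 | 0 | 3 | 9 | 943 | 0 | 3 | 1 | 0.978 |
| 7 | 1 | 0 | 17 | 9 | 4 | 1 | 0 | 989 | 4 | 9 | 0.956 |
| 8 | 0 | 0 | 2 | 4 | 1 | 5 | 0 | 0 | 894 | 1 | 0.986 |
| 9 | 0 | 0 | 0 | 4 | 27 | 7 | 0 | 9 | 10 | 966 | 0.944 |
| Recall | 0.993 | 0.995 | 0.956 | 0.955 | 0.954 | 0.951 | 0.984 | 0.962 | 0.918 | 0.957 | 0.963 |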
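The summary figures in the results can be recomputed from the raw counts. The sketch below copies the 10×10 counts from the results above. It assumes, based on the transpose in the confusion matrix class, that rows are predicted classes and columns are true classes. It then derives precision, recall and accuracy with plain NumPy.

```python
import numpy as np

# Counts copied from the results: rows are predicted classes,
# columns are true classes (assumed from the transposed layout).
counts = np.array([
    [973, 0, 9, 0, 1, 2, 5, 0, 9, 1],
    [2, 1129, 8, 2, 9, 1, 2, 20, 5, 5],
    [1, 3, 987, 4, 0, 0, 1, 4, 6, 1],
    [0, 0, 6, 965, 0, 17, 0, 2, 21, 7],
    [0, 1, 1, 1, 937, 2, 2, 4, 4, 13],
    [1, 1, 0, 21, 0, 848, 5, 0, 18, 5],
    [2, 1, 2, 0, 3, 9, 943, 0, 3, 1],
    [1, 0, 17, 9, 4, 1, 0, 989, 4, 9],
    [0, 0, 2, 4, 1, 5, 0, 0, 894, 1],
    [0, 0, 0, 4, 27, 7, 0, 9, 10, 966],
])

correct = counts.diagonal()
# Precision: correct predictions divided by everything predicted as that class (row sum).
precision = correct / counts.sum(axis=1)
# Recall: correct predictions divided by everything truly in that class (column sum).
recall = correct / counts.sum(axis=0)
# Accuracy: all correct predictions divided by all test images.
accuracy = correct.sum() / counts.sum()

print(np.round(precision, 3))
print(np.round(recall, 3))
print(accuracy)  # 0.9631
```

For class 0, for example, the row sums to 1000 and the column to 980, which gives a precision of 973/1000 = 0.973 and a recall of 973/980 ≈ 0.993, matching the reported values.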