This article covers the APIs used in the program and the complete implementation; for the underlying algorithm, see the earlier KNN algorithm introduction.
一谜疤、基礎(chǔ)準備
1、python基礎(chǔ)
**sorted:**
Python's built-in sorting function. It returns a new sorted list; elements are compared with the less-than operator (`__lt__`).
print(sorted([5, 2, 3, 1, 4]))
[1, 2, 3, 4, 5]
print(sorted("This is a test string from Andrew".split(), key=str.lower))
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
Sorting basics:
>>> print(sorted([5, 2, 3, 1, 4]))
[1, 2, 3, 4, 5]
>>> L = [5, 2, 3, 1, 4]
>>> L.sort()
>>> print(L)
[1, 2, 3, 4, 5]
Sorting with a comparison function (Python 3 removed the cmp parameter; wrap a comparator with functools.cmp_to_key):
>>> from functools import cmp_to_key
>>> L = [('b', 2), ('a', 1), ('c', 3), ('d', 4)]
>>> print(sorted(L, key=cmp_to_key(lambda x, y: x[1] - y[1])))
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
Sorting with a key:
>>> L = [('b', 2), ('a', 1), ('c', 3), ('d', 4)]
>>> print(sorted(L, key=lambda x: x[1]))
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
Sorting in reverse:
>>> print(sorted([5, 2, 3, 1, 4], reverse=True))
[5, 4, 3, 2, 1]
>>> print(sorted([5, 2, 3, 1, 4], reverse=False))
[1, 2, 3, 4, 5]
>>> L = [('d', 2), ('a', 4), ('b', 3), ('c', 2)]
>>> print(sorted(L, key=lambda x: (x[1], x[0])))
[('c', 2), ('d', 2), ('b', 3), ('a', 4)]
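The lambda keys above can also be written with `operator.itemgetter`, which is the idiomatic (and usually faster) choice for sorting tuples by a field:

```python
from operator import itemgetter

L = [('b', 2), ('a', 1), ('c', 3), ('d', 4)]
# itemgetter(1) is equivalent to lambda x: x[1]
print(sorted(L, key=itemgetter(1)))
# [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(sorted(L, key=itemgetter(1), reverse=True))
# [('d', 4), ('c', 3), ('b', 2), ('a', 1)]
```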
**Counter.most_common:**
Returns a list of the n most common elements and their counts. If n is omitted, all elements are returned. When several elements have the same count, their relative order is unspecified.
from collections import Counter

data = [1, 2, 2, 3, 3, 2, 1, 1, 2]
print(Counter(data).most_common(1))
# >> [(2, 4)]
print(Counter(data).most_common(2))
# >> [(2, 4), (1, 3)]
print(Counter(data).most_common(1)[0][0])
# >> 2
print(Counter(data).most_common(1)[0][1])
# >> 4
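Calling `most_common()` with no argument, as described above, returns every element ordered from most to least frequent:

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 2, 1, 1, 2]
# No argument: all elements, most frequent first
print(Counter(data).most_common())
# [(2, 4), (1, 3), (3, 2)]
```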
2硝桩、numpy基礎(chǔ)
** np.shape**
shape函數(shù)是numpy.core.fromnumeric中的函數(shù)沿猜,它的功能是讀取矩陣的長度
import numpy as np

data = [[1, 2, 2], [3, 3, 2]]
print(np.shape(data)[0])
#>> 2
print(np.shape(data)[1])
#>> 3
**np.zeros:**
Usage: zeros(shape, dtype=float, order='C')
Returns: a new array of the given shape and type, filled with zeros.
Parameters: shape: the shape of the array
dtype: data type, optional, defaults to numpy.float64
order: optional; 'C' means row-major (C-style) layout, 'F' means column-major (Fortran-style)
print(np.zeros(5))
# >> [ 0. 0. 0. 0. 0.]
print(np.zeros((5), dtype=bool))
# >> [False False False False False]
print(np.zeros([3, 2]))
# >> [[ 0.  0.]
#     [ 0.  0.]
#     [ 0.  0.]]
**np.tile:**
tile lives in numpy.lib.shape_base; it builds a new array by repeating an input array. For example, tile(A, n) repeats A n times along the last axis.
data = [[1, 2], [3, 4]]
print(np.tile(data, [2]))
# >> [[1 2 1 2]
#     [3 4 3 4]]
print(np.tile(data, [2, 2]))
# >> [[1 2 1 2]
#     [3 4 3 4]
#     [1 2 1 2]
#     [3 4 3 4]]
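The tile pattern above is exactly how the KNN program below computes distances: the query point is tiled to the shape of the training matrix before subtracting. NumPy broadcasting gives the same result without the explicit copy; a small sketch:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])
predict = np.array([0, 1])

# Explicit tiling: repeat predict once per row of data, then subtract
tiled = np.tile(predict, (data.shape[0], 1)) - data
# Broadcasting: NumPy aligns the 1-D predict against each row automatically
broadcast = predict - data
print(np.array_equal(tiled, broadcast))
# True
```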
**np.linalg.norm:**
Computes a vector or matrix norm. For a vector x, ord=2 (the default) gives the Euclidean length, ord=1 the sum of absolute values, and ord=np.inf the maximum absolute value.
>>> x = np.array([3, 4])
>>> np.linalg.norm(x)
5.0
>>> np.linalg.norm(x, ord=2)
5.0
>>> np.linalg.norm(x, ord=1)
7.0
>>> np.linalg.norm(x, ord=np.inf)
4.0
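In the KNN setting, `np.linalg.norm` applied to a difference of two feature vectors gives their Euclidean distance directly:

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
# Euclidean distance between two feature vectors (a 3-4-5 triangle)
print(np.linalg.norm(a - b))
# 5.0
# The same distance written out explicitly
print(np.sqrt(((a - b) ** 2).sum()))
# 5.0
```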
二芬沉、完整程序
# -*- coding: utf-8 -*-
import math
import numpy as np
from collections import Counter
import warnings
## 經(jīng)典測試
# 流程
# 1躺同、對數(shù)據(jù)訓(xùn)練數(shù)據(jù)進行歸一化,(防止某些參數(shù)值過大影響距離計算)
# 2花嘶、按個取出測試數(shù)據(jù)(歸一化)笋籽,計算該個數(shù)據(jù)到所有訓(xùn)練數(shù)據(jù)的歐幾里得距離
# 3、對該個數(shù)據(jù)到各點的數(shù)據(jù)距離進行排序
# 4椭员、過濾出排名前幾名的點车海,列出各點的歸類
# 5、計算最大歸類就該分類了。
# k-Nearest Neighbor算法
def k_nearest_neighbors(data, predict, classLabel, k=5):
if len(data) >= k:
warnings.warn("k is too small")
distances = []
dataSize = data.shape[0]
# KNN的算法核心就是歐式距離的計算
# 第一種方案:
diffMat = np.tile(predict, (dataSize, 1)) - data
sqDiffMat = diffMat ** 2
euclidean_distance = sqDiffMat.sum(axis=1) ** 0.5
for i in range(dataSize):
# 將距離加上結(jié)果
distances.append([euclidean_distance[i], classLabel[i]])
# 第二種方案:
# for i in range(dataSize):
# features = data[i]
# # 計算predict點到各點的距離
# euclidean_distance = np.linalg.norm(np.array(features)-np.array(predict))
# # 將距離加上結(jié)果
# distances.append([euclidean_distance, classLabel[i]])
# 根據(jù)距離排序,并返回分類結(jié)果
sorted_distances =[i[1] for i in sorted(distances)]
# 取前K個值的數(shù)據(jù)
top_nearest = sorted_distances[:k]
# 返回最多分類
group_res = Counter(top_nearest).most_common(1)[0][0]
return group_res
def file2Mat(testFileName, parammterNumber):
fr = open(testFileName)
lines = fr.readlines()
lineNums = len(lines)
resultMat = np.zeros((lineNums, parammterNumber))
classLabelVector = []
for i in range(lineNums):
line = lines[i].strip()
itemMat = line.split('\t')
resultMat[i, :] = itemMat[0:parammterNumber]
classLabelVector.append(itemMat[-1])
fr.close()
return resultMat, classLabelVector;
# 為了防止某個屬性對結(jié)果產(chǎn)生很大的影響侍芝,所以有了這個優(yōu)化研铆,比如:10000,4.5,6.8 10000就對結(jié)果基本起了決定作用
def autoNorm(dataSet):
minVals = dataSet.min(0)
maxVals = dataSet.max(0)
ranges = maxVals - minVals
normMat = np.zeros(np.shape(dataSet))
size = normMat.shape[0]
normMat = dataSet - np.tile(minVals, (size, 1))
normMat = normMat / np.tile(ranges, (size, 1))
return normMat, minVals, ranges
if __name__=='__main__':
trainigSetFileName= 'data\\datingTrainingSet.txt'
testFileName = 'data\\datingTestSet.txt'
# 讀取訓(xùn)練數(shù)據(jù)
trianingMat, classLabel = file2Mat(trainigSetFileName, 3)
# 都數(shù)據(jù)進行歸一化的處理
trianingMat, minVals, ranges = autoNorm(trianingMat)
# 讀取測試數(shù)據(jù)
testMat, testLabel = file2Mat(testFileName, 3)
correct = 0
total = 0
testSize = testMat.shape[0]
print(testSize)
print(testMat.shape[1])
testSize = testMat.shape[0]
for i in range(testSize):
predict = testMat[i]
# 你可以調(diào)整這個k看看準確率的變化,你也可以使用matplotlib畫出k對應(yīng)的準確率州叠,找到最好的k值
res = k_nearest_neighbors(trianingMat, (predict - minVals) / ranges, classLabel, k = 5)
if testLabel[i] == res:
correct += 1
total += 1
print(correct) # 準確率
print(correct/(testSize*1.0)) # 準確率
print(k_nearest_neighbors(trianingMat, [4,2,1], classLabel, k = 5)) # 預(yù)測一條記錄
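The loop above suggests sweeping k to find the best accuracy. Here is a minimal, self-contained sketch of that idea; the two-class synthetic data and the small `knn` helper are hypothetical stand-ins so the snippet runs without the dating data files (with the real files you would reuse `k_nearest_neighbors`, `file2Mat`, and `autoNorm` from the program above):

```python
import numpy as np
from collections import Counter

# Hypothetical synthetic data: two Gaussian clusters standing in for the dating set
rng = np.random.RandomState(0)
train = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 3])
trainLabel = ['A'] * 20 + ['B'] * 20
test = np.vstack([rng.randn(10, 2), rng.randn(10, 2) + 3])
testLabel = ['A'] * 10 + ['B'] * 10

def knn(train, labels, x, k):
    # Euclidean distance from x to every training point, then majority vote
    d = np.sqrt(((train - x) ** 2).sum(axis=1))
    top = [labels[i] for i in np.argsort(d)[:k]]
    return Counter(top).most_common(1)[0][0]

# Sweep k and report accuracy for each value
for k in (1, 3, 5, 7):
    acc = sum(knn(train, trainLabel, x, k) == y
              for x, y in zip(test, testLabel)) / len(test)
    print(k, acc)
```

The same accuracies could then be fed to matplotlib to visualize how k affects the classifier.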