I ran into this problem in Ctrip's written test and didn't think much about it at the time. Isn't naive Bayes a bit dated for a test question by now... Ctrip, really...
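Recap of the approach
- Compute the prior probability of each class
- Compute the conditional probability of each word given each class
- Estimate the probability of each class for a new post and pick the largest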
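The three steps above are the standard (multinomial) naive Bayes decision rule. Written out, assuming add-one (Laplace) smoothing and using notation introduced here rather than taken from the code: $D_c$ is the number of posts in class $c$, $D$ the total number of posts, $\operatorname{count}(w, c)$ the number of times word $w$ occurs in class-$c$ posts, $N_c$ the total number of words in class-$c$ posts, and $|V|$ the vocabulary size.

$$
P(c) = \frac{D_c}{D}, \qquad P(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{N_c + |V|}
$$

$$
\hat{c} = \arg\max_{c \in \{0, 1\}} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
$$

The evidence term $P(w_1, \dots, w_n)$ is the same for both classes, so it can be dropped from the argmax.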
Original dataset
Code
Load the dataset
import numpy as np

def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]  # 1 is abusive, 0 not
    return postingList, classVec
There are two classes here: 1 = abusive post, 0 = non-abusive post.
Build the vocabulary
def getVocabList(dataSet):
    vocab = {}          # word -> index
    vocab_reverse = {}  # index -> word
    index = 0
    for line in dataSet:
        for word in line:
            # assign indices in first-seen order
            if word not in vocab:
                vocab[word] = index
                vocab_reverse[index] = word
                index += 1
    return vocab, vocab_reverse
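The word-to-index mapping is what lets each post be treated as a vector of word counts (bag of words). A minimal sketch, assuming that representation; `build_vocab` and `bag_of_words` are hypothetical helpers written for illustration, not part of the original code:

```python
import numpy as np

def build_vocab(posts):
    """Hypothetical helper: map each distinct word to an index in first-seen order."""
    vocab = {}
    for post in posts:
        for word in post:
            # len(vocab) is evaluated before insertion, so a new word gets the next free index
            vocab.setdefault(word, len(vocab))
    return vocab

def bag_of_words(vocab, post):
    """Hypothetical helper: count vector of length len(vocab); unknown words are skipped."""
    vec = np.zeros(len(vocab), dtype=int)
    for word in post:
        if word in vocab:
            vec[vocab[word]] += 1
    return vec

posts = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
         ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
vocab = build_vocab(posts)
vec = bag_of_words(vocab, ['stupid', 'dog', 'stupid', 'cat'])  # 'cat' is not in vocab
print(vec.tolist())
```

Skipping unknown words is one design choice; a classifier that looks words up with `vocab[word]` directly will raise `KeyError` on them instead.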
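Prior and conditional probabilities
def native_bayes(vocab, postingList, classVec):
    label = [0, 1]
    label_num = len(label)
    vocab_len = len(vocab)
    # Laplace (add-one) smoothing: every word count starts at 1 ...
    conditional_probability = np.ones((label_num, vocab_len))
    # ... and each class's denominator starts at the vocabulary size,
    # so each row of conditional_probability sums to 1
    word_total = np.full(label_num, float(vocab_len))
    # number of posts per class, used for the prior probabilities
    doc_count = np.zeros(label_num)
    postingList_ids = [[vocab[word] for word in line] for line in postingList]
    for i in range(len(postingList_ids)):
        c = classVec[i]
        doc_count[c] += 1
        for word in postingList_ids[i]:
            conditional_probability[c][word] += 1
            word_total[c] += 1
    # Conditional probabilities P(word | class)
    conditional_probability /= word_total[:, np.newaxis]
    # Prior probabilities P(class): fraction of posts in each class
    p_n = doc_count / doc_count.sum()
    return p_n, conditional_probability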
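As a hand check for the test post ['stupid', 'garbage'], here are the estimates from the six training posts, assuming add-one smoothing with the vocabulary size in the denominator. Counted from loadDataSet: 3 posts per class, $N_0 = 24$ and $N_1 = 19$ words, $|V| = 32$ distinct words; 'stupid' occurs 3 times and 'garbage' once in class 1, neither occurs in class 0.

$$
P(0) = P(1) = \tfrac{3}{6} = \tfrac{1}{2}
$$

$$
P(\text{stupid} \mid 1) = \frac{3 + 1}{19 + 32} = \frac{4}{51}, \qquad
P(\text{garbage} \mid 1) = \frac{1 + 1}{19 + 32} = \frac{2}{51}, \qquad
P(\text{stupid} \mid 0) = P(\text{garbage} \mid 0) = \frac{0 + 1}{24 + 32} = \frac{1}{56}
$$

$$
P(1)\,P(\text{stupid} \mid 1)\,P(\text{garbage} \mid 1) = \frac{4}{2601} \approx 1.54 \times 10^{-3}
\;>\;
P(0)\,P(\text{stupid} \mid 0)\,P(\text{garbage} \mid 0) = \frac{1}{6272} \approx 1.59 \times 10^{-4}
$$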
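Classify with argmax
def judge(testEntry):
    postingList, classVec = loadDataSet()
    vocab, vocab_reverse = getVocabList(postingList)
    p_n, conditional_probability = native_bayes(vocab, postingList, classVec)
    # start from the priors; copy so the returned array is not modified in place
    Ans_p = p_n.copy()
    # skip words never seen in training (vocab[word] would raise KeyError)
    testEntry_ids = [vocab[word] for word in testEntry if word in vocab]
    for num in testEntry_ids:
        # multiply in P(word | class) for each class (naive independence assumption)
        Ans_p[0] *= conditional_probability[0][num]
        Ans_p[1] *= conditional_probability[1][num]
    # the class with the larger unnormalized posterior is the prediction
    return np.argmax(Ans_p)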
Call
judge(testEntry=['stupid', 'garbage'])
Output: 1 (abusive), as expected.
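For longer posts, multiplying many small probabilities (as judge does) can underflow to 0.0. A common remedy is to compare sums of log-probabilities instead, which yields the same argmax because log is monotonic. A self-contained sketch, assuming add-one smoothing with the vocabulary size in the denominator and priors from post counts; `train` and `classify` are hypothetical names, and this is a rewrite with `collections.Counter` rather than the original functions:

```python
import math
from collections import Counter

# Same toy data as loadDataSet(); 1 = abusive, 0 = not abusive
POSTS = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
         ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
         ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
         ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
         ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
         ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
LABELS = [0, 1, 0, 1, 0, 1]

def train(posts, labels):
    """Return log P(c) and add-one-smoothed log P(w | c) for every class c."""
    vocab = {w for post in posts for w in post}
    classes = sorted(set(labels))
    doc_counts = Counter(labels)                     # posts per class
    word_counts = {c: Counter() for c in classes}    # word occurrences per class
    for post, c in zip(posts, labels):
        word_counts[c].update(post)
    log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_lik

def classify(post, log_prior, log_lik):
    """argmax over c of log P(c) + sum of log P(w | c); out-of-vocabulary words are ignored."""
    scores = {c: lp + sum(log_lik[c][w] for w in post if w in log_lik[c])
              for c, lp in log_prior.items()}
    return max(scores, key=scores.get)

log_prior, log_lik = train(POSTS, LABELS)
print(classify(['stupid', 'garbage'], log_prior, log_lik))     # 1
print(classify(['love', 'my', 'dalmation'], log_prior, log_lik))  # 0
```

The second post is classified as 0 because 'love', 'my' and 'dalmation' occur only in non-abusive posts.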