Naive Bayes: Derivation of the Principle and Common Models

1. Derivation of the Naive Bayes Principle
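
The usual textbook derivation can be sketched as follows. This is a minimal sketch in the P(x_{i} | y_{k}) notation used in section 2, assuming n features x_{1}, ..., x_{n} and class labels y_{k}; it follows the standard form and is not necessarily step-for-step the derivation of the post credited at the end of this article.

```latex
\begin{aligned}
P(y_{k} \mid x_{1}, \dots, x_{n})
  &= \frac{P(y_{k})\, P(x_{1}, \dots, x_{n} \mid y_{k})}{P(x_{1}, \dots, x_{n})}
  && \text{(Bayes' theorem)} \\
  &= \frac{P(y_{k}) \prod_{i=1}^{n} P(x_{i} \mid y_{k})}{P(x_{1}, \dots, x_{n})}
  && \text{(features assumed conditionally independent given the class)} \\
\hat{y} &= \arg\max_{y_{k}} \; P(y_{k}) \prod_{i=1}^{n} P(x_{i} \mid y_{k})
  && \text{(the denominator does not depend on } y_{k}\text{)}
\end{aligned}
```

The "naive" step is the second line: the joint class-conditional likelihood is replaced by a product of one-dimensional terms, one per feature.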



2. Common Models

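The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(x_{i} | y_{k}). Although these assumptions are clearly over-simplified, naive Bayes classifiers work quite well in many real-world situations, most famously document classification and spam filtering. They require only a small amount of training data to estimate the necessary parameters. (For the theoretical reasons why naive Bayes works well, and the types of data it works on, see the references below.)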

Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class-conditional feature distributions means that each distribution can be estimated independently as a one-dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality.

On the flip side, although naive Bayes is known as a decent classifier, it is known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.

(1) Gaussian Naive Bayes

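GaussianNB implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:

$$P(x_{i} \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y}^{2}}} \exp\left(-\frac{(x_{i} - \mu_{y})^{2}}{2\sigma_{y}^{2}}\right)$$

The parameters σ_{y} and μ_{y} are estimated using maximum likelihood.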
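
A minimal pure-Python sketch of this model follows. It is not how GaussianNB is implemented; the function names (`fit_gaussian_nb`, `log_gaussian`, `predict`) and the toy data are hypothetical, and it assumes a separate mean and variance for every (class, feature) pair with no variance smoothing.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and the per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    params = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The maximum-likelihood variance divides by n, not n - 1.
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(columns, means)]
        params[label] = (n / len(X), means, variances)
    return params


def log_gaussian(x, mean, var):
    """Log of the Gaussian density, computed directly to avoid underflow."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(params, x):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    def log_posterior(label):
        prior, means, variances = params[label]
        return math.log(prior) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances))
    return max(params, key=log_posterior)


# Hypothetical toy data: two well-separated classes, two features.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.2], [3.2, 3.8], [2.8, 4.0]]
y = [0, 0, 0, 1, 1, 1]

params = fit_gaussian_nb(X, y)
print(predict(params, [1.1, 2.0]))
print(predict(params, [2.9, 4.1]))
```

Because each feature is fitted on its own, training reduces to a handful of one-dimensional mean and variance computations per class.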

(2) Multinomial Naive Bayes

MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors θ_{y}=(θ_{y1},…,θ_{yn}) for each class y, where n is the number of features (in text classification, the size of the vocabulary) and θ_{yi} is the probability P(x_{i} | y) of feature i appearing in a sample belonging to class y.

The parameters θ_{y} are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_{y} + \alpha n}$$

where N_{yi}=∑_{x∈T}x_{i} is the number of times feature i appears in samples of class y in the training set T, and N_{y}=∑_{i=1}^{n} N_{yi} is the total count of all features for class y.

The smoothing prior α≥0 accounts for features not present in the learning samples and prevents zero probabilities in further computations. Setting α=1 is called Laplace smoothing, while α<1 is called Lidstone smoothing.
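
The effect of α can be checked by hand on a small example. The sketch below applies the smoothed relative-frequency estimate (N_{yi} + α) / (N_{y} + αn) to one class; the helper name `smoothed_theta` and the counts are hypothetical.

```python
def smoothed_theta(counts, alpha=1.0):
    """Return (N_yi + alpha) / (N_y + alpha * n) for every feature i of one class."""
    n = len(counts)        # number of features (vocabulary size)
    total = sum(counts)    # N_y: total count of all features in the class
    return [(c + alpha) / (total + alpha * n) for c in counts]


# Hypothetical counts N_yi of three vocabulary words within one class.
# The second word never occurs in this class.
counts = [3, 0, 1]

laplace = smoothed_theta(counts, alpha=1.0)     # 4/7, 1/7, 2/7
lidstone = smoothed_theta(counts, alpha=0.5)    # 3.5/5.5, 0.5/5.5, 1.5/5.5
unsmoothed = smoothed_theta(counts, alpha=0.0)  # 0.75, 0.0, 0.25

print(laplace)
print(lidstone)
print(unsmoothed)
```

With α=0 the unseen word gets probability zero, which would zero out the likelihood of any document containing it; any α>0 moves it away from zero while the estimates still sum to one.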

(3) Bernoulli Naive Bayes

BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. Therefore, this class requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).

The decision rule for Bernoulli naive Bayes is based on

$$P(x_{i} \mid y) = P(x_{i} = 1 \mid y)\, x_{i} + (1 - P(x_{i} = 1 \mid y))(1 - x_{i})$$

which differs from multinomial NB's rule in that it explicitly penalizes the non-occurrence of a feature i that is an indicator for class y, where the multinomial variant would simply ignore a non-occurring feature.
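
The difference between the two rules can be made concrete with a small sketch. The function names and probabilities below are hypothetical, and the multinomial version omits the multinomial coefficient, which does not depend on the class.

```python
import math


def bernoulli_log_likelihood(x, p):
    """Sum over all features of log[p_i * x_i + (1 - p_i) * (1 - x_i)].

    Every feature contributes a term, whether it occurs (x_i = 1) or not.
    """
    return sum(math.log(pi * xi + (1 - pi) * (1 - xi)) for xi, pi in zip(x, p))


def multinomial_log_likelihood(x, theta):
    """Sum of x_i * log(theta_i); features with x_i = 0 contribute nothing."""
    return sum(xi * math.log(ti) for xi, ti in zip(x, theta) if xi > 0)


# Hypothetical class y in which feature 0 is a strong indicator.
p_y = [0.9, 0.2]       # P(x_i = 1 | y) for the Bernoulli model
theta_y = [0.8, 0.2]   # theta_yi for the multinomial model

present = [1, 0]       # the indicator feature occurs
absent = [0, 0]        # the indicator feature does not occur

b_present = bernoulli_log_likelihood(present, p_y)   # log 0.9 + log 0.8
b_absent = bernoulli_log_likelihood(absent, p_y)     # log 0.1 + log 0.8
m_absent = multinomial_log_likelihood(absent, theta_y)

print(b_present, b_absent, m_absent)
```

Under the Bernoulli rule the missing indicator costs a factor of 1 - 0.9 = 0.1, while under the multinomial rule a feature with zero count adds no term at all.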

In the case of text classification, word occurrence vectors (rather than word count vectors) may be used to train and use this classifier. BernoulliNB might perform better on some datasets, especially those with shorter documents. It is advisable to evaluate both models, if time permits.
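
Trying both models on the same count matrix takes only a few lines with scikit-learn. The sketch below uses a hypothetical six-document toy corpus and only fits and predicts; a real comparison would score both models on held-out data or with cross-validation.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus: rows are documents, columns are word counts.
X = np.array([
    [3, 0, 1, 0],
    [2, 0, 0, 1],
    [4, 1, 0, 0],
    [0, 2, 0, 3],
    [0, 3, 1, 2],
    [1, 2, 0, 4],
])
y = np.array([0, 0, 0, 1, 1, 1])

X_new = np.array([[2, 0, 1, 0], [0, 1, 0, 2]])

models = {
    # Uses the raw counts.
    "multinomial": MultinomialNB(alpha=1.0),
    # binarize=0.0 turns every count > 0 into 1, i.e. word occurrence vectors.
    "bernoulli": BernoulliNB(alpha=1.0, binarize=0.0),
}

preds = {}
for name, model in models.items():
    model.fit(X, y)
    preds[name] = model.predict(X_new)
    print(name, preds[name])
```

On data this cleanly separated both variants agree; differences tend to show up on real corpora, which is why evaluating both is worthwhile.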

Note: the derivation of the principle is taken from: https://blog.csdn.net/u012162613/article/details/48323777

The common models section is taken from: https://scikit-learn.org/stable/modules/naive_bayes.html
