1. Derivation of the Naive Bayes Principle
2. Common Models
The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of the features. Despite these obviously over-simplified assumptions, naive Bayes classifiers work well in many real-world situations, most famously document classification and spam filtering, and they require only a small amount of training data to estimate the necessary parameters. (For the theoretical reasons why naive Bayes works well, and on which types of data it does, see the references below.)
Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality.
On the flip side, although naive Bayes is known as a decent classifier, it is known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.
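To make the decoupling described above concrete, here is a minimal NumPy sketch (the toy data and variable names are assumptions for illustration only) that estimates each class-conditional feature distribution independently as a one-dimensional Gaussian, i.e. one mean and variance per feature per class:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # 100 samples, 4 features (toy data)
y = rng.integers(0, 2, size=100)     # 2 classes

params = {}
for c in np.unique(y):
    Xc = X[y == c]
    # Each column is fit on its own: a set of independent 1-D estimates,
    # which is what keeps naive Bayes cheap in high dimensions.
    params[c] = {"mean": Xc.mean(axis=0), "var": Xc.var(axis=0)}

print(params[0]["mean"].shape)  # (4,) -- one mean per feature for class 0
```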
(1) Gaussian Naive Bayes
GaussianNB implements the Gaussian Naive Bayes algorithm for classification. The likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

The parameters $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood.
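A short usage sketch of GaussianNB; the iris dataset, split sizes, and random_state below are arbitrary choices for illustration, not prescribed by the text above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train, y_train)                      # estimates mu_y and sigma_y per class and feature
print("accuracy:", gnb.score(X_test, y_test))  # mean accuracy on held-out data

# Per the caveat above, treat predict_proba outputs with caution.
print(gnb.predict_proba(X_test[:2]))
```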
(2) Multinomial Naive Bayes
MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors $\theta_y = (\theta_{y1}, \ldots, \theta_{yn})$ for each class $y$, where $n$ is the number of features (in text classification, the size of the vocabulary) and $\theta_{yi}$ is the probability $P(x_i \mid y)$ of feature $i$ appearing in a sample belonging to class $y$.
The parameters $\theta_y$ are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where $N_{yi} = \sum_{x \in T} x_i$ is the number of times feature $i$ appears in a sample of class $y$ in the training set $T$, and $N_y = \sum_{i=1}^{n} N_{yi}$ is the total count of all features for class $y$.
The smoothing prior $\alpha \ge 0$ accounts for features not present in the learning samples and prevents zero probabilities in further computations. Setting $\alpha = 1$ is called Laplace smoothing, while $\alpha < 1$ is called Lidstone smoothing.
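A hedged sketch of how MultinomialNB and its alpha smoothing prior are typically used on word-count vectors; the tiny corpus and spam/ham labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free money now", "meeting at noon", "free offer click now", "lunch meeting tomorrow"]
labels = [1, 0, 1, 0]                # 1 = spam, 0 = ham (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)          # word-count vectors (vocabulary size n)

clf = MultinomialNB(alpha=1.0)       # alpha=1.0 -> Laplace smoothing; alpha<1 -> Lidstone
clf.fit(X, labels)

print(clf.predict(vec.transform(["free lunch offer"])))
# feature_log_prob_ holds log(theta_yi), the smoothed per-class feature probabilities
print(clf.feature_log_prob_.shape)   # (n_classes, n_features)
```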
(3) Bernoulli Naive Bayes
BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. Therefore, this class requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, a BernoulliNB instance may binarize its input (depending on the binarize parameter).
The decision rule for Bernoulli naive Bayes is based on

$$P(x_i \mid y) = P(i \mid y)\,x_i + (1 - P(i \mid y))(1 - x_i)$$

which differs from multinomial NB’s rule in that it explicitly penalizes the non-occurrence of a feature $i$ that is an indicator for class $y$, where the multinomial variant would simply ignore a non-occurring feature.
In the case of text classification, word occurrence vectors (rather than word count vectors) may be used to train and use this classifier. BernoulliNB might perform better on some datasets, especially those with shorter documents. It is advisable to evaluate both models, if time permits.
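A sketch contrasting BernoulliNB with MultinomialNB on binary word-occurrence features; the corpus, labels, and binarize setting are illustrative assumptions, not part of the text above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

docs = ["free money now", "meeting at noon", "free offer click now", "lunch meeting tomorrow"]
labels = [1, 0, 1, 0]                # toy spam/ham labels

# binary=True yields word-occurrence vectors rather than word counts
X = CountVectorizer(binary=True).fit_transform(docs)

bnb = BernoulliNB(binarize=None)     # input is already 0/1, so no further binarization
mnb = MultinomialNB()
bnb.fit(X, labels)
mnb.fit(X, labels)

# BernoulliNB also penalizes the *absence* of class-indicative words,
# which can matter for short documents; evaluate both when possible.
print("BernoulliNB:", bnb.predict(X))
print("MultinomialNB:", mnb.predict(X))
```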
Note: the derivation of the principle is from: https://blog.csdn.net/u012162613/article/details/48323777
The common models section is from: https://scikit-learn.org/stable/modules/naive_bayes.html