The Naïve Bayes classifier
The name Naïve Bayes comes from the model's basic assumption that, given the class label, the probability of a particular feature is independent of every other feature. This implies the following factorization of the likelihood:

$$P(x_1, \ldots, x_n \mid C_k) = \prod_{i=1}^{n} P(x_i \mid C_k)$$

Using this assumption and the Bayes rule, one can show that the probability of class $C_k$, given the features $x_1, \ldots, x_n$, is given by:

$$P(C_k \mid x_1, \ldots, x_n) = \frac{P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)}{Z}$$

Here, $Z = \sum_{k} P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)$ is the normalization term obtained by summing the numerator over all values of $k$. It is also called the Bayesian evidence or the partition function $Z$. The classifier selects as the target class the label $\hat{y}$ that maximizes the posterior class probability $P(C_k \mid x_1, \ldots, x_n)$:

$$\hat{y} = \underset{k}{\operatorname{arg\,max}}\; P(C_k)\,\prod_{i=1}^{n} P(x_i \mid C_k)$$

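In practice, a product of many small probabilities underflows floating-point arithmetic, so implementations maximize the sum of logarithms instead. The following is a minimal sketch of this decision rule for word-count features with Laplace smoothing; the pure-Python design and all function names are illustrative assumptions, not code from this book:

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial Naive Bayes sketch (illustrative only).
# Each document is a list of tokens; labels are arbitrary hashable values.

def train(docs, labels, alpha=1.0):
    """Estimate log P(C_k) and smoothed log P(x_i | C_k) from token counts."""
    class_counts = Counter(labels)        # document counts per class -> priors
    token_counts = defaultdict(Counter)   # token counts per class -> likelihoods
    vocab = set()
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
        vocab.update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((token_counts[c][w] + alpha) / total)
                             for w in vocab}
    return log_prior, log_likelihood, vocab

def predict(doc, log_prior, log_likelihood, vocab):
    """Pick the class maximizing log P(C_k) + sum_i log P(x_i | C_k)."""
    def score(c):
        return log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in vocab)
    return max(log_prior, key=score)

# Toy usage: two spam-like and two ham-like documents.
docs = [["cheap", "offer"], ["meeting", "agenda"],
        ["offer", "now"], ["project", "agenda"]]
labels = ["spam", "ham", "spam", "ham"]
print(predict(["free", "offer"], *train(docs, labels)))  # -> 'spam'
```

Working entirely in log space keeps the computation numerically stable even for long documents, and the Laplace smoothing term `alpha` prevents unseen words from driving a class probability to zero.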
The Naïve Bayes classifier is a common baseline for document classification. One reason is that the underlying assumption that each feature (words or n-grams) is independent of the others, given the class label, typically holds reasonably well for text. Another reason is that the Naïve Bayes classifier scales well to large collections of documents.
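For a quick baseline experiment, most machine learning libraries ship ready-made implementations. As a purely illustrative sketch (the choice of scikit-learn here is an assumption, not necessarily the library used in this book), a bag-of-words document classifier can be assembled in a few lines:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus: two classes of short documents.
train_docs = ["cheap offer buy now", "meeting agenda for monday",
              "limited offer click now", "project notes and agenda"]
train_labels = ["spam", "ham", "spam", "ham"]

# Bag-of-words features feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)
print(model.predict(["free offer now"]))  # expected: ['spam']
```

Here `CountVectorizer` produces the word-count features and `MultinomialNB` estimates the per-class word probabilities from them.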
There are two implementations of Naïve Bayes. In...