The Naïve Bayes classifiers reside in the sklearn.naive_bayes module. There are different kinds of Naïve Bayes classifiers:
- GaussianNB: This classifier assumes the features to be normally distributed (Gaussian). One use case for it could be the classification of sex given the height and weight of a person. In our case, we are given tweet texts from which we extract word counts. Word counts are non-negative integers, so they are clearly not Gaussian-distributed.
- MultinomialNB: This classifier assumes the features to be occurrence counts, which is our case going forward, since we will be using word counts in the tweets as features. In practice, this classifier also works well with TF-IDF vectors.
- BernoulliNB: This classifier is similar to MultinomialNB, but is better suited to binary word occurrences (whether a word appears at all) than to word counts.
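To make the difference between the three variants concrete, here is a minimal sketch that fits each one on the kind of features it expects. The tweets, labels, and body measurements are hypothetical toy data invented for illustration, and the variable names are our own; only the scikit-learn classes and their `fit`/`predict` methods come from the library.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy tweets, invented for illustration only.
tweets = [
    "great great phone love it",
    "love this great camera",
    "terrible battery hate it",
    "hate this terrible terrible screen",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# MultinomialNB: features are word counts (how often each word occurs).
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(tweets)
multinomial = MultinomialNB().fit(X_counts, labels)

# BernoulliNB: features are binary occurrences (word present or not).
binary_vec = CountVectorizer(binary=True)
X_binary = binary_vec.fit_transform(tweets)
bernoulli = BernoulliNB().fit(X_binary, labels)

# GaussianNB: features are continuous values, here made-up
# height (cm) and weight (kg) measurements.
body = [[180.0, 80.0], [175.0, 78.0], [160.0, 55.0], [165.0, 58.0]]
sex = ["male", "male", "female", "female"]
gaussian = GaussianNB().fit(body, sex)

# Classify unseen examples with each model.
new_tweet = ["love the great screen"]
pred_multinomial = multinomial.predict(count_vec.transform(new_tweet))
pred_bernoulli = bernoulli.predict(binary_vec.transform(new_tweet))
pred_gaussian = gaussian.predict([[178.0, 79.0]])

print(pred_multinomial, pred_bernoulli, pred_gaussian)
```

Note that the two text classifiers share the same tokenization and differ only in whether the vectorizer keeps the counts or reduces them to 0/1 via `binary=True`.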
As we will mainly look at the word occurrences...