Text classification
Text classification assigns a topic, subject category, genre, or similar label to a piece of text. For example, a spam filter labels each email as spam or not spam.
Apache Spark supports various classifiers through its MLlib and ML packages. The SVM and Naive Bayes classifiers are two popular choices, and the former was already covered in the previous chapter. Let's take a look at the latter now.
Naive Bayes classifier
The Naive Bayes (NB) classifier is a multiclass probabilistic classifier and one of the most commonly used classification algorithms. It makes a strong (naive) assumption: every pair of features is independent given the label. It computes the conditional probability distribution of each feature given a label, and then applies Bayes' theorem to compute the conditional probability of a label given an observation. In terms of document classification, an observation is a document to be classified into some class. Despite its strong assumptions on data, it is quite popular. It works...
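To make the two steps described above concrete (estimate the per-label feature probabilities, then apply Bayes' theorem), here is a minimal sketch in plain Python. It is not Spark's implementation: the function names train_naive_bayes and classify, the toy training data, and the choice of a multinomial model with additive smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, smoothing=1.0):
    """Estimate log P(label) and log P(word | label) from (label, tokens) pairs.

    Hypothetical helper for illustration; uses additive (Laplace) smoothing.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(label_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in label_counts.items()}

    log_likelihood = {}
    for c in label_counts:
        # Smoothed denominator: all word occurrences in class c plus one
        # pseudo-count per vocabulary word.
        denom = sum(word_counts[c].values()) + smoothing * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + smoothing) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return the label with the highest posterior score for the document.

    By Bayes' theorem and the independence assumption, the score is
    log P(label) + sum of log P(word | label). Words never seen in
    training are skipped.
    """
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy data, assumed purely for illustration.
TRAINING = [
    ("spam", "win cash prize now".split()),
    ("spam", "cheap prize win".split()),
    ("ham", "meeting agenda for monday".split()),
    ("ham", "lunch on monday".split()),
]

model = train_naive_bayes(TRAINING)
print(classify("win a prize".split(), *model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that never appeared with a label from driving that label's probability to zero.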