Congratulations on sticking with us until the end! Together we have learned how Naïve Bayes works and why it is not as naïve as its name suggests. Especially for training sets where we don't have enough data to learn all the niches in the class-probability space, Naïve Bayes does a great job of generalizing. We learned how to apply it to tweets and saw that cleaning the raw tweet text helps a lot. Finally, we realized that a bit of cheating (only after we have done our fair share of work) is okay. However, since the much costlier classifier did not reward us with much-improved performance, we went back to the cheaper one.
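As a compact recap, here is a minimal sketch of the kind of pipeline we ended up with: clean the raw tweets, vectorize them, and train a Naïve Bayes classifier. It uses scikit-learn; the cleaning rules, parameters, and toy tweets below are illustrative assumptions rather than the exact setup from this chapter.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def clean_tweet(text):
    """Strip the noisiest parts of a raw tweet before vectorizing."""
    text = re.sub(r"http\S+", " URL ", text)  # replace links with a token
    text = re.sub(r"@\w+", " USER ", text)    # replace @mentions with a token
    text = re.sub(r"#(\w+)", r"\1", text)     # keep the hashtag word, drop '#'
    return text.lower()


# Hypothetical toy data standing in for a labeled tweet corpus.
tweets = [
    "I love this phone! http://example.com",
    "@support this update is terrible #fail",
    "Best camera I have ever owned",
    "Battery died after two hours, so disappointed",
]
labels = ["pos", "neg", "pos", "neg"]

# The cheap pipeline: tf-idf features feeding a multinomial Naïve Bayes model.
clf = Pipeline([
    ("vect", TfidfVectorizer(preprocessor=clean_tweet, ngram_range=(1, 2))),
    ("nb", MultinomialNB()),
])
clf.fit(tweets, labels)
print(clf.predict(["@friend the new update is great!"]))
```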
In Chapter 10, Topic Modeling, we will learn how we can extract topics from a document using Latent Dirichlet allocation, a technique known as topic modeling. This will help us to compare documents by analyzing how similar their topics are.