Text classification is the process of assigning text documents to predefined categories. Like the classification algorithms that we studied in Chapter 2, Classifying Twitter Feeds with Naive Bayes, through Chapter 6, Analyzing Visitor Patterns to Make Recommendations, text classification algorithms generate models from labeled training observations. The resulting model can then be applied to a new observation to predict its class. Moreover, the algorithms covered in Chapter 2, Classifying Twitter Feeds with Naive Bayes, Chapter 3, Predicting House Value with Regression Algorithms, and Chapter 4, Predicting User Behavior with Tree-Based Methods, can also be used for text classification.
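To make the train-then-predict workflow concrete, here is a minimal sketch in plain Python of a Naive Bayes text classifier that uses word counts as its inputs. It is an illustration only, not the implementation used elsewhere in this book: the function names (`tokenize`, `train`, `predict`) and the four training sentences are hypothetical and were invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic assumption: lowercase the text and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Build a model from (text, label) pairs by counting documents and words."""
    class_doc_counts = Counter()        # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-probability for the given text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_doc_counts):
        # Start from the log prior of the class.
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training observations, made up for this sketch.
training_data = [
    ("great match and a late goal", "sports"),
    ("the team won the final game", "sports"),
    ("new phone with a faster chip", "tech"),
    ("software update fixes the bug", "tech"),
]

model = train(training_data)
print(predict(model, "the team scored a goal"))
print(predict(model, "phone software bug"))
```

The model here is nothing more than a set of counts learned from the labeled examples; prediction scores a new document against each class and picks the most probable one. A real application would need far more training data and a more careful tokenizer.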
Text data is unstructured data. Hence, we need to generate features from text documents so that those features...