In this chapter, we will demonstrate how to use various Natural Language Processing (NLP) APIs to perform text classification. This is not to be confused with text clustering. Clustering groups similar texts together without the use of predefined categories. Classification, in contrast, assigns text to predefined categories. We will focus on text classification, where tags are assigned to text to specify its type.
Text classification generally starts with training a model on labeled examples. The model is then validated and, once it performs well enough, used to classify documents. We will focus on the training and usage stages of this process.
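To make the train, validate, and classify stages concrete, here is a minimal sketch in Python. It uses a small naive Bayes classifier with add-one smoothing, chosen only because it fits in a few lines; it is not one of the NLP APIs this chapter covers. The function names (`train`, `classify`, `accuracy`), the categories, and the toy sentences are all hypothetical and were made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Training stage: count words per predefined category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return {
        "word_counts": word_counts,
        "doc_counts": doc_counts,
        "vocabulary": vocabulary,
    }


def classify(model, text):
    """Usage stage: return the category with the highest log-probability."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_category, best_score = None, float("-inf")
    for category, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][category]
        total_words = sum(counts.values())
        # Start from the log prior of the category.
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def accuracy(model, labeled_docs):
    """Validation stage: fraction of held-out documents classified correctly."""
    correct = sum(
        1 for text, category in labeled_docs if classify(model, text) == category
    )
    return correct / len(labeled_docs)


# Toy data, invented for this sketch only.
training_data = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the election results were announced", "politics"),
    ("the senate passed the bill", "politics"),
]
validation_data = [
    ("the team scored a late goal", "sports"),
    ("the senate announced the election date", "politics"),
]

model = train(training_data)
print("validation accuracy:", accuracy(model, validation_data))
print("predicted category:", classify(model, "the striker won the match"))
```

The validation set is kept separate from the training set so that the accuracy figure reflects how the model handles text it has not seen. A real application would use far more data and a more careful tokenizer.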
Documents can be classified according to any number of attributes, such as their subject, document type, time of publication, author, language used, and reading level. Some classification approaches...