Classifying Texts
In this chapter, we will classify texts using several different methods. Text classification is a classic NLP task: it assigns a label to a text, for example, a topic (such as sport or business) or a sentiment (such as negative or positive). As with any such task, the results need to be evaluated.
After reading this chapter, you will be able to preprocess and classify texts using keywords, unsupervised clustering, and two supervised algorithms: support vector machines (SVMs) and a convolutional neural network (CNN) model trained within the spaCy framework. We will also use GPT-3.5 to classify texts.
For theoretical background on some of the concepts discussed in this chapter, please refer to Building Machine Learning Systems with Python by Coelho et al. That book explains the basics of building a machine learning project, such as training and test sets, as well as the metrics used to evaluate such projects, including precision, recall, F1, and accuracy.
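As a quick reminder of how these four metrics relate to one another, here is a minimal sketch that computes them for a binary sentiment task. The function name binary_metrics and the toy gold and predicted labels are hypothetical and made up for illustration; they do not come from any dataset used in this chapter.

```python
def binary_metrics(gold, predicted, positive="positive"):
    """Compute precision, recall, F1, and accuracy for one target class.

    gold and predicted are equal-length lists of labels; `positive` is the
    label treated as the class of interest.
    """
    pairs = list(zip(gold, predicted))
    # True positives: the classifier said "positive" and was right.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: the classifier said "positive" but was wrong.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: a positive text that the classifier missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels: 3 positive and 5 negative texts.
gold = ["positive", "positive", "positive", "negative",
        "negative", "negative", "negative", "negative"]
predicted = ["positive", "positive", "negative", "positive",
             "positive", "negative", "negative", "negative"]

metrics = binary_metrics(gold, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example, the classifier finds two of the three positive texts (recall of about 0.667) but also labels two negative texts as positive (precision of 0.5), which is why accuracy alone (0.625) does not tell the whole story.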
Here...