Introduction to text classification
Text classification (also known as text categorization) is the task of mapping a document (a sentence, Twitter post, book chapter, email, and so on) to a category from a predefined list of classes. When there are only two classes, we call this binary classification; sentiment analysis with positive and negative labels is a typical example. With more than two classes, the task is either multi-class classification, where the classes are mutually exclusive and each document receives exactly one label, or multi-label classification, where the classes are not mutually exclusive and a document can receive more than one label. For instance, a news article may be about sport and politics at the same time. Beyond classification, we may want to score documents in the range [-1, 1] or rate them on a scale from 1 to 5. We can solve this kind of problem with a regression model, where the output is numeric rather than categorical.
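To make the difference between these task types concrete, the following minimal sketch shows how the same kind of raw model output could be turned into a prediction in each setting. It uses only the Python standard library. The function names, the example classes, and the numeric values are hypothetical and chosen purely for illustration, and squashing a regression output with tanh is just one possible way to obtain a score in [-1, 1], not something prescribed here.

```python
import math

def softmax(logits):
    # Turn raw scores into probabilities that sum to 1 (mutually exclusive classes).
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Turn one raw score into an independent probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def predict_multiclass(logits, classes):
    # Multi-class (binary is the two-class case): pick exactly one label.
    probs = softmax(logits)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]

def predict_multilabel(logits, classes, threshold=0.5):
    # Multi-label: each class is decided on its own, so zero, one,
    # or several labels can be returned.
    return [c for c, x in zip(classes, logits) if sigmoid(x) >= threshold]

def predict_score(raw_output):
    # Regression: a single numeric output, squashed here into [-1, 1]
    # with tanh (an assumption made for this sketch).
    return math.tanh(raw_output)

# Hypothetical classes and raw model outputs for one news article.
classes = ["sport", "politics", "economy"]
logits = [2.0, 1.5, -3.0]

single_label = predict_multiclass(logits, classes)
several_labels = predict_multilabel(logits, classes)
score = predict_score(0.8)

print(single_label)
print(several_labels)
print(round(score, 3))
```

With these made-up values, the multi-class decision keeps only the highest-scoring class, while the multi-label decision keeps every class whose individual probability reaches the threshold, which is how one article can end up tagged with both sport and politics.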
Luckily, the transformer architecture...