Training the spaCy text classifier
In this section, we will look in detail at spaCy's text classifier component, TextCategorizer. In Chapter 2, Core Operations with spaCy, we saw that the spaCy NLP pipeline consists of components. In Chapter 3, Linguistic Features, we learned about the essential components of the spaCy NLP pipeline: the sentence tokenizer, POS tagger, dependency parser, and named entity recognition (NER).
TextCategorizer is an optional, trainable pipeline component. To train it, we need to provide examples together with their class labels. We first add TextCategorizer to the NLP pipeline and then run the training procedure. Figure 8.2 shows exactly where the TextCategorizer component sits in the NLP pipeline: it comes after the essential components. In the following diagram, textcat refers to the TextCategorizer component.
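As a minimal sketch of the first step (adding the component to the pipeline), the snippet below assumes the spaCy v3 API, where components are added by their string name. It uses a blank English pipeline so that no pretrained model needs to be downloaded, and the labels POSITIVE and NEGATIVE are hypothetical placeholders rather than labels taken from this chapter's dataset.

```python
import spacy

# Assumption: spaCy v3.x. A blank pipeline has no components yet.
nlp = spacy.blank("en")

# Add the text classifier; by default it is appended at the end of the pipeline.
textcat = nlp.add_pipe("textcat")

# Register the class labels the classifier should predict (hypothetical labels).
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

print(nlp.pipe_names)
print(textcat.labels)
```

In a pretrained pipeline, the same add_pipe call would place textcat after the existing components, which matches the position described above. The component still has to be trained on labeled examples before its predictions are meaningful.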
A neural network architecture...