Training a spaCy model for supervised text classification
In this recipe, we will train a spaCy model on the BBC dataset, the same dataset we used in the previous recipe, to predict the text category.
Getting ready
We will use the spaCy package to train our model. All the dependencies are taken care of by the Poetry environment.
You will need to download the config file from the book’s GitHub repository, located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/data/spacy_config.cfg. Save this file at the path ../data/spacy_config.cfg relative to the notebook.
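Before starting training, it can help to confirm that the config file really sits where the notebook expects it. The following is a minimal sketch using only the standard library; the helper name resolve_config and its default argument are hypothetical and are not part of the book's code.

```python
from pathlib import Path


def resolve_config(notebook_dir: Path,
                   relative: str = "../data/spacy_config.cfg") -> Path:
    """Return the absolute path of the config file.

    `resolve_config` is a hypothetical helper for illustration only.
    It interprets `relative` with respect to `notebook_dir` and raises
    FileNotFoundError if no file exists at that location.
    """
    config_path = (notebook_dir / relative).resolve()
    if not config_path.is_file():
        raise FileNotFoundError(
            f"Expected the spaCy config at {config_path}; "
            "download it from the book's GitHub repository first."
        )
    return config_path


# Example usage from inside the notebook (assumes the notebook's
# working directory is the folder that contains it):
# config_path = resolve_config(Path.cwd())
```

Failing early with a clear message is usually easier to debug than a path error raised partway through training.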
Note
You can modify the training config, or generate your own at https://spacy.io/usage/training.
The notebook is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter04/4.5-spacy_textcat.ipynb.
How to do it…
The general structure of the training...