Annotating and preparing data
The first step in training a model is always preparing the training data. You usually collect data from customer logs and then turn it into a dataset by dumping it as a CSV or JSON file. spaCy's model training code works with JSON files, so we will work with JSON files in this chapter.
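As a rough illustration of the dumping step, the following sketch writes a handful of sentences to a JSON file and reads them back using only Python's standard library. The sample sentences, the `utterances.json` filename, and the `dump_utterances` and `load_records` helpers are all assumptions made for this example; real log extraction will depend on how your logs are stored.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical utterances, standing in for lines pulled from customer logs.
utterances = [
    "I visited JFK Airport.",
    "Book me a table for two.",
]


def dump_utterances(sentences, path):
    """Write one record per sentence to a JSON file and return the records."""
    records = [{"sentence": sentence} for sentence in sentences]
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def load_records(path):
    """Read the JSON file back into a list of dictionaries."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# The file location is an assumption; any writable path works.
out_path = Path(tempfile.gettempdir()) / "utterances.json"
records = dump_utterances(utterances, out_path)
loaded = load_records(out_path)
```

Keeping each sentence in its own dictionary leaves room to attach annotation fields to the same record later.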
After collecting the data, we annotate it. Annotation means labeling the intents, entities, POS tags, and so on.
This is an example of annotated data:
{
  "sentence": "I visited JFK Airport.",
  "entities": {
    "label": "LOC",
    "value": "JFK Airport"
  }
}
As you can see, we point the statistical algorithm to what we want the model to learn. In this example, we want the model to learn about entities, hence we feed it examples with entities annotated...
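Training code often wants entity annotations as character offsets into the sentence rather than as raw strings. The sketch below parses the annotated record shown earlier and derives the start and end offsets of the entity. The `entity_span` helper is hypothetical, and it assumes the entity text occurs in the sentence exactly as written and that only its first occurrence matters.

```python
import json

# The annotated example from the text, as a JSON string.
annotated = json.loads("""
{
  "sentence": "I visited JFK Airport.",
  "entities": {"label": "LOC", "value": "JFK Airport"}
}
""")


def entity_span(record):
    """Return (start, end, label) character offsets for the record's entity."""
    text = record["sentence"]
    entity = record["entities"]
    # Assumption: the first occurrence of the entity text is the annotated one.
    start = text.find(entity["value"])
    if start == -1:
        raise ValueError("entity text not found in sentence")
    end = start + len(entity["value"])
    return start, end, entity["label"]


span = entity_span(annotated)
print(span)  # (10, 21, 'LOC')
```

Slicing the sentence with the returned offsets, `annotated["sentence"][10:21]`, gives back `JFK Airport`, which is a quick sanity check worth running over every annotated record.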