Entity extraction
In this section, we'll implement the first step of our chatbot NLU pipeline and extract entities from the dataset utterances. The following are the entities marked in our dataset:
- city
- date
- time
- phone_number
- cuisine
- restaurant_name
- street_address
To extract the entities, we'll use the spaCy NER model and the spaCy Matcher class. Let's get started by extracting the city entities.
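As a rough illustration of the Matcher half of this plan, the following minimal sketch matches a simple time expression. The token pattern, the sample utterance, and the helper name extract_matches are assumptions made for this example and are not taken from the dataset. It uses a blank English pipeline, which only tokenizes, so no trained model needs to be downloaded.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

matcher = Matcher(nlp.vocab)

# Hypothetical pattern: a number followed by "am" or "pm", e.g. "7 pm".
time_pattern = [{"IS_DIGIT": True}, {"LOWER": {"IN": ["am", "pm"]}}]
matcher.add("time", [time_pattern])


def extract_matches(text):
    """Return (label, matched_text) pairs for every Matcher hit in text."""
    doc = nlp(text)
    return [
        (nlp.vocab.strings[match_id], doc[start:end].text)
        for match_id, start, end in matcher(doc)
    ]


# Hypothetical utterance, used only to exercise the pattern.
print(extract_matches("I need a table for two at 7 pm"))
```

A rule-based pattern like this suits entities with a predictable surface form, while entities such as cities are better left to the statistical NER model.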
Extracting city entities
We'll first extract the city entities. We'll get started by recalling some information about the spaCy NER model and entity labels from Chapter 3, Linguistic Features, and Chapter 6, Putting Everything Together: Semantic Parsing with spaCy:
- First, we recall that the spaCy named entity label for cities and countries is GPE. Let's ask spaCy to explain what the GPE label corresponds to once again:
    import spacy
    nlp = spacy.load("en_core_web_md")
    spacy.explain("GPE")
    'Countries, cities, states...