Entity Extraction
Let's see how we will extract the entities that our chatbot will use.
We'll now implement the first step of our chatbot NLU pipeline and extract entities from the dataset utterances. The following are the entities marked in our dataset:
city, date, time, phone_number, cuisine, restaurant_name, street_address
To extract the entities, we'll use the spaCy NER model and the spaCy Matcher class. Let's get started by extracting the city entities.
Extracting city entities
We'll first extract the city entities, starting with a recap of the spaCy NER model and its entity labels.
First, recall that the spaCy named entity label for cities and countries is GPE. Let's ask spaCy to explain what the GPE label corresponds to once again:
import spacy

nlp = spacy.load("en_core_web_md")
print(spacy.explain("GPE"))
Second, recall that we can access the entities of a Doc object via the ents property. We can find all entities in an utterance that are labeled by the spaCy NER model as follows:
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("Can you please confirm that you want to book a table for 2 at 11:30 am at the Bird restaurant in Palo Alto for today")
print(doc.ents)
for ent in doc.ents:
    print(ent.text, ent.label_)
In this code segment, we listed all named entities of this utterance by reading doc.ents. Then, we examined the entity labels by reading ent.label_ on each entity. Examining the output, we see that this utterance contains five entities: one CARDINAL entity (2), one TIME entity (11:30 am), one PRODUCT entity (Bird, which is not an ideal label for a restaurant), one GPE entity (Palo Alto), and one DATE entity (today). The GPE entity is what we're looking for; Palo Alto is a city in the US and hence is labeled by the spaCy NER model as GPE.
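The filtering step described above, keeping only the entities labeled GPE, can be sketched as a small helper. This is a minimal sketch rather than the lesson's own code: the names extract_cities and demo are hypothetical, and because GPE also covers countries and states, the helper may return places that are not cities.

```python
def extract_cities(doc):
    """Return the text of every entity in `doc` that is labeled GPE.

    `extract_cities` is a hypothetical helper name used for illustration.
    GPE also covers countries and states, so the result may include
    places that are not cities.
    """
    # doc.ents holds the entity spans; label_ is each span's string label.
    return [ent.text for ent in doc.ents if ent.label_ == "GPE"]


def demo():
    # Imported here so the helper above has no hard dependency on spaCy.
    import spacy

    # Assumes the en_core_web_md model is installed locally.
    nlp = spacy.load("en_core_web_md")
    doc = nlp(
        "Can you please confirm that you want to book a table for 2 "
        "at 11:30 am at the Bird restaurant in Palo Alto for today"
    )
    print(extract_cities(doc))
```

Calling demo() runs the helper on the utterance from the example above. The exact entities returned depend on the model version that is installed.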
The code below outputs all the utterances that include a city entity together with the city entities. From the output of this script, we can see that the spaCy NER model performs very well on this corpus ...