PhraseMatcher and EntityRuler
Let's learn how spaCy helps us match long dictionaries.
PhraseMatcher
When processing financial, medical, or legal text, we often have long lists and dictionaries that we want to scan the text against. As we saw in the previous section, Matcher patterns are handcrafted: we coded each token individually. If we have a long list of phrases, Matcher is not very handy, because it isn't practical to code hundreds or thousands of terms one by one.
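To make the contrast concrete, here is a minimal sketch of three politician names written as token-level Matcher patterns. It assumes spaCy v3 and uses `spacy.blank("en")`, a tokenizer-only pipeline, so no trained model has to be downloaded; the input sentence is made up for illustration. Every name needs its own hand-written list of token dictionaries, so a dictionary with thousands of entries would need thousands of such lists.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only English pipeline: enough for ORTH (exact text) patterns
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# With Matcher, every term is spelled out token by token
matcher.add("politiciansList", [
    [{"ORTH": "Angela"}, {"ORTH": "Merkel"}],
    [{"ORTH": "Donald"}, {"ORTH": "Trump"}],
    [{"ORTH": "Alexis"}, {"ORTH": "Tsipras"}],
])

doc = nlp("German chancellor Angela Merkel welcomed Donald Trump.")
# Each match is a (match_id, start, end) tuple of token offsets
found = [doc[start:end].text for _, start, end in matcher(doc)]
print(found)
```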
spaCy offers a solution for comparing text against long dictionaries: the PhraseMatcher class, which lets us match whole phrases from a list instead of writing token patterns by hand. Let's get started with an example:
```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_md")
matcher = PhraseMatcher(nlp.vocab)

terms = ["Angela Merkel", "Donald Trump", "Alexis Tsipras"]
# make_doc only runs the tokenizer, which is all we need to build the patterns
patterns = [nlp.make_doc(term) for term in terms]
# Current signature: add(key, list_of_patterns); on_match is an optional keyword argument
matcher.add("politiciansList", patterns)

doc = nlp("3 EU leaders met in Berlin. German chancellor Angela Merkel first welcomed the US president Donald Trump. The following day Alexis Tsipras joined them in Brandenburg.")
matches = matcher(doc)
for mid, start, end in matches:
    print(start, end, doc[start:end])
```
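A common variation is case-insensitive matching. The sketch below assumes spaCy v3 and uses a blank English pipeline instead of `en_core_web_md`; it passes `attr="LOWER"` to `PhraseMatcher` so that tokens are compared by their lowercase form, and it turns the hashed match ID back into the rule name through `nlp.vocab.strings`. The input sentence is invented for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Tokenizer-only pipeline: LOWER is a lexical attribute, so no trained model is needed
nlp = spacy.blank("en")
# attr="LOWER" compares lowercase token forms, so matching ignores case
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

terms = ["Angela Merkel", "Donald Trump", "Alexis Tsipras"]
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("politiciansList", patterns)

doc = nlp("ANGELA MERKEL met donald trump in Berlin.")
results = []
for match_id, start, end in matcher(doc):
    # match_id is a hash; the shared StringStore maps it back to the rule name
    label = nlp.vocab.strings[match_id]
    results.append((label, doc[start:end].text))
    print(label, doc[start:end].text)
```

Both the all-caps and the all-lowercase mentions are found, even though the patterns were written in title case. For very long term lists, `nlp.tokenizer.pipe(terms)` is another way to tokenize the patterns, processing them in batches.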
Here's what we did:
First, we imported spacy, then we imported the PhraseMatcher class. After the imports, we ...