Token-Based Matching

Explore how to apply spaCy's Matcher class for token-based pattern matching in natural language processing. Understand how to define readable and maintainable match rules using token attributes, avoiding the complexity of regular expressions. Learn to match sequences like greetings or sentence starters effectively, enhancing your rule-based NLP capabilities.


So far, we've explored sophisticated linguistic concepts that require statistical models, and how to use them with spaCy. Some NLU tasks, however, can be solved cleverly without the help of any statistical model. One such approach is regex, which we use to match a predefined set of patterns against our text.

A regex (regular expression) is a sequence of characters that specifies a search pattern; it describes the set of strings that follow that pattern. A regex can include letters, digits, and characters with special meanings, such as ?, ., and *. Python's built-in re module provides good support for defining and matching regular expressions. There's also a third-party Python 3 library called regex that aims to replace re in the future.
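A quick way to see what these special characters do is to try them with Python's built-in re module. This is a minimal sketch; the patterns and sample strings are illustrative choices, not examples from the original text.

```python
import re

# "?" makes the preceding item optional (0 or 1 occurrence):
# "colou?r" accepts both spellings.
optional = re.compile(r"colou?r")

# "." matches any single character except a newline.
any_char = re.compile(r"c.t")

# "*" repeats the preceding item 0 or more times.
repeated = re.compile(r"go*al")

print(optional.findall("color and colour"))   # ['color', 'colour']
print(any_char.findall("cat cot cut"))        # ['cat', 'cot', 'cut']
print(repeated.findall("gal goal gooooal"))   # ['gal', 'goal', 'gooooal']
```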

Readers who actively develop NLP applications in Python have almost certainly come across regex code, and have probably written some themselves.

What does a regex look like, then? The following regex matches these strings:

Barack Obama
Barack Obama (with a tab or another whitespace character between the words instead of a space)
Barack Hussein Obama

This pattern can be read as follows: the string Barack can optionally be followed by the string Hussein (the ? character in regex means optional, that is, 0 or 1 occurrence of the preceding item) and must then be followed by the string Obama. The inter-word spaces can be a single space character, a tab, or any other whitespace character (\s matches all sorts of whitespace characters, including the newline character).
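The regex itself is not reproduced above. Assuming it has the form Barack\s(Hussein\s)?Obama, which fits the description, the sketch below checks a few strings against it with Python's re module. Using fullmatch rather than search is a choice made here so that the whole string has to fit the pattern.

```python
import re

# Assumed form of the Barack Obama pattern (not shown verbatim in the text):
# "Barack", one whitespace character, optionally "Hussein" plus one
# whitespace character, then "Obama".
pattern = re.compile(r"Barack\s(Hussein\s)?Obama")

candidates = [
    "Barack Obama",          # single space
    "Barack\tObama",         # tab instead of a space
    "Barack Hussein Obama",  # optional middle name present
    "Barack\nObama",         # \s also matches a newline
    "Barack H. Obama",       # not covered by the pattern
]

# fullmatch requires the entire string to fit the pattern.
results = {text: pattern.fullmatch(text) is not None for text in candidates}

for text, matched in results.items():
    print(repr(text), "->", matched)
```

The last string fails because the optional part only accepts the literal word Hussein; any other middle name or initial needs yet another change to the pattern.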

It's not very readable, even for such a short and uncomplicated pattern, is it? Regexes have the following downsides:

Difficult to read
Difficult to debug ...
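The debugging problem is easy to reproduce with the same kind of pattern. This sketch again assumes the Barack Obama regex has the form Barack\s(Hussein\s)?Obama. Both failures below simply return no match, with no error, which is what makes them hard to track down. The relaxed pattern is one possible fix chosen for illustration, not the author's.

```python
import re

# Assumed form of the Barack Obama pattern (not shown verbatim in the text).
strict = re.compile(r"Barack\s(Hussein\s)?Obama")

# Two spaces between the words: \s consumes exactly one whitespace
# character, so the match silently fails.
print(strict.search("Barack  Obama"))          # None

# Lowercased input also fails, because matching is case-sensitive by default.
print(strict.search("barack obama"))           # None

# One possible fix: allow runs of whitespace and ignore case.
# The pattern keeps growing, which hurts readability further.
relaxed = re.compile(r"Barack\s+(?:Hussein\s+)?Obama", re.IGNORECASE)
print(bool(relaxed.search("Barack  Obama")))   # True
print(bool(relaxed.search("barack obama")))    # True
```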
