Introducing Tokenization
Let's learn about tokenization.
Tokenization is the first step in a text processing pipeline; it always comes first because every later operation works on the tokens it produces.
Tokenization means splitting a sentence into its tokens. A token is a unit of semantics, the smallest meaningful part of a piece of text. Tokens can be words, numbers, punctuation marks, currency symbols, and any other meaningful symbols that serve as the building blocks of a sentence. The following are examples of tokens:
| Example tokens | |
| --- | --- |
| USA | NY |
| city | 33 |
| 3rd | ! |
| ...? | 's |
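To see why simply splitting on whitespace is not enough, consider a sentence that contains several of the tokens above. The snippet below is a minimal illustration in plain Python, with no NLP library involved; the sentence is made up for this example.

```python
text = "It's my cat's 3rd birthday in NY!"

# Naive whitespace splitting keeps punctuation and contracted
# forms attached to the neighboring words.
print(text.split())
# ["It's", 'my', "cat's", '3rd', 'birthday', 'in', 'NY!']

# A proper tokenizer would instead produce units such as
# "It", "'s", "3rd", "NY", and "!" as separate tokens.
```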
Tokenization in spaCy
The input to the spaCy tokenizer is a Unicode text, and the result is a Doc object. The following code shows the tokenization process:
```python
import spacy

nlp = spacy.load("en_core_web_md")      # load the English model
doc = nlp("I own a ginger cat.")        # tokenize the input text into a Doc
print([token.text for token in doc])    # ['I', 'own', 'a', 'ginger', 'cat', '.']
```
Here is what we just did:

- We start by importing spaCy.
- We load the English model en_core_web_md to create an nlp pipeline object.
- We feed the sentence "I own a ginger cat." to the pipeline, which returns a Doc object.
- We print the text of each token in the Doc with a list comprehension.
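Tokenization is more than splitting on spaces: spaCy also separates punctuation and contracted forms such as 's into their own tokens. As a rough illustration, again assuming en_core_web_md is installed, the sentence from the earlier sketch could be tokenized like this:

```python
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("It's my cat's 3rd birthday in NY!")
print([token.text for token in doc])
# Expected output (may vary slightly by spaCy version):
# ['It', "'s", 'my', 'cat', "'s", '3rd', 'birthday', 'in', 'NY', '!']
```

Notice how the contraction 's, the ordinal 3rd, and the exclamation mark each become tokens of their own, matching the example tokens listed earlier.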