
Overview of spaCy Conventions

Explore spaCy's conventions to understand its text processing pipeline and core components like tokens, Doc, and Vocab. Learn how spaCy handles tokenization, tagging, parsing, and entity recognition through an efficient pipeline that simplifies NLP development.


Overview of spaCy

Every NLP application processes text in several steps. As we saw previously, we have always created instances called nlp and doc. But what exactly did we do?
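As a reminder, here is a minimal sketch of that familiar pattern (it assumes the small English model en_core_web_sm is installed; any installed spaCy model works the same way):

```python
import spacy

# Load a pretrained pipeline; this returns the nlp object.
nlp = spacy.load("en_core_web_sm")

# Calling nlp on raw text runs the full pipeline and returns a Doc object.
doc = nlp("spaCy processes text through a pipeline.")
```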

When we call nlp on our text, spaCy applies a series of processing steps. The first step is tokenization, which produces a Doc object. The Doc object is then processed further by a tagger, a parser, and an entity recognizer. This sequence of steps is called the language processing pipeline. Each pipeline component returns the processed Doc and passes it on to the next component:

A high-level overview of the processing pipeline
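We can inspect which components a loaded pipeline contains via nlp.pipe_names. The exact names depend on the model and spaCy version; the output shown below is typical for en_core_web_sm in spaCy 3.x:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Each entry is a pipeline component applied to the Doc in order.
print(nlp.pipe_names)
# e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
```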

A spaCy pipeline object is created when we load a ...