Training the spaCy Text Classifier

Let's learn about the details of spaCy's text classifier component.

In this section, we look closely at TextCategorizer, spaCy's text classifier component. Earlier, we saw that the spaCy NLP pipeline is built from components, and we covered the essential ones: the tokenizer, POS tagger, dependency parser, and named entity recognizer (NER).

TextCategorizer is an optional, trainable pipeline component. To train it, we provide examples together with their class labels. We first add TextCategorizer to the NLP pipeline and then run the training procedure. The illustration below shows where TextCategorizer sits in the pipeline: it comes after the essential components. In the diagram, textcat refers to the TextCategorizer component.

TextCategorizer in the nlp pipeline
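
As a minimal sketch of the "add the component first, train later" step described above, the snippet below adds textcat to a pipeline and registers two class labels. It assumes spaCy v3 is installed. The blank English pipeline and the POSITIVE/NEGATIVE labels are illustrative assumptions, chosen so that no pretrained model has to be downloaded; they do not come from this lesson.

```python
import spacy

# Assumption: a blank English pipeline, so the sketch runs without a
# downloaded pretrained model. It contains only the tokenizer.
nlp = spacy.blank("en")

# Add the trainable text classifier under its registered name "textcat".
# By default, add_pipe appends the component to the end of the pipeline.
textcat = nlp.add_pipe("textcat")

# Hypothetical class labels, for illustration only.
for label in ("POSITIVE", "NEGATIVE"):
    textcat.add_label(label)

print(nlp.pipe_names)   # components currently in the pipeline
print(textcat.labels)   # labels registered on the classifier
```

With a pretrained pipeline loaded through spacy.load, the same add_pipe call would place textcat after the components that are already there, which matches the position shown in the diagram. The component still has to be trained before its predictions mean anything.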

A neural network architecture lies behind spaCy's TextCategorizer. TextCategorizer gives us a user-friendly, end-to-end way to train the classifier, so we don't have to deal with the neural network architecture directly. We'll design our own neural network architecture in the upcoming chapters. With this overview in place, we're ready to dive into the TextCategorizer code. Let's get to know the TextCategorizer class first.

Getting to know the TextCategorizer class

Now let's get to know the TextCategorizer class in detail. First of all, we import the default single-label model configuration, DEFAULT_SINGLE_TEXTCAT_MODEL, from the pipeline components:

from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL
Importing the default single-label text categorizer model
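
To show where this import is typically used, here is a hedged sketch that passes DEFAULT_SINGLE_TEXTCAT_MODEL as the model setting when adding the component. It assumes spaCy v3. The blank English pipeline and the variable name config are illustrative choices, not part of the lesson's own code.

```python
import spacy
from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL

# Assumption: a blank English pipeline keeps the sketch self-contained.
nlp = spacy.blank("en")

# Pass the default single-label model settings explicitly as the
# component's "model" configuration.
config = {"model": DEFAULT_SINGLE_TEXTCAT_MODEL}
textcat = nlp.add_pipe("textcat", config=config)

print(type(textcat).__name__)  # class of the created component
print(nlp.pipe_names)          # components currently in the pipeline
```

Passing the model settings explicitly makes it visible which variant of the classifier is being built, at the cost of a little extra configuration.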

TextCategorizer is available in two flavors: a single-label classifier and a multilabel classifier. As we remarked previously, a multilabel classifier can predict more than one class for an example. A single-label classifier predicts exactly one class for each example, and the classes are mutually exclusive. The preceding import line imports the ...