Text Classification

Learn to perform text classification tasks using the Hugging Face Inference API.

Text classification infers the category of a given text. For example, it can help determine whether a book is a success based on its reviews, whether a review is positive or negative, what tone an author takes in a passage, or whether a sentence or passage is grammatically correct.


Sentiment analysis

Have you ever wondered how companies like Amazon know if a certain product is a success or a flop based on customer reviews? Thanks to NLP, we can perform sentiment analysis. In sentiment analysis, we take a sentence and infer if it's positive, negative, or neutral.

The distilbert-base-uncased-finetuned-sst-2-english model is recommended for text classification. Many other models are available for this task; some common ones are listed below:

Models for Text Classification

| Model | Description |
| --- | --- |
| distilbert-base-uncased-finetuned-sst-2-english | Developed by the Hugging Face team for text classification. Base model is distilbert-base-uncased and is fine-tuned on the sst2 dataset. Works well for the English language. |
| cardiffnlp/twitter-roberta-base-sentiment | Trained on around 58 million tweets for sentiment analysis. Labels will be predicted from "Negative," "Positive," and "Neutral." Works well for the English language. |
| ProsusAI/finbert | This model is for financial text classification. The model outputs "positive," "neutral," and "negative" labels. |
| cardiffnlp/twitter-roberta-base-emotion | Trained on around 58 million tweets for sentiment analysis and fine-tuned for emotional text analysis. Outputs labels like "anger," "joy," "sadness," and others. |
| finiteautomata/bertweet-base-sentiment-analysis | Trained on around 40 thousand tweets for sentiment analysis. The model outputs "POS," "NEG," and "NEU." |

To classify text, we send a POST request to the following endpoint, replacing the path parameter {model} with any of the models listed above:

https://api-inference.huggingface.co/models/{model}

Request parameters

The endpoint above takes the following request parameters:

| Parameter | Type | Category | Description |
| --- | --- | --- | --- |
| inputs | String or String [ ] | Required | A string or string [ ] to be classified |
| options.use_cache | Boolean | Optional | The Hugging Face Inference API has a cache mechanism to speed up requests. Use it for deterministic models. Default value is true. |
| options.wait_for_model | Boolean | Optional | Hugging Face Inference API models take time to initialize and process requests. If the value is true, the API waits for the model to get ready instead of returning an error. Default value is false. |
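As a rough illustration of how these parameters fit together, the sketch below assembles the request URL and the fetch options from a model name, the inputs, and the two optional flags. The function name buildClassificationRequest and its argument names are hypothetical, chosen only for this example; the URL pattern, parameter names, and defaults are the ones listed above.

```javascript
// Hypothetical helper (not part of the Inference API): assembles the URL and
// fetch() options for a text classification request.
function buildClassificationRequest(model, inputs, accessToken, options = {}) {
  if (typeof inputs !== "string" && !Array.isArray(inputs)) {
    throw new TypeError("inputs must be a string or an array of strings");
  }
  // Defaults mirror the documented ones: use_cache true, wait_for_model false.
  const { use_cache = true, wait_for_model = false } = options;
  return {
    url: `https://api-inference.huggingface.co/models/${model}`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${accessToken}` },
      body: JSON.stringify({ inputs, options: { use_cache, wait_for_model } }),
    },
  };
}

// Usage: "ACCESS_TOKEN" is a placeholder, not a real credential.
const request = buildClassificationRequest(
  "distilbert-base-uncased-finetuned-sst-2-english",
  "I loved this book!",
  "ACCESS_TOKEN",
  { wait_for_model: true }
);
console.log(request.url);
console.log(request.init.body);
```

The returned url and init values can be passed straight to fetch(request.url, request.init).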

The code below gives an example of text classification, applied to the iconic opening lines of Herman Melville's classic, Moby-Dick (1851):

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english";

// Request headers; {{ACCESS_TOKEN}} stands for a Hugging Face access token
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

// Input text to classify. Continuation lines stay flush left so that no
// extra whitespace ends up inside the string.
const data = JSON.stringify({
  inputs: "Call me Ishmael. Some years ago—never \
mind how long precisely—having little or no money in my purse, \
and nothing particular to interest me on shore, I thought I would \
sail about a little and see the watery part of the world. It is a \
way I have of driving off the spleen, and regulating the circulation.",
  options: {
    wait_for_model: true
  }
});

const options = {
  method: "POST",
  headers: headerParameters,
  body: data
};

// printResponse and printError are not defined here; they are assumed to be
// helpers supplied by the course's code widget.
async function textClassification() {
  try {
    const response = await fetch(endpointUrl, options);
    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

textClassification();

Let's walk through the code above:

  • endpointUrl: We specify the endpoint URL with the distilbert-base-uncased-finetuned-sst-2-english model.

  • data: We provide the input text for the classification and set the additional options.wait_for_model parameter to true.

  • textClassification: We define an async function that calls the API and handles exceptions.

  • The final line calls the textClassification function to invoke the endpoint.
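The printResponse and printError helpers called in the example are not defined in this lesson. To run the example outside the course widget, minimal stand-ins could look like the sketch below. This is an assumption about what such helpers do, not the platform's actual implementation; it relies only on the standard fetch Response interface (ok, status, and json()).

```javascript
// Hypothetical stand-ins for the helpers used in the example.
// Assumption: the response body is JSON.
async function printResponse(response) {
  const body = await response.json();
  if (!response.ok) {
    // Non-2xx replies are reported together with their status code.
    console.error(`Request failed with status ${response.status}:`, body);
    return body;
  }
  console.log(JSON.stringify(body, null, 2));
  return body;
}

function printError(error) {
  // Network failures and other exceptions thrown by fetch() end up here.
  console.error("Request error:", error.message);
}
```

Both helpers only log; printResponse also returns the parsed body so that it can be reused by the caller.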

Response fields

This API call returns an object, or a list of objects, containing the possible labels ordered by their likelihood scores. The labels are model specific: distilbert-base-uncased-finetuned-sst-2-english returns "POSITIVE" and "NEGATIVE," while other classification models return their own label sets.

| Field | Type | Description |
| --- | --- | --- |
| label | String | Specifies predicted label of the text |
| score | Float | Specifies likelihood score of the text |
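To make the response shape concrete, here is a small sketch that picks the highest-scoring label for each input. The helper name topPredictions is hypothetical, and the sample data is invented for illustration rather than real model output. Because the response may be a single list of label objects or a list of such lists, the helper accepts both.

```javascript
// Hypothetical helper: returns the highest-scoring { label, score } entry for
// each input. Accepts a flat list of predictions (one input) or a list of
// lists (several inputs).
function topPredictions(result) {
  const perInput = Array.isArray(result[0]) ? result : [result];
  return perInput.map((predictions) =>
    predictions.reduce((best, current) =>
      current.score > best.score ? current : best
    )
  );
}

// Illustrative data only; the scores are made up for this example.
const sample = [
  [
    { label: "NEGATIVE", score: 0.12 },
    { label: "POSITIVE", score: 0.88 },
  ],
];
console.log(topPredictions(sample)); // [ { label: 'POSITIVE', score: 0.88 } ]
```

Taking the maximum explicitly, rather than reading the first element, keeps the helper correct even if a model does not return its labels sorted.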

Try the following examples by replacing the value of inputs in the request body of the sentiment analysis example above.

inputs: "Dickens is one of those writers who are \
well worth stealing. Even the burial of his body in Westminster \
Abbey was a species of theft, if you come to think of it"

The following is an example of classifying multiple texts in one API call.

inputs: ["To be, or not to be",
"With mirth and laughter let old wrinkles come"]

Zero-shot classification

Zero-shot classification is an ML technique in which the model classifies text against labels it never saw during training. It can do this because the concepts it learned from its training data carry over to the unseen labels. To classify a short or long text against user-provided labels, the recommended model is facebook/bart-large-mnli. Many other models are available for this task; some common ones are listed below:

Models for Zero-Shot Classification

| Model | Description |
| --- | --- |
| facebook/bart-large-mnli | Based on bart-large and trained on the mnli dataset. Uses a pretrained natural language inference (NLI) model for zero-shot classification. |
| joeddav/xlm-roberta-large-xnli | Based on xlm-roberta-large and trained on a dataset covering 15 languages. The base model is trained on 100 languages, so this model may also work to some extent on other languages. Used for zero-shot classification. |
| valhalla/distilbart-mnli-12-1 | Built by taking alternating layers of the bart-large-mnli model and training on the same type of dataset. |
| typeform/distilbert-base-uncased-mnli | Based on distilbert-base-uncased and fine-tuned on the mnli dataset. This model is for the English language. |
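The NLI-based approach used by these models can be pictured as follows: each candidate label is turned into a hypothesis sentence, and the model judges whether the input text (the premise) entails it. The sketch below only builds those premise and hypothesis pairs; it does not run a model. The template "This example is {}." is the default in the Hugging Face transformers zero-shot pipeline, but whether the hosted endpoint uses exactly this wording is an assumption here, and the function name buildNliPairs is hypothetical.

```javascript
// Hypothetical illustration of how zero-shot classification is framed as NLI:
// one (premise, hypothesis) pair per candidate label.
function buildNliPairs(text, candidateLabels, template = "This example is {}.") {
  return candidateLabels.map((label) => ({
    premise: text,
    // A replacer function avoids special handling of "$" inside labels.
    hypothesis: template.replace("{}", () => label),
  }));
}

const pairs = buildNliPairs("The new GPU doubles training throughput.", [
  "technology",
  "cooking",
]);
console.log(pairs);
```

A label whose hypothesis the model considers entailed by the premise receives a high score, which is why labels never seen during training can still be ranked.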

For zero-shot classification, we send a POST request to the following endpoint, replacing the path parameter {model} with any of the models listed above:

https://api-inference.huggingface.co/models/{model}

Request parameters

The API call above takes the following request parameters:

| Parameter | Type | Category | Description |
| --- | --- | --- | --- |
| inputs | String or String [ ] | Required | A string or string [ ] to be classified |
| parameters.candidate_labels | String [ ] | Required | A list of candidate labels for inputs. A maximum of 10 candidate_labels is allowed. |
| parameters.multi_label | Boolean | Optional | Set it to true if more than one label can apply to the same text, that is, if the classes can overlap. Default value is false. |
| options.use_cache | Boolean | Optional | The Hugging Face Inference API has a cache mechanism to speed up requests. Use it for deterministic models. Default value is true. |
| options.wait_for_model | Boolean | Optional | Hugging Face Inference API models take time to initialize and process requests. If the value is true, the API waits for the model to get ready instead of returning an error. Default value is false. |
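A small sketch of how the zero-shot request body could be assembled from these parameters follows. The helper name buildZeroShotBody and its option names are hypothetical. The limit of 10 labels is the maximum stated for candidate_labels; the check simply fails early instead of sending a request that exceeds it.

```javascript
const MAX_CANDIDATE_LABELS = 10; // maximum stated for candidate_labels

// Hypothetical helper: builds the JSON body for a zero-shot request.
function buildZeroShotBody(
  inputs,
  candidateLabels,
  { multiLabel = false, waitForModel = false } = {}
) {
  if (!Array.isArray(candidateLabels) || candidateLabels.length === 0) {
    throw new Error("candidate_labels must be a non-empty array of strings");
  }
  if (candidateLabels.length > MAX_CANDIDATE_LABELS) {
    throw new Error(`At most ${MAX_CANDIDATE_LABELS} candidate_labels are allowed`);
  }
  return JSON.stringify({
    inputs,
    parameters: { candidate_labels: candidateLabels, multi_label: multiLabel },
    options: { wait_for_model: waitForModel },
  });
}

// Usage with made-up input text and labels.
const zeroShotBody = buildZeroShotBody(
  "The match went to penalties after extra time.",
  ["sports", "politics", "finance"],
  { waitForModel: true }
);
console.log(zeroShotBody);
```

The resulting string can be used as the body of the POST request to the endpoint.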

The code below classifies the text against the provided candidate_labels, which are passed in the parameters object of the request body.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli";

// Request headers; {{ACCESS_TOKEN}} stands for a Hugging Face access token
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

// Input text to classify and the labels to classify it against.
// Continuation lines stay flush left so that no extra whitespace ends up
// inside the string.
const data = JSON.stringify({
  inputs: "Machine Learning skills are some of the most sought-after in the modern \
job market. Modern ML Engineers make dozens of thousands of dollars more per year \
than other developers. ",
  parameters: {
    candidate_labels: ["Software Engineer", "Machine Learning Engineer", "Electrical Engineer"]
  },
  options: {
    wait_for_model: true
  }
});

const options = {
  method: "POST",
  headers: headerParameters,
  body: data
};

// printResponse and printError are not defined here; they are assumed to be
// helpers supplied by the course's code widget.
async function textClassification() {
  try {
    const response = await fetch(endpointUrl, options);
    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

textClassification();

Response fields

This API call returns an object, or a list of objects when several inputs are sent. Each object contains the input text, the candidate labels ordered by likelihood score, and the corresponding scores.

| Field | Type | Description |
| --- | --- | --- |
| sequence | String | Specifies the input text whose label is predicted |
| labels | String [ ] | An array of the labels that we have provided for the zero-shot classification |
| scores | Float [ ] | An array of the likelihood scores corresponding to the labels array |
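Since labels and scores are parallel arrays, it is often convenient to pair them up before use. The sketch below does that for a single response object. The helper name rankLabels is hypothetical, and the sample object is invented for illustration rather than real model output.

```javascript
// Hypothetical helper: pairs each label with its score and sorts the pairs
// from most to least likely.
function rankLabels(result) {
  return result.labels
    .map((label, i) => ({ label, score: result.scores[i] }))
    .sort((a, b) => b.score - a.score);
}

// Illustrative data only; the scores are made up for this example.
const sampleResult = {
  sequence: "Machine Learning skills are sought-after.",
  labels: ["Machine Learning Engineer", "Software Engineer", "Electrical Engineer"],
  scores: [0.8, 0.15, 0.05],
};

const ranked = rankLabels(sampleResult);
console.log(`${ranked[0].label} (${ranked[0].score})`);
```

The explicit sort is a safeguard; if the arrays already arrive ordered by score, it leaves them unchanged.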