Token Classification

Learn to perform token classification tasks using the Hugging Face Inference API.

Natural language is hard for machines to interpret, so text usually needs preprocessing before it is passed to an NLP model. Tokenization splits a sentence into smaller units called tokens.

Hugging Face also lets us classify these tokens. Two popular subtasks are:

  • Named entity recognition (NER)

  • Part-of-speech (POS) tagging

Named entity recognition (NER)

In named entity recognition (NER), also known as entity identification, the classifier locates key pieces of information (entities) in the text, such as people, organizations, and locations, and assigns a type to each. The dbmdz/bert-large-cased-finetuned-conll03-english model is recommended for NER tasks. Many other models are available for this task; some common ones are listed below:

Models for NER

| Model | Description |
|---|---|
| dbmdz/bert-large-cased-finetuned-conll03-english | Based on bert-large-cased and fine-tuned on the CoNLL-2003 Named Entity Recognition dataset. Identifies four entity types: LOC, ORG, MISC, and PER. |
| Jean-Baptiste/camembert-ner | Based on CamemBERT and fine-tuned on the wikiner-fr dataset for NER in French. Classifies O, MISC, PER, LOC, and ORG. |
| dslim/bert-base-NER | Based on BERT and fine-tuned on the CoNLL-2003 Named Entity Recognition dataset. Classifies MISC, PER, LOC, and ORG. |
| Davlan/bert-base-multilingual-cased-ner-hrl | Based on bert-base-multilingual-cased and fine-tuned for NER on 10 languages. Identifies three entity types: LOC, ORG, and PER. |

In the table above, "O" marks tokens outside any named entity, "MISC" is a miscellaneous entity, "PER" is a person's name, "ORG" is an organization, and "LOC" is a location.

We can call the following endpoint via the POST request method for NER tasks by replacing the path parameter {model} with any model mentioned above:

https://api-inference.huggingface.co/models/{model}

Request parameters

The request parameters for this API call are as follows:

| Parameter | Type | Category | Description |
|---|---|---|---|
| inputs | String or string[] | Required | The string, or array of strings, to be classified. |
| options.use_cache | Boolean | Optional | The Inference API caches results to speed up repeated requests. Keep caching enabled for deterministic models, where the same input always produces the same output. The default value is true. |
| options.wait_for_model | Boolean | Optional | Models take time to load before they can process requests. If true, the API waits for the model to be ready instead of returning an error. The default value is false. |
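To make the parameters above concrete, here is a minimal sketch of a helper that assembles the URL and the fetch options from a model ID, an access token, and the input text. The helper name buildRequest and its option names useCache and waitForModel are assumptions made for illustration; only the endpoint and the JSON field names come from this lesson. No network call is made.

```javascript
// Base URL of the Inference API, as given in this lesson.
const API_BASE = "https://api-inference.huggingface.co/models";

// Hypothetical helper: builds the URL and fetch options for a request.
// The defaults mirror the documented defaults of the two options.
function buildRequest(model, accessToken, inputs, { useCache = true, waitForModel = false } = {}) {
  if (typeof inputs !== "string" && !Array.isArray(inputs)) {
    throw new TypeError("inputs must be a string or an array of strings");
  }
  return {
    url: `${API_BASE}/${model}`,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${accessToken}` },
      body: JSON.stringify({
        inputs,
        options: { use_cache: useCache, wait_for_model: waitForModel },
      }),
    },
  };
}

// Usage: inspect the request without sending it.
const exampleRequest = buildRequest(
  "dslim/bert-base-NER",
  "YOUR_ACCESS_TOKEN", // placeholder, not a real token
  "Ptolemy mentions in his Geographia a city called Labokla.",
  { waitForModel: true }
);
console.log(exampleRequest.url);
console.log(exampleRequest.options.body);
```

Keeping request construction separate from the network call makes it easy to switch models by changing only the first argument.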

The code below is an example of NER.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/dbmdz/bert-large-cased-finetuned-conll03-english";
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

// Input text to classify
const data = JSON.stringify({
  inputs: "Ptolemy mentions in his Geographia a city called Labokla which \
may have been in reference to ancient Lahore.",
  options: {
    wait_for_model: true
  }
});

const options = {
  method: "POST",
  headers: headerParameters,
  body: data
};

async function tokenClassification() {
  try {
    const response = await fetch(endpointUrl, options);
    printResponse(response); // helper assumed to be provided by the lesson's code environment
  } catch (error) {
    printError(error); // helper assumed to be provided by the lesson's code environment
  }
}

tokenClassification();

Let's look at the key lines of the code above:

  • Line 2: We specify the endpoint URL with the dbmdz/bert-large-cased-finetuned-conll03-english model for the NER task.

  • Lines 9–10: We provide the input text for the NER task.

  • Lines 22–29: We create a function, tokenClassification, to call the API and handle exceptions.

  • Line 31: We call the tokenClassification function to invoke the endpoint.
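The example above calls printResponse and printError, which are not defined in the snippet itself. Outside the lesson environment, a self-contained variant might look like the sketch below. The function name queryModel and its fetchImpl parameter are assumptions introduced here; fetchImpl defaults to the global fetch and exists only so that the function can be exercised without a network connection.

```javascript
// Hypothetical helper: sends the request and returns the parsed JSON body.
// fetchImpl defaults to the global fetch (browsers and Node.js 18+).
async function queryModel(model, accessToken, inputs, fetchImpl = fetch) {
  const response = await fetchImpl(
    `https://api-inference.huggingface.co/models/${model}`,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${accessToken}` },
      body: JSON.stringify({ inputs, options: { wait_for_model: true } }),
    }
  );
  if (!response.ok) {
    // Surface the HTTP status and the raw error body for debugging.
    const details = await response.text();
    throw new Error(`Request failed with status ${response.status}: ${details}`);
  }
  return response.json();
}

// Usage sketch (performs a real network call, so it is left commented out):
// queryModel(
//   "dbmdz/bert-large-cased-finetuned-conll03-english",
//   "YOUR_ACCESS_TOKEN", // placeholder, not a real token
//   "Ptolemy mentions in his Geographia a city called Labokla."
// )
//   .then((entities) => console.log(entities))
//   .catch((error) => console.error(error.message));
```

Checking response.ok matters because fetch rejects only on network failures; an HTTP error status still resolves normally and would otherwise be parsed as if it were a result.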

Response fields

The API call above returns a list of objects, one per recognized entity; the exact nesting depends on whether inputs is a single string or an array of strings. Each object contains the following fields.

| Field | Type | Description |
|---|---|---|
| entity_group | String | The type of the recognized entity. |
| score | Float | The model's confidence in the predicted entity type. |
| word | String | The text span that was recognized as an entity. |
| start | Integer | The index in the input where the span starts. Helpful when the same string occurs more than once. |
| end | Integer | The index in the input where the span ends. Helpful when the same string occurs more than once. |
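As an illustration of how these fields can be used together, the sketch below filters entities by score, groups words by entity type, and uses start and end to recover a span from the input. The response values are made-up samples in the documented shape, not real API output, and the helper names are hypothetical. The sketch also assumes that end is exclusive, as in String.prototype.slice.

```javascript
// Illustrative input and response. Scores and offsets are made-up sample
// values in the documented response shape, not real API output.
const inputText = "Ptolemy mentions Lahore.";
const sampleResponse = [
  { entity_group: "PER", score: 0.99, word: "Ptolemy", start: 0, end: 7 },
  { entity_group: "LOC", score: 0.45, word: "Lahore", start: 17, end: 23 },
];

// Keep only entities whose score meets the threshold.
function filterByScore(entities, minScore) {
  return entities.filter((entity) => entity.score >= minScore);
}

// Collect recognized words under their entity type.
function groupByEntityType(entities) {
  const groups = {};
  for (const entity of entities) {
    (groups[entity.entity_group] = groups[entity.entity_group] || []).push(entity.word);
  }
  return groups;
}

// Recover the exact span from the input (assumes `end` is exclusive).
function spanText(input, entity) {
  return input.slice(entity.start, entity.end);
}

console.log(groupByEntityType(sampleResponse));
console.log(filterByScore(sampleResponse, 0.5).map((entity) => entity.word));
console.log(spanText(inputText, sampleResponse[1])); // prints: Lahore
```

Slicing by offsets rather than searching for word is what makes repeated occurrences distinguishable: two entities with the same word still have different start and end values.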

Part-of-speech (POS) tagging

Part-of-speech (POS) tagging classifies each token by its grammatical role, such as noun, verb, or adjective, which can help downstream NLP models. The vblagoje/bert-english-uncased-finetuned-pos model is recommended for POS tagging tasks. Many other models are available for this task; some common ones are listed below:

Models for POS Tagging

| Model | Description |
|---|---|
| vblagoje/bert-english-uncased-finetuned-pos | Based on BERT and fine-tuned on a POS dataset. Used for POS tagging tasks. Not case sensitive. |
| batterydata/bde-pos-bert-cased-base | Based on BERT and fine-tuned on a POS dataset. A case-sensitive model. |

We can call the following endpoint via the POST request method for POS tagging by replacing the path parameter {model} with any model mentioned above:

https://api-inference.huggingface.co/models/{model}

Request parameters

The request parameters of this API call are the same as for the previous API call. The code below is an example of POS tagging:

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/vblagoje/bert-english-uncased-finetuned-pos";
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

// Input text to classify
const data = JSON.stringify({
  inputs: "This chapter will provide an overview of performing common NLP tasks.",
  options: {
    wait_for_model: true
  }
});

const options = {
  method: "POST",
  headers: headerParameters,
  body: data
};

async function posTagging() {
  try {
    const response = await fetch(endpointUrl, options);
    printResponse(response); // helper assumed to be provided by the lesson's code environment
  } catch (error) {
    printError(error); // helper assumed to be provided by the lesson's code environment
  }
}

posTagging();

Let's look at the key lines of the code above:

  • Line 2: We specify the endpoint URL with the vblagoje/bert-english-uncased-finetuned-pos model for the POS tagging task.

  • Line 9: We provide input text for the POS tagging task.

  • Lines 21–28: We create a function, posTagging, to call the API and handle exceptions.

  • Line 30: We call the posTagging function to invoke the endpoint.

Response fields

The response fields of this API call are the same as for the previous API call.
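Since the response has the same shape as for NER, a POS result can be post-processed in the same way. The sketch below renders tagged tokens in a compact word/TAG form and looks up words by tag. The sample objects are made up for illustration, and the tag names (DET, NOUN, AUX, VERB) are an assumption; the actual tag set depends on the model. The helper names are hypothetical.

```javascript
// Made-up sample in the documented response shape, covering the first four
// words of the lesson's input. Tag names are an assumption, not real output.
const samplePosResponse = [
  { entity_group: "DET", word: "this", score: 0.99, start: 0, end: 4 },
  { entity_group: "NOUN", word: "chapter", score: 0.99, start: 5, end: 12 },
  { entity_group: "AUX", word: "will", score: 0.99, start: 13, end: 17 },
  { entity_group: "VERB", word: "provide", score: 0.99, start: 18, end: 25 },
];

// Render tokens as "word/TAG" pairs separated by spaces.
function formatTagged(tokens) {
  return tokens.map((token) => `${token.word}/${token.entity_group}`).join(" ");
}

// Return the words that carry a given tag.
function wordsWithTag(tokens, tag) {
  return tokens.filter((token) => token.entity_group === tag).map((token) => token.word);
}

console.log(formatTagged(samplePosResponse));
console.log(wordsWithTag(samplePosResponse, "NOUN"));
```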