Text Similarity

Learn to identify sentence similarity using the Hugging Face Inference API.

Sentence similarity measures how similar (or dissimilar) two sentences or passages are. It has many uses, from plagiarism checking to information retrieval.

Find text similarity using the API

The sentence-transformers/all-MiniLM-L6-v2 model is recommended for text similarity tasks. Many other models are available for this task, and some common ones are listed below:

Models for Text Similarity

| Model | Description |
|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | Based on nreimers/MiniLM-L6-H384-uncased and fine-tuned on multiple datasets. Finds the semantic similarity between the input sentences. |
| sentence-transformers/all-mpnet-base-v2 | Based on microsoft/mpnet-base and trained on multiple datasets, which contain around one billion sentences. Also used for semantic search. |
| sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | Based on nreimers/MiniLM-L6-H384-uncased and fine-tuned on multiple datasets. Finds the semantic similarity between the input sentences. |

We can call the following endpoint with a POST request for text similarity tasks, replacing the path parameter {model} with any of the models mentioned above:

https://api-inference.huggingface.co/models/{model}
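As a minimal sketch of the substitution described above, the helper below fills in the {model} path parameter. The name buildEndpointUrl is hypothetical and is not part of the Inference API; only the base URL comes from this lesson.

```javascript
// Base URL taken from the endpoint shown in this lesson.
const BASE_URL = "https://api-inference.huggingface.co/models";

// Hypothetical helper: substitutes the {model} path parameter.
function buildEndpointUrl(model) {
  if (typeof model !== "string" || model.trim() === "") {
    throw new Error("model must be a non-empty string");
  }
  // The slash inside the model ID is part of the path, so it is not encoded.
  return `${BASE_URL}/${model.trim()}`;
}

console.log(buildEndpointUrl("sentence-transformers/all-mpnet-base-v2"));
```

Swapping models then only requires changing the string passed to the helper.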

Request parameters

The request parameters for this API call are as follows:

| Parameter | Type | Category | Description |
|---|---|---|---|
| inputs.source_sentence | String | Required | Specifies the source sentence or passage to compare with each entry in inputs.sentences. |
| inputs.sentences | Array of strings | Required | Specifies the list of sentences that will be compared with source_sentence. |
| options.use_cache | Boolean | Optional | The Hugging Face Inference API caches results to speed up repeated requests. Keep it enabled for deterministic models. The default value is true. |
| options.wait_for_model | Boolean | Optional | Models on the Hugging Face Inference API take time to load before they can process requests. If the value is true, the request waits for the model to be ready instead of returning an error. The default value is false. |
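The parameters above can be assembled as in the sketch below. The helper name buildSimilarityRequest, its argument order, and the access-token placeholder are assumptions made for illustration. The Content-Type header is also an addition that the lesson's own code does not set.

```javascript
// Hypothetical helper: builds the fetch options for a similarity request.
// Field names mirror the parameter table; the defaults are the documented ones.
function buildSimilarityRequest(accessToken, sourceSentence, sentences, opts = {}) {
  if (typeof sourceSentence !== "string" || sourceSentence.length === 0) {
    throw new TypeError("sourceSentence must be a non-empty string");
  }
  if (!Array.isArray(sentences) || sentences.length === 0 ||
      !sentences.every((s) => typeof s === "string")) {
    throw new TypeError("sentences must be a non-empty array of strings");
  }
  const { useCache = true, waitForModel = false } = opts;
  return {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${accessToken}`,
      "Content-Type": "application/json", // assumption: not set in the lesson's code
    },
    body: JSON.stringify({
      inputs: { source_sentence: sourceSentence, sentences },
      options: { use_cache: useCache, wait_for_model: waitForModel },
    }),
  };
}

const exampleRequest = buildSimilarityRequest(
  "YOUR_ACCESS_TOKEN", // placeholder, replace with a real token
  "What's up with the weather?",
  ["Global warming's effects are getting serious."],
  { waitForModel: true }
);
console.log(exampleRequest.body);
```

Validating that sentences is an array before sending catches the most likely mistake, which is passing a single string.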

The following code computes the similarity of source_sentence to each entry in sentences.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2";

// Replace {{ACCESS_TOKEN}} with your Hugging Face access token
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

// Source sentence and the sentences to compare it with
const data = JSON.stringify({
  inputs: {
    "source_sentence": "What's up with the weather?",
    "sentences": [
      "Global warming's effects are getting serious.",
      "Good morning, be positive and good things will happen."
    ]
  },
  options: {
    // Wait for the model to load instead of returning an error
    wait_for_model: true
  }
});

const options = {
  method: "POST",
  headers: headerParameters,
  body: data
};

// printResponse and printError are assumed to be helpers supplied by the code widget
async function textSimilarity() {
  try {
    const response = await fetch(endpointUrl, options);
    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

// Invoke the endpoint
textSimilarity();

Let’s have a look at the key parts of the code above:

  • The endpointUrl constant: We specify the sentence-transformers/all-MiniLM-L6-v2 model for text similarity.

  • The inputs object: We set the source_sentence to compare it with the sentences.

  • The textSimilarity function: We create a function to make the API call and handle the exceptions.

  • The final line: We call the textSimilarity function to invoke the endpoint.
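Before using the result of the call, it can help to check the parsed response body. The sketch below is one way to do that. The helper name extractScores is hypothetical, and the assumption that failures arrive as a JSON object with an error field should be verified against the API's actual responses.

```javascript
// Hypothetical helper: validates a parsed response body and returns the scores.
function extractScores(payload, expectedCount) {
  // Assumption: failures arrive as an object with an "error" field.
  if (payload !== null && typeof payload === "object" &&
      !Array.isArray(payload) && "error" in payload) {
    throw new Error(`Inference API error: ${payload.error}`);
  }
  // A successful call is expected to return one numeric score per sentence.
  if (!Array.isArray(payload) || payload.length !== expectedCount ||
      !payload.every((x) => typeof x === "number" && Number.isFinite(x))) {
    throw new Error("Unexpected response shape: expected one numeric score per sentence");
  }
  return payload;
}

// Possible use inside an async function, after the fetch call:
//   const payload = await response.json();
//   const scores = extractScores(payload, 2);
```

Keeping the check separate from the network call makes it easy to test with hand-written payloads.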

Response fields

The API call above returns a list of scores, one for each entry in the sentences list, in the same order. Each score is the semantic similarity between the source_sentence and the corresponding sentence. In our example, there are two sentences to compare with the source_sentence, and the first one should score higher because it talks about weather conditions, which relate to the source_sentence.
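Because the scores line up with the sentences by position, they can be paired and sorted to find the closest match. The sketch below assumes that ordering. The helper name rankBySimilarity is hypothetical, and the scores are placeholder values chosen for illustration, not real API output.

```javascript
// Hypothetical helper: pairs each sentence with its score and sorts by score, highest first.
function rankBySimilarity(sentences, scores) {
  if (sentences.length !== scores.length) {
    throw new Error("sentences and scores must have the same length");
  }
  return sentences
    .map((sentence, i) => ({ sentence, score: scores[i] }))
    .sort((a, b) => b.score - a.score);
}

const exampleSentences = [
  "Global warming's effects are getting serious.",
  "Good morning, be positive and good things will happen."
];
const exampleScores = [0.4, 0.1]; // placeholder values, not real API output

const ranked = rankBySimilarity(exampleSentences, exampleScores);
console.log(ranked[0].sentence); // the sentence with the highest score
```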

Example

Try out the following example in the code above, and observe which sentence gets the highest score:

inputs: {
  "source_sentence": "What is the fastest thing in the universe?",
  "sentences": [
    "Cheetah maybe.",
    "I think it's light.",
    "My thoughts?."
  ]
}