Image Classification

Learn to perform image classification using the Hugging Face Inference API.

Hugging Face is not restricted to NLP tasks. It also delivers impressive results on computer vision (CV) tasks such as image classification, object detection, and image segmentation. Image classification, the process of categorizing or labeling an image based on a set of rules, has many applications. For example, we can use it to find all the images in a photo album that contain cats, and many photo album services use it for automatic tagging. Hugging Face provides an API that applies its models to CV tasks.

Classify images using the API

The google/vit-base-patch16-224 model is recommended for the image classification task. However, many models are available for this task; some common ones are listed below:

Models for Image Classification

  • google/vit-base-patch16-224: A BERT-like model that works on images (the Vision Transformer). Initially trained on the ImageNet-21k dataset with around 14 million images and 21,843 classes, then fine-tuned on the ImageNet 2012 dataset with around one million images and 1,000 classes.

  • microsoft/beit-base-patch16-224-pt22k-ft22k: Built on the same concept as google/vit, but its pretraining objective was to predict the visual tokens of the masked image patches.

  • microsoft/resnet-50: Based on a convolutional neural network, which enables the model to learn features in a deeper way. Trained on the ImageNet-1k dataset with an image size of 224x224.

  • facebook/convnext-tiny-224: A convolutional neural network whose design draws inspiration from Transformers. Trained on the ImageNet-1k dataset with an image size of 224x224.

We can call the following endpoint via the POST request method for image classification, replacing the path parameter {model} with any of the models mentioned above:

https://api-inference.huggingface.co/models/{model}
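Since the model name is just a path parameter, a small helper (illustrative, not part of any official client) can build the endpoint URL for any model in the table above:

```javascript
// Base URL of the Hugging Face Inference API model endpoints
const BASE_URL = "https://api-inference.huggingface.co/models";

// Build the full endpoint URL for a given model ID
function endpointFor(model) {
  return `${BASE_URL}/${model}`;
}

console.log(endpointFor("microsoft/resnet-50"));
// → https://api-inference.huggingface.co/models/microsoft/resnet-50
```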

Request parameters

This endpoint only takes the binary representation of an image file. Image classification usually assumes that the image contains a single object, so the model might return unexpected results for images with multiple objects.

The code below classifies a cat image. We can take our input directly from the URL.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/google/vit-base-patch16-224";
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

async function classifyImage(imgUrl) {
  try {
    // Reading the image from the URL
    const imgResponse = await fetch(imgUrl);
    const buffer = await imgResponse.buffer();
    const options = {
      method: "POST",
      headers: headerParameters,
      body: buffer
    };
    // Calling the endpoint URL to classify the image
    const response = await fetch(endpointUrl, options);
    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

// URL of the image to be classified
let imgUrl = "https://images.unsplash.com/photo-1604675223954-b1aabd668078";
classifyImage(imgUrl);

Let’s briefly discuss the code shown in the widget above:

  • Line 2: We specify the google/vit-base-patch16-224 model for the image classification.

  • Lines 8–26: We create a function that receives the image URL as the input parameter.

    • Lines 11–12: We fetch the image from imgUrl and store it in the buffer.

    • Lines 14–18: We set the request method as POST, set headerParameters, and set body as the binary image we stored in the buffer.

    • Line 21: We call the fetch function to make an API call by passing the endpointUrl and options.

  • Line 29: We set imgUrl with a valid image URL.

  • Line 31: We call the classifyImage function to make an API call.

Response fields

This API call returns a list of objects, each containing a possible label and its likelihood score, ordered from most to least likely.

  • score (Float): Specifies the likelihood score of the label.

  • label (String): Specifies the predicted label of the image.
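With those two fields, the parsed response is easy to work with. The sketch below picks the most likely label; the sample labels and scores are illustrative, not actual model output:

```javascript
// Pick the most likely label from the parsed JSON response.
// The API already sorts by score, but sorting defensively costs little.
function topPrediction(predictions) {
  return [...predictions].sort((a, b) => b.score - a.score)[0];
}

// Illustrative response shape (not real model output)
const sample = [
  { label: "tabby, tabby cat", score: 0.85 },
  { label: "Egyptian cat", score: 0.10 },
  { label: "tiger cat", score: 0.03 }
];

console.log(topPrediction(sample).label); // → tabby, tabby cat
```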

Examples

In the previous example, the cat was in a portrait. That's no fun. Let's confuse the classifier a bit with these images:

// Example 1
let imgUrl = "https://images.unsplash.com/photo-1599889959407-598566c6e1f1?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=500&q=40";

// Example 2
let imgUrl = "https://images.unsplash.com/photo-1643251935745-4209d215f221?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1200&q=40";

Try out the images above by replacing the imgUrl at line 29 in the widget below, and observe the results.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/google/vit-base-patch16-224";
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

async function classifyImage(imgUrl) {
  try {
    // Reading the image from the URL
    const imgResponse = await fetch(imgUrl);
    const buffer = await imgResponse.buffer();
    const options = {
      method: "POST",
      headers: headerParameters,
      body: buffer
    };
    // Calling the endpoint URL to classify the image
    const response = await fetch(endpointUrl, options);
    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

// URL of the image to be classified
let imgUrl = "https://images.unsplash.com/photo-1599889959407-598566c6e1f1?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=500&q=40";
// Rendering HTML to show the image
console.log(`<img src=${imgUrl} width="400px" height="500px">`);
classifyImage(imgUrl);