Object Detection

Learn how to perform object detection using the Hugging Face Inference API.

Object detection is similar to image classification, but instead of labeling the whole image, it locates individual object instances within it. It can also be used for object tracking and image annotation. This API call takes an image as input and returns, for each detected object, a label, its likelihood score, and a bounding box.

An example of object detection

Finding objects in an image using the API

The facebook/detr-resnet-50 model is recommended for the object detection task. However, many other models are available for this task; some common ones are listed below:

Models for Object Detection

| Model | Description |
| --- | --- |
| facebook/detr-resnet-50 | Combines a convolutional neural network backbone with an encoder-decoder transformer and two prediction heads: one for the class label and one for the bounding box of each object. Trained on 118 thousand labeled images. |
| hustvl/yolos-tiny | Trained on the COCO 2017 dataset, which contains around 118 thousand labeled images. Can be used for object detection. |

We can call the following endpoint with a POST request for object detection, replacing the path parameter {model} with any of the models mentioned above:

https://api-inference.huggingface.co/models/{model}
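As a small illustration of the {model} path parameter, the sketch below builds the endpoint URL from a model ID. The helper name buildEndpointUrl and the BASE_URL constant are hypothetical and not part of the API; only the URL pattern comes from the lesson.

```javascript
// Base URL taken from the endpoint shown above.
const BASE_URL = "https://api-inference.huggingface.co/models";

// buildEndpointUrl is a hypothetical helper name used for illustration.
function buildEndpointUrl(model) {
  if (typeof model !== "string" || model.trim() === "") {
    throw new Error("A model ID such as 'facebook/detr-resnet-50' is required");
  }
  // Model IDs have the form "owner/name", so the slash is kept as-is.
  return `${BASE_URL}/${model.trim()}`;
}

console.log(buildEndpointUrl("facebook/detr-resnet-50"));
// https://api-inference.huggingface.co/models/facebook/detr-resnet-50
console.log(buildEndpointUrl("hustvl/yolos-tiny"));
// https://api-inference.huggingface.co/models/hustvl/yolos-tiny
```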

Request parameters

This endpoint takes only the binary representation of an image file.

The code below detects a cat in an image. It fetches the image directly from a URL and sends its binary content to the endpoint.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50";
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

async function detectObjects(imgUrl) {
  try {
    // Reading image from URL
    const imgResponse = await fetch(imgUrl);
    const buffer = await imgResponse.buffer();

    const options = {
      method: "POST",
      headers: headerParameters,
      body: buffer
    };

    // Calling endpoint URL for object detection
    const response = await fetch(endpointUrl, options);

    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

// URL of the image
let imgUrl = "https://images.unsplash.com/photo-1604675223954-b1aabd668078";
detectObjects(imgUrl);

We specify the endpoint URL with the facebook/detr-resnet-50 model in the endpointUrl constant at the top of the code.
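The code above relies on printResponse and printError, which are not defined in the snippet and are presumably supplied by the course environment. Outside that environment, a minimal sketch like the following could send the image and read the result directly. It assumes Node.js 18 or later (which provides a global fetch) and that the API answers with JSON; the function name queryDetections and the fetchImpl parameter are illustrative, not part of the API.

```javascript
const endpointUrl = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50";

// queryDetections is a hypothetical name. fetchImpl is injectable so the
// function can be exercised without network access.
async function queryDetections(imageBytes, accessToken, fetchImpl = fetch) {
  const response = await fetchImpl(endpointUrl, {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}` },
    body: imageBytes,
  });
  if (!response.ok) {
    // Include the body text in the error, since it may explain the failure.
    const details = await response.text();
    throw new Error(`Request failed with status ${response.status}: ${details}`);
  }
  // Assumption: a successful response carries a JSON body.
  return response.json();
}

// Usage sketch (needs a valid access token and network access):
// const img = await fetch(imgUrl);
// const detections = await queryDetections(await img.arrayBuffer(), "<your token>");
```

Checking response.ok before parsing keeps HTTP failures from being mistaken for detection results.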

Response fields

This API call returns a list of detected objects, ordered by likelihood score. Each entry contains a predicted label, its likelihood score, and the bounding box of the detected object in the image.

| Parameter | Type | Description |
| --- | --- | --- |
| score | Float | Specifies the likelihood score of the label |
| label | String | Specifies the predicted label of the detected object |
| box | Object | Contains the bounding box coordinates of the detected object. Coordinates can be accessed by the names xmin, ymin, xmax, and ymax. |
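To make the response fields concrete, here is a minimal sketch that keeps only confident detections and derives each box's width and height from its corner coordinates. The sample data is made up for illustration, and summarizeDetections and minScore are hypothetical names rather than part of the API.

```javascript
// Made-up detections using the score, label, and box fields described above.
const sampleDetections = [
  { score: 0.98, label: "cat", box: { xmin: 10, ymin: 20, xmax: 110, ymax: 220 } },
  { score: 0.42, label: "couch", box: { xmin: 0, ymin: 150, xmax: 300, ymax: 400 } },
];

// Keep detections at or above minScore and compute each box's size
// in the same units as the returned coordinates.
function summarizeDetections(detections, minScore = 0.9) {
  return detections
    .filter((d) => d.score >= minScore)
    .map(({ label, score, box }) => ({
      label,
      score,
      width: box.xmax - box.xmin,
      height: box.ymax - box.ymin,
    }));
}

// With the default threshold, only the cat remains (width 100, height 200).
console.log(summarizeDetections(sampleDetections));
```

The 0.9 default is an arbitrary choice for this sketch; a suitable threshold depends on the model and the images.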

Examples

In the previous example, the cat was in a portrait. That's no fun. Let's confuse the model a bit with these images:

// Example 1
let imgUrl = "https://images.unsplash.com/photo-1599889959407-598566c6e1f1?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=500&q=40";
// Example 2
let imgUrl = "https://images.unsplash.com/photo-1643251935745-4209d215f221?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=1200&q=40";

Try out the images above by replacing the value of imgUrl in the code below, and observe the results.

// Endpoint URL
const endpointUrl = "https://api-inference.huggingface.co/models/facebook/detr-resnet-50";
const headerParameters = {
  "Authorization": "Bearer {{ACCESS_TOKEN}}"
};

async function detectObjects(imgUrl) {
  try {
    // Reading image from URL
    const imgResponse = await fetch(imgUrl);
    const buffer = await imgResponse.buffer();

    const options = {
      method: "POST",
      headers: headerParameters,
      body: buffer
    };

    // Calling endpoint URL for object detection
    const response = await fetch(endpointUrl, options);

    printResponse(response);
  } catch (error) {
    printError(error);
  }
}

// URL of the image
let imgUrl = "https://images.unsplash.com/photo-1599889959407-598566c6e1f1?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=500&q=40";
// Rendering HTML to show the image (src is quoted because the URL contains "=" and "&")
console.log(`<img src="${imgUrl}" width="400px" height="500px">`);
detectObjects(imgUrl);
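Busier images may produce many entries in the response, so a per-label tally can make the results easier to compare. The following is a minimal sketch on made-up data in the response shape described earlier; countByLabel is a hypothetical helper name, and the labels and numbers are not actual API output.

```javascript
// Made-up detections for illustration only.
const exampleDetections = [
  { score: 0.97, label: "person", box: { xmin: 5, ymin: 10, xmax: 120, ymax: 300 } },
  { score: 0.91, label: "dog", box: { xmin: 130, ymin: 180, xmax: 260, ymax: 310 } },
  { score: 0.88, label: "person", box: { xmin: 270, ymin: 20, xmax: 380, ymax: 305 } },
];

// Count how many detections were returned for each label.
function countByLabel(detections) {
  const counts = {};
  for (const { label } of detections) {
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

console.log(countByLabel(exampleDetections));
// { person: 2, dog: 1 }
```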