Bounding Box Predictions

Get a high-level overview of how YOLO makes predictions.

What is a bounding box?

A bounding box is a rectangle drawn around an object to mark its location in an image. In object detection (OD) tasks, each box is usually paired with a class label, so it also tells us what kind of object is present in the image.

How are coordinates represented?

Mathematically, a bounding box is represented as a tensor that holds the object's location and its confidence scores. In OD tasks, two formats are widely used to represent the location:

  • ($x_{min}$, $y_{min}$, $x_{max}$, $y_{max}$): They are also known as the top-left and bottom-right coordinates.

  • ($x_{center}$, $y_{center}$, $w$, $h$): They are the center coordinates of the bounding box, along with the width and height of the box.

Figure: x_min, y_min, x_max, y_max format representation

Figure: x_center, y_center, w, h format representation
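
To make the relationship between the two formats concrete, here is a minimal sketch that converts a box from one to the other and back, in pixel units. The function names `corners_to_center` and `center_to_corners` and the sample box are hypothetical and chosen only for illustration. The sketch assumes the usual image convention in which the origin is the top-left corner and y grows downward.

```python
def corners_to_center(x_min, y_min, x_max, y_max):
    """Convert (x_min, y_min, x_max, y_max) to (x_center, y_center, w, h)."""
    w = x_max - x_min            # box width in pixels
    h = y_max - y_min            # box height in pixels
    x_center = x_min + w / 2     # midpoint along the x-axis
    y_center = y_min + h / 2     # midpoint along the y-axis
    return x_center, y_center, w, h


def center_to_corners(x_center, y_center, w, h):
    """Convert (x_center, y_center, w, h) back to (x_min, y_min, x_max, y_max)."""
    x_min = x_center - w / 2
    y_min = y_center - h / 2
    x_max = x_center + w / 2
    y_max = y_center + h / 2
    return x_min, y_min, x_max, y_max


# Hypothetical box: top-left corner at (50, 100), bottom-right at (250, 300).
box = (50, 100, 250, 300)
center_box = corners_to_center(*box)
round_trip = center_to_corners(*center_box)
print(center_box)
print(round_trip)
```

Both formats carry the same information, so converting in one direction and then the other should return the original corner coordinates.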

Time to code!

In this example, we’re given an object’s bounding box annotation in the PASCAL VOC format, and our task is to convert it into the YOLO format. We will create a Python function that takes the image dimensions and the bounding box coordinates in the PASCAL VOC format as input and returns the bounding box coordinates in the YOLO format.

Input

  • Image dimensions (width and height)

  • Bounding box coordinates in the PASCAL VOC format ($x_{min}$, $y_{min}$ ...