Understanding Anchor Boxes: Part II
Learn how anchor boxes are calculated.
How do anchor boxes work?
Anchor boxes are predefined bounding boxes of various shapes and sizes that help detect objects with different aspect ratios. During training, the model learns to adjust and refine their dimensions so that the predicted boxes closely match the ground truth boxes. Let’s learn how they work in a pipeline.
Calculating the size of an anchor box
Picking anchors that represent our data well is extremely important because YOLO predicts an object’s bounding box by learning adjustments to these anchor boxes. Here are the steps we need to follow to calculate the anchor box sizes:
Get bounding boxes’ dimensions from the training data: Since we need to find out the height and width of the anchors, we first determine the height and width of all the bounding boxes in the training data.
Cluster the bounding boxes: YOLO employs a grid-based approach for object detection. To illustrate, in YOLOv3, an image of 416 × 416 dimensions is partitioned into three grids of sizes 13 × 13, 26 × 26, and 52 × 52.
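The grid sizes quoted above can be reproduced with a little arithmetic. The sketch below assumes the downsampling factors (strides) of 32, 16, and 8 that are commonly used for YOLOv3; the stride values and variable names are illustrative assumptions, not something this lesson specifies.

```python
# Assumed setup: a square 416 x 416 input and the commonly used YOLOv3 strides.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)  # assumption: one downsampling factor per detection scale

# Each grid has INPUT_SIZE / stride cells along each side.
grid_sizes = [INPUT_SIZE // stride for stride in STRIDES]

for grid, stride in zip(grid_sizes, STRIDES):
    # Each cell covers stride x stride pixels of the input image.
    print(f"{grid} x {grid} grid, each cell covers {stride} x {stride} pixels")
```

Note that the coarsest grid (13 × 13) has the largest cells, and the finest grid (52 × 52) has the smallest ones.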
Let’s assume that we have three anchor boxes for each grid cell. Since YOLO makes predictions at three scales (small, medium, and large), we have a total of nine anchor boxes (three boxes per scale).
Now, the question is: how are these nine anchors assigned to the three grids? The assignment depends on the size of the anchor boxes, as follows:
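To make the two steps above concrete, here is a minimal sketch of how nine anchor sizes might be obtained. It assumes k-means clustering with IoU as the similarity measure, which is a common choice for YOLO-style anchors, although the lesson does not name a specific algorithm. The function names (iou_wh, kmeans_anchors), the corner box format, and the randomly generated boxes are all hypothetical stand-ins for real training annotations.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) boxes and (K, 2) anchors given as (width, height).

    Boxes are compared as if they shared the same top-left corner,
    so only their shapes matter, not their positions.
    """
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    box_area = boxes[:, 0] * boxes[:, 1]
    anchor_area = anchors[:, 0] * anchors[:, 1]
    return inter / (box_area[:, None] + anchor_area[None, :] - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors, sorted by area."""
    rng = np.random.default_rng(seed)
    # Start from k distinct boxes picked at random.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most.
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new_anchors = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old anchor if its cluster is empty
                new_anchors[j] = members.mean(axis=0)
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    # Sort from smallest to largest area.
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

# Hypothetical annotations as (x_min, y_min, x_max, y_max) in pixels.
rng = np.random.default_rng(42)
xy_min = rng.uniform(0, 200, size=(500, 2))
sizes = rng.uniform(10, 200, size=(500, 2))
corners = np.hstack([xy_min, xy_min + sizes])

# Step 1: get the width and height of every bounding box.
box_wh = np.stack([corners[:, 2] - corners[:, 0],
                   corners[:, 3] - corners[:, 1]], axis=1)

# Step 2: cluster the dimensions into nine anchors.
anchors = kmeans_anchors(box_wh, k=9)
print(np.round(anchors, 1))
```

Sorting the result by area is a convenience for this sketch: it makes it straightforward to split the nine anchors into three groups of three by size.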
The three largest anchor boxes are assigned to the grid with the largest cells. ...