YOLOX (2021) and YOLOv6 (2022)
Learn about the new architectures in the YOLO family: YOLOX and YOLOv6.
In this chapter, we’ll discuss the YOLOX and YOLOv6 architectures.
YOLOX architecture
The YOLOX architecture takes a step back. Instead of building on YOLOv4, it starts from YOLOv3 as its base structure and improves it in a different way than YOLOv4 does.
YOLOX introduces three major novelties: a decoupled head, an anchor-free design, and an advanced label-assignment strategy.
Decoupled head
We know that an object detection problem consists of two tasks: classification and localization, in other words, a classification problem and a regression problem. So far, we have seen both tasks solved in a single head, whether the model is two-stage or one-stage. YOLOX brings a new approach to the YOLO family: it decouples the head and solves each task separately.
The above figure shows the difference between coupled and decoupled heads. The coupled head takes the last feature map coming from the backbone and applies a convolution that produces class scores, objectness scores, and localization results in a single output, whose channel count is proportional to the number of anchor boxes used by the model. Each output channel of this convolutional layer serves exactly one of these tasks. In the decoupled head, the last feature map follows two parallel paths: one for the classification task, and one that is split again to predict the objectness score and the box location separately.
<b>Note</b>: YOLOX is an anchor-free model, so we will see how it works without anchor boxes. As a result, the output channels of YOLOX's decoupled head do not scale with the number of anchor boxes; each task gets a single set of predictions per location.
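To make the channel bookkeeping concrete, here is a small sketch that counts output channels per feature-map location for both head styles, and slices a coupled output vector back into its tasks. The helper names are hypothetical, and the per-anchor layout (4 box values, then 1 objectness score, then the class scores) is an assumption based on the common YOLOv3 convention, not something stated in this chapter.

```python
# Hypothetical helpers: count output channels per feature-map location.

def coupled_head_channels(num_classes: int, num_anchors: int) -> int:
    # One conv predicts, for every anchor: 4 box values + 1 objectness + C class scores.
    return num_anchors * (4 + 1 + num_classes)


def decoupled_head_channels(num_classes: int, num_anchors: int = 1) -> dict:
    # Separate outputs per task; an anchor-free head uses num_anchors = 1.
    return {
        "cls": num_anchors * num_classes,  # classification branch
        "reg": num_anchors * 4,            # box regression output
        "obj": num_anchors * 1,            # objectness output
    }


def split_coupled_output(vector, num_classes: int, num_anchors: int):
    """Slice one location's coupled prediction vector into per-anchor task outputs.

    Assumes the layout [box(4), objectness(1), classes(C)] repeated per anchor.
    """
    per_anchor = 4 + 1 + num_classes
    assert len(vector) == num_anchors * per_anchor, "unexpected channel count"
    parts = []
    for a in range(num_anchors):
        chunk = vector[a * per_anchor:(a + 1) * per_anchor]
        parts.append({"reg": chunk[:4], "obj": chunk[4], "cls": chunk[5:]})
    return parts


if __name__ == "__main__":
    # COCO-style setting: 80 classes, 3 anchors per location in a YOLOv3-like head.
    print(coupled_head_channels(80, 3))   # 255 channels mixed in one tensor
    print(decoupled_head_channels(80))    # one small output per task
```

In the coupled case, every task shares one convolution and is separated only by channel index; in the decoupled case, each task has its own branch and output, which is the structural difference the figure illustrates.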
Anchor-free
YOLOX doesn't use anchor boxes, breaking with the anchor-based design of YOLOv2, v3, and v4. Recall that YOLO, the very first member of the family, didn't use anchor boxes either; however, YOLOX still follows the architecture of YOLOv3 rather than YOLO, i.e., it has no fully connected layers at the end. To avoid departing too far from the YOLOv3 structure, YOLOX adopts the anchor-free mechanism in a very simple way: it acts as if there were one anchor box instead of three at each location of each feature-map level since, in the end, it's the same as writing ...
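A minimal sketch of how a single anchor-free prediction per location can be turned into a box. It assumes the decoding used in the public YOLOX code as we understand it: the center is the predicted offset plus the grid-cell index, scaled by the stride, and the width and height are predicted in log space and scaled by the stride. The function names and the `(tx, ty, tw, th)` layout are illustrative assumptions, and any post-processing (score thresholds, NMS) is omitted.

```python
import math


def decode_anchor_free(raw, grid_x: int, grid_y: int, stride: int):
    """Decode one location's raw prediction (tx, ty, tw, th) into (cx, cy, w, h) in pixels.

    Assumed YOLOX-style decoding: no anchor prior, one box per location.
    """
    tx, ty, tw, th = raw
    cx = (tx + grid_x) * stride  # center offset from the cell's corner, scaled to pixels
    cy = (ty + grid_y) * stride
    w = math.exp(tw) * stride    # log-space size, scaled by the feature-map stride
    h = math.exp(th) * stride
    return cx, cy, w, h


def anchor_based_size(t: float, anchor_size: float) -> float:
    # Anchor-based (YOLOv2/v3-style) size decoding: anchor prior times exp(t).
    return anchor_size * math.exp(t)


def decode_feature_map(raw_map, stride: int):
    """Decode a whole H x W map of raw predictions (nested lists, row-major).

    Returns one (cx, cy, w, h) box per location, i.e. H * W boxes in total.
    """
    boxes = []
    for y, row in enumerate(raw_map):
        for x, raw in enumerate(row):
            boxes.append(decode_anchor_free(raw, x, y, stride))
    return boxes


if __name__ == "__main__":
    # One location at grid cell (2, 3) on a stride-8 feature map.
    print(decode_anchor_free((0.5, 0.5, 0.0, 0.0), 2, 3, 8))
    # The anchor-free width equals an anchor-based width with a single
    # stride-sized "anchor", matching the "one anchor per location" view.
    print(anchor_based_size(0.0, 8) == decode_anchor_free((0, 0, 0.0, 0.0), 0, 0, 8)[2])
```

Reading `exp(tw) * stride` as an anchor of size stride × stride is one way to see why dropping from three anchors to one per location needs only a small change to a YOLOv3-style head; this interpretation is ours, offered as intuition rather than as the authors' formulation.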