The Structure of YOLO (Backbone, Neck, and Head)
Learn the meaning of the terms commonly used in YOLO: backbone, neck, and head.
The YOLO model consists of the following three main components:
Backbone: It extracts features from the input image.
Neck: It collects the features from the backbone and transforms them further before prediction.
Head: It produces the final predictions, such as bounding boxes and class scores.
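The division of labor among these three components can be sketched in code. The following is a minimal illustration, not the layout of any specific YOLO version: it tracks only tensor shapes, and the strides (8, 16, 32), the channel counts, the anchor count, and the function names backbone, neck, head, and detect_shapes are all assumptions chosen for the example.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def backbone(image_shape: Shape,
             strides: Tuple[int, ...] = (8, 16, 32),
             channels: Tuple[int, ...] = (256, 512, 1024)) -> List[Shape]:
    """Stand-in backbone: returns the shapes of multi-scale feature maps.

    Strides and channel counts are assumed values for illustration.
    """
    _, height, width = image_shape
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]


def neck(features: List[Shape], out_channels: int = 256) -> List[Shape]:
    """Stand-in neck: keeps each spatial size and unifies the channel count."""
    return [(out_channels, h, w) for _, h, w in features]


def head(features: List[Shape],
         num_classes: int = 80,
         num_anchors: int = 3) -> List[Shape]:
    """Stand-in head: per grid cell, each anchor predicts
    4 box coordinates + 1 objectness score + num_classes class scores
    (an assumed anchor-based layout)."""
    per_anchor = 4 + 1 + num_classes
    return [(num_anchors * per_anchor, h, w) for _, h, w in features]


def detect_shapes(image_shape: Shape) -> List[Shape]:
    """Chain the three stages: image -> backbone -> neck -> head."""
    return head(neck(backbone(image_shape)))


if __name__ == "__main__":
    for shape in detect_shapes((3, 640, 640)):
        print(shape)
```

Under these assumptions, a 3 x 640 x 640 input yields prediction grids of 80 x 80, 40 x 40, and 20 x 20 cells, each with 255 channels: 3 anchors times (4 + 1 + 80) values.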
Backbone
The term backbone in YOLO refers to a convolutional neural network (CNN) that extracts features from the input image. Later layers in the network use these features to make predictions. Generally, a pretrained model such as a ResNet is used as the backbone. Here are some key features of the backbone network:
The architecture of the backbone network plays a critical role in an object detection model because it significantly influences the quality of the generated feature maps.
The features extracted by the backbone network are represented as feature maps. These are numerical matrices that encapsulate the patterns found in the image.
Interestingly, the backbone network extracts ...
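To make the idea of a feature map as a numerical matrix concrete, here is a small sketch in plain Python. It applies a single hand-picked 2 x 2 filter to a toy 4 x 4 image. The function name conv2d_valid, the image, and the filter are assumptions for illustration; a real backbone learns many such filters across many layers rather than using one chosen by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1).

    This is the cross-correlation that CNN layers conventionally
    call "convolution". The returned matrix is the feature map.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)

if __name__ == "__main__":
    for row in feature_map:
        print(row)  # each row is [0, 2, 0]
```

The feature map is nonzero only in the middle column, where the filter straddles the boundary between the dark and bright halves. In that sense, the matrix encodes where a particular pattern occurs in the image.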