YOLOv4 (2020)
Learn about the detailed architecture and novelties of YOLOv4.
YOLOv4 introduces perhaps the largest number of changes to the architecture and training methods of any YOLO version so far. With this version, the performance gap between YOLO and its predecessors, as well as other single- and two-stage object detectors, widened.
The figure below shows the general structure of YOLOv4 in three simple steps:
We have many new structures here! The only part carried over from the previous version is the head of YOLOv3: the convolutional layers that collect the feature maps from the pyramid levels to make the final detections.
Let’s start discovering these new structures and their meaning to understand YOLOv4 architecture.
Backbone
The backbone of YOLOv4 is CSPDarknet53. To examine it, we need to understand the following concepts in order: dense block, DenseNet, CSPDense block, CSPDenseNet, and finally CSPDarknet53 itself.
Dense block
The main idea of a dense block is to connect each layer's feature maps to all the layers that follow it, with every layer applying some convolutional operations along the way. Dense blocks are then connected to one another through transition layers, which consist of a few regular operations (a convolution followed by pooling in the original DenseNet design). The idea resembles residual blocks, with two differences. A residual block adds its input to its output, whereas a dense block concatenates feature maps. And a dense block does this for the intermediate layers as well, not only for the block's input and output.
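To make the difference between concatenation and addition concrete, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not a real implementation: each "channel" is a single number, and `toy_layer` is a hypothetical stand-in for the convolutional operations inside a layer. Only the wiring between layers is meant to be faithful.

```python
def toy_layer(features, out_channels):
    """Hypothetical stand-in for a conv layer: any number of input
    channels in, `out_channels` new channels out."""
    summary = sum(features) / len(features)
    return [summary + i for i in range(out_channels)]


def dense_block(x, num_layers, growth_rate):
    """Each layer sees ALL earlier feature maps and appends its own."""
    features = list(x)
    for _ in range(num_layers):
        new_maps = toy_layer(features, growth_rate)
        features = features + new_maps  # concatenation along the channel axis
    return features


def residual_block(x, num_layers):
    """Each layer's output is ADDED to its input; channel count is fixed."""
    out = list(x)
    for _ in range(num_layers):
        fx = toy_layer(out, len(out))
        out = [a + b for a, b in zip(out, fx)]  # element-wise addition
    return out


x = [1.0, 2.0, 3.0, 4.0]  # 4 input "channels"
dense_out = dense_block(x, num_layers=3, growth_rate=2)
res_out = residual_block(x, num_layers=3)

print(len(dense_out))  # 4 + 3 * 2 = 10 channels
print(len(res_out))    # still 4 channels
```

Notice that the dense block's output still begins with the untouched input channels, because earlier feature maps are carried forward rather than overwritten. The residual block keeps the channel count fixed but mixes the input into the output.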
DenseNet
DenseNet is an architecture introduced in 2016. The name refers to densely connected convolutional networks. Its ImageNet variants consist of four dense blocks joined by transition layers, with a fully connected classification layer at the end.
We see an architecture with shortcut connections between almost all of its components. Within each dense block, every layer has shortcut connections to the layers before it, and the blocks themselves are linked in sequence through transition layers. Although it's a strong architecture, designed originally for image classification, the dense concatenation comes at a price: the number of channels each layer must process keeps growing, so the computational and memory cost increases considerably.
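The growth in cost can be seen by simply counting channels. The sketch below traces the channel count through a four-block DenseNet. The default values (block sizes 6, 12, 24, 16, a growth rate of 32, 64 initial channels, and transitions that halve the channels) are the ones commonly used for the DenseNet-121 variant and are assumptions of this example. The function names are hypothetical.

```python
def dense_block_input_channels(in_channels, num_layers, growth_rate):
    """Channels that each layer inside one dense block receives as input."""
    return [in_channels + i * growth_rate for i in range(num_layers)]


def trace_densenet(block_config=(6, 12, 24, 16), growth_rate=32,
                   init_channels=64, compression=0.5):
    """Return (stage, index, channels_out) tuples for every block and transition."""
    channels = init_channels
    trace = []
    for idx, num_layers in enumerate(block_config, start=1):
        # Every layer appends `growth_rate` channels to what it received.
        channels += num_layers * growth_rate
        trace.append(("dense_block", idx, channels))
        # A transition layer follows every block except the last one.
        if idx != len(block_config):
            channels = int(channels * compression)
            trace.append(("transition", idx, channels))
    return trace


for stage, idx, channels in trace_densenet():
    print(f"{stage} {idx}: {channels} channels")

# Inputs seen by the six layers of the first dense block.
print(dense_block_input_channels(64, 6, 32))
```

Under these assumptions, the last layer of the third block alone receives 992 input channels. Every layer must process everything produced before it, which is the overhead the CSP variants discussed next try to reduce.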
CSPDense block
A CSPDense block addresses the ...