Model Scaling for Concatenation-Based Models
Learn why compound model scaling was introduced in YOLOv7 and how it improves performance.
What is model scaling?
Model scaling refers to the process of adjusting certain attributes or parameters of the model to create different versions of it that can cater to varying requirements, such as computational resources or inference speeds.
In practical applications, different object detection tasks have diverse requirements that depend heavily on the specific scenario and resources available. Let’s consider a few examples:
Real-time object detection: This involves applications that require instant results, such as autonomous vehicles, surveillance systems, or AR/VR applications. These applications often run on edge devices with limited computational power, making the efficiency of the model a top priority. Therefore, these scenarios call for lighter, highly optimized models that may compromise a bit on accuracy but offer faster inference speeds to provide real-time responses.
Offline object detection: Some applications don’t require instant results and can afford to process data offline. These include tasks like analyzing satellite images or processing large volumes of stored video data. In these cases, systems often have access to abundant computational resources, which allows the use of larger, more complex models. The emphasis here is on achieving high accuracy, even if it takes more time for inference.
Therefore, object detection models are released at different scales based on accuracy and inference-time requirements. The model architecture can be scaled along the following dimensions:
Resolution: The size of the input image
Scale (depth): The number of layers
Width: The number of channels ...
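To make the width and depth dimensions concrete, here is a minimal sketch of how scaled model variants are often derived from a base configuration: a width multiplier rescales channel counts (rounded to a hardware-friendly multiple), and a depth multiplier rescales the number of repeated layers in a block. The function names and multiplier values below are illustrative assumptions, not the exact YOLOv7 implementation.

```python
def scale_channels(base_channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a channel count by a width multiplier, rounding to the
    nearest multiple of `divisor` (channel counts are commonly kept
    divisible by 8 for hardware efficiency)."""
    return max(divisor, int(round(base_channels * width_multiple / divisor)) * divisor)


def scale_depth(base_layers: int, depth_multiple: float) -> int:
    """Scale the number of repeated layers in a block by a depth
    multiplier, keeping at least one layer."""
    return max(1, round(base_layers * depth_multiple))


# Hypothetical "small" variant: half the width, a third of the depth.
print(scale_channels(256, 0.5))   # 128 channels instead of 256
print(scale_depth(9, 0.33))       # 3 repeated layers instead of 9
```

Applying such multipliers uniformly across the network is what produces a family of models (e.g., tiny, small, large) from a single architecture definition.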