Introduction to One-Stage Object Detection Architectures

Learn about the one-stage object detection architecture.

Detecting objects in one step

The two-stage approach extracts regions from an image using the feature maps, predicts the probability that each region contains an object, and finally sends the high-probability regions to the classifier head. This works quite accurately, but it is slow.

Depending on the project, we may have different speed requirements for our model. If we work with live video, we need a model that can process frames at the stream's frame rate, most commonly 30 FPS (though it may be higher or lower), so that each frame is handled before the next one arrives.

We can afford more processing time when the video is offline rather than live. Even so, taking 10 minutes to process a one-minute video would not be a good option.
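To make these speed requirements concrete, here is a minimal sketch of the arithmetic. The function names (`frame_budget_ms`, `effective_fps`, `keeps_up`) and the sample inference times are hypothetical and chosen only for illustration; they do not come from any library or benchmark.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, to keep up with a stream."""
    return 1000.0 / fps


def effective_fps(video_seconds: float, video_fps: float,
                  processing_seconds: float) -> float:
    """Frames processed per second when a whole video takes processing_seconds."""
    total_frames = video_seconds * video_fps
    return total_frames / processing_seconds


def keeps_up(inference_ms: float, fps: float) -> bool:
    """True if one frame is processed before the next one arrives."""
    return inference_ms <= frame_budget_ms(fps)


# A 30 FPS live stream leaves roughly 33.3 ms per frame.
print(f"Budget at 30 FPS: {frame_budget_ms(30):.1f} ms per frame")

# A one-minute, 30 FPS video processed in 10 minutes (600 s) means the
# model handles only 3 frames per second.
print(f"Effective speed: {effective_fps(60, 30, 600):.1f} FPS")

# Hypothetical per-frame inference times, for illustration only.
print(keeps_up(25.0, 30))  # fits within the budget
print(keeps_up(50.0, 30))  # too slow for a live 30 FPS stream
```

In other words, a model that needs 10 minutes for a one-minute video runs about ten times slower than the frame rate it would need to sustain on a live stream.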

In many other cases, whether we work with live or offline videos, single images or batches, we usually prefer the fastest model, ideally without sacrificing accuracy.

This is where one-stage object detectors are lifesavers: they remove the separate region-proposal and classification steps and predict the candidate boxes together with their class scores simultaneously, in a single pass over the image.
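The idea of predicting boxes and class scores at the same time can be sketched with a toy prediction head in NumPy. This is an illustrative sketch, not the head of any specific detector. It assumes a YOLO-like output layout (4 box values, 1 objectness score, then the class scores for each grid cell), random weights, and a hypothetical function name `one_stage_head`.

```python
import numpy as np


def one_stage_head(features, weights, bias, num_classes):
    """Toy one-stage head: every grid cell gets a box and class scores at once.

    features: (H, W, C) feature map from a backbone (random here).
    weights:  (C, 5 + num_classes), equivalent to a 1x1 convolution.
    bias:     (5 + num_classes,)
    """
    assert weights.shape[1] == 5 + num_classes

    # One linear map per cell produces all predictions in a single pass.
    raw = features @ weights + bias          # (H, W, 5 + num_classes)

    boxes = raw[..., :4]                     # assumed layout: 4 box values
    objectness = 1.0 / (1.0 + np.exp(-raw[..., 4]))  # sigmoid, in (0, 1)

    # Numerically stable softmax over the class dimension.
    logits = raw[..., 5:]
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    class_probs = exp / exp.sum(axis=-1, keepdims=True)

    return boxes, objectness, class_probs


# Hypothetical sizes: a 7x7 grid, 16 feature channels, 3 classes.
rng = np.random.default_rng(0)
H, W, C, K = 7, 7, 16, 3
features = rng.normal(size=(H, W, C))
weights = rng.normal(size=(C, 5 + K)) * 0.1
bias = np.zeros(5 + K)

boxes, objectness, class_probs = one_stage_head(features, weights, bias, K)
print(boxes.shape, objectness.shape, class_probs.shape)
```

Note that there is no intermediate step that selects regions and passes them to a second network. The box values and the class scores come out of the same computation, which is what makes this family of detectors fast.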
