Fast R-CNN (2015)

Explore the concept of Fast R-CNN, its innovations enhancing the R-CNN architecture, and the use of accuracy metrics for evaluating object detection models.

Fast R-CNN is an improved version of the R-CNN architecture. It addresses R-CNN's main weaknesses: it is comparatively slow, and the architecture as a whole is not end-to-end trainable. Let's see how Fast R-CNN handles these shortcomings.

Improvements

R-CNN doesn't apply the backbone directly to the image; instead, it applies the backbone (e.g., AlexNet, used to extract features) to each of the roughly 2,000 regions produced by selective search. Running the backbone once per region makes the model slow.
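
To make the cost concrete, here is a minimal sketch of the R-CNN ordering in PyTorch. The `backbone` and `selective_search` helpers are assumed placeholders, not a real API; the point is that the backbone runs once per proposal.

```python
import torch
import torchvision.transforms.functional as F

def rcnn_features(image, backbone, selective_search, crop_size=224):
    """R-CNN style: run the backbone once per region proposal (slow).

    `selective_search` is an assumed helper returning ~2,000 (x1, y1, x2, y2) boxes.
    """
    proposals = selective_search(image)                    # ~2,000 region proposals
    features = []
    for (x1, y1, x2, y2) in proposals:
        crop = image[:, y1:y2, x1:x2]                      # crop the region from the image
        crop = F.resize(crop, [crop_size, crop_size])      # warp it to a fixed size
        with torch.no_grad():
            features.append(backbone(crop.unsqueeze(0)))   # one forward pass per region
    return torch.cat(features)                             # ~2,000 backbone passes per image
```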

To resolve this problem, Fast R-CNN swaps the order of feature extraction and region processing. The backbone is applied once to the entire input image, and the region proposals from selective search are then projected onto the resulting feature map, where each region is pooled into a fixed-size feature (RoI pooling).
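
A minimal sketch of the Fast R-CNN ordering, using torchvision's `roi_pool`. The `backbone` and `selective_search` helpers are again assumed, and `spatial_scale` is an assumed downsampling factor between the image and the feature map.

```python
import torch
from torchvision.ops import roi_pool

def fast_rcnn_features(image, backbone, selective_search,
                       output_size=(7, 7), spatial_scale=1.0 / 16):
    """Fast R-CNN style: one backbone pass, then pool features per region."""
    feature_map = backbone(image.unsqueeze(0))     # single forward pass over the whole image
    proposals = selective_search(image)            # (N, 4) boxes in image coordinates
    boxes = [proposals.float()]                    # roi_pool expects one (N, 4) tensor per image
    # Project each proposal onto the feature map and pool it to a fixed size.
    pooled = roi_pool(feature_map, boxes, output_size, spatial_scale)
    return pooled                                  # shape (N, C, 7, 7), fed to the head
```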

Second, the SVM classifier in the head is replaced with fully connected layers followed by a softmax, which makes it possible to train the backbone and the head together. Even though selective search is still used and the model is therefore not fully end-to-end trainable, the architecture is more unified than R-CNN.
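
A minimal sketch of such a head, assuming 512-channel, 7×7 pooled regions and 21 classes (20 object classes plus background); the layer sizes here are illustrative rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FastRCNNHead(nn.Module):
    """Classification head: fully connected layers with softmax instead of per-class SVMs."""

    def __init__(self, in_channels=512, pooled_size=7, num_classes=21):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * pooled_size * pooled_size, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
        )
        self.classifier = nn.Linear(4096, num_classes)       # class scores per region

    def forward(self, pooled_rois):                          # (N, C, 7, 7) from RoI pooling
        x = self.fc(pooled_rois)
        return torch.softmax(self.classifier(x), dim=1)      # class probabilities per region
```

Because the softmax classifier is an ordinary differentiable layer, the classification loss backpropagates through the head and into the backbone, which is what allows joint training.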
