Fast R-CNN (2015)
Explore the concept of Fast R-CNN, its innovations enhancing the R-CNN architecture, and the use of accuracy metrics for evaluating object detection models.
Fast R-CNN is an improved version of the R-CNN architecture. It addresses R-CNN's main weaknesses: it is comparatively slow, and the architecture as a whole is not end-to-end trainable. Let's see how Fast R-CNN handles these shortcomings.
Improvements
R-CNN doesn't apply the backbone directly to the image; instead, it applies the backbone (AlexNet, used to extract features) to each of the roughly 2,000 regions produced by selective search. Running the feature extractor once per region makes R-CNN slow.
To resolve this problem, Fast R-CNN swaps the order of selective search and the backbone. The feature extraction layers are applied once to the entire input image, and the region proposals from selective search are then projected onto the resulting feature map, so each region's features are cropped from a shared feature map rather than recomputed from scratch.
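To make this concrete, here is a minimal sketch in PyTorch that crops fixed-size features for each proposal from a shared feature map via `torchvision.ops.roi_pool`. The backbone choice, image size, and boxes are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torchvision

# Backbone runs once on the whole image (illustrative: a ResNet-18 trunk
# standing in for the paper's AlexNet/VGG-style feature extractor).
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet18(weights=None).children())[:-2]
)

image = torch.randn(1, 3, 512, 512)  # one input image
feature_map = backbone(image)        # shape (1, 512, 16, 16)

# Region proposals come from selective search on the original image;
# here, two hypothetical boxes in (batch_idx, x1, y1, x2, y2) image coords.
proposals = torch.tensor([[0, 30.0, 40.0, 200.0, 220.0],
                          [0, 100.0, 80.0, 400.0, 300.0]])

# Project each proposal onto the shared feature map and pool it to a
# fixed size; spatial_scale maps image coords to feature-map coords.
roi_features = torchvision.ops.roi_pool(
    feature_map, proposals, output_size=(7, 7), spatial_scale=16 / 512
)
print(roi_features.shape)  # torch.Size([2, 512, 7, 7])
```

The key point is that the expensive backbone forward pass happens once per image instead of once per region.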
Secondly, the classifier branch in the head is replaced with fully connected layers followed by a softmax function. The per-class SVMs are gone, which makes it possible to train the backbone together with the head. Even though selective search is still present and the model is therefore not fully end-to-end trainable, the architecture is more uniform than R-CNN's.
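A minimal sketch of such a two-branch head, with layer sizes and class count as illustrative assumptions (21 classes corresponds to PASCAL VOC's 20 classes plus background):

```python
import torch.nn as nn

class FastRCNNHead(nn.Module):
    """Sketch of the Fast R-CNN head: shared fully connected layers
    followed by two sibling branches."""
    def __init__(self, in_features=512 * 7 * 7, num_classes=21):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_features, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
        )
        # Softmax classifier replaces R-CNN's per-class SVMs
        # (num_classes includes the background class).
        self.cls_score = nn.Linear(4096, num_classes)
        # Bounding-box regression branch: 4 offsets per class.
        self.bbox_pred = nn.Linear(4096, num_classes * 4)

    def forward(self, roi_features):
        x = self.fc(roi_features.flatten(start_dim=1))
        return self.cls_score(x), self.bbox_pred(x)
```

Because both branches sit on top of the same shared layers, gradients from both tasks flow back through the head and into the backbone.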
Note: Remember that the backbone refers to the main feature extraction part of a model, placed at the beginning of the network; the head is the final part of the model, handling the prediction tasks.
Training strategies
We use approaches similar to the previous models (see the sketch after this list):

- SGD with momentum = 0.9
- Initialize the weights in the head branches from zero-mean Gaussian distributions (standard deviation = 0.01 for the classification branch and 0.001 for the regression branch)
- Transfer learning in the backbone
- Apply L2 regularization
- Apply NMS after the classifier
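The following sketch wires these pieces together, reusing the hypothetical `FastRCNNHead` from the earlier sketch; the learning rate, weight decay value, and the boxes passed to NMS are illustrative assumptions:

```python
import torch
import torchvision

head = FastRCNNHead()  # from the sketch above

# Initialize the two head branches from zero-mean Gaussians
# (std 0.01 for classification, 0.001 for regression); zero the biases.
torch.nn.init.normal_(head.cls_score.weight, mean=0.0, std=0.01)
torch.nn.init.normal_(head.bbox_pred.weight, mean=0.0, std=0.001)
torch.nn.init.zeros_(head.cls_score.bias)
torch.nn.init.zeros_(head.bbox_pred.bias)

# SGD with momentum; weight_decay applies the L2 penalty.
optimizer = torch.optim.SGD(
    head.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4
)

# At inference time, NMS prunes overlapping detections after the
# classifier; these boxes and scores are hypothetical placeholders.
boxes = torch.tensor([[30.0, 40.0, 200.0, 220.0],
                      [35.0, 45.0, 205.0, 225.0]])
scores = torch.tensor([0.9, 0.8])
keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
```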
Multitask loss
In image classification models, the head has only one branch, so a single loss trains the architecture. The R-CNN head has two branches, but only the regression branch contributes to training the network. Since Fast R-CNN aims to be closer to end-to-end trainable, the goal is to train the whole model together, using both branches of the head. How can we obtain a single loss when the head has two branches, each producing its own loss?
The model combines the two loss functions (regression loss and classification loss) into a single common loss function, an approach called multitask loss.
We have two branches with separate loss functions, each suited to its own task. We send an image through the network to generate results from both branches, then sum the two losses and use the combined loss to update the weights in both the head and the backbone.
$L_{cls}$ is the loss for classification, and $L_{loc}$ is the loss for regression. Now we find the multitask loss for Fast R-CNN by the formula given below:

$$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda \, [u \geq 1] \, L_{loc}(t^u, v)$$

- $p$: Predicted class probability distribution
- $u$: Ground truth class
- $t^u$: Predicted bounding box regression offsets for class $u$
- $v$: Ground truth bounding box regression target

The indicator $[u \geq 1]$ equals 1 for non-background RoIs and 0 otherwise, so background regions contribute no regression loss, and $\lambda$ balances the two terms.
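As a sketch of how this combination might look in code, assuming (as in the paper) cross-entropy for $L_{cls}$ and smooth L1 for $L_{loc}$, and that the regression outputs have already been indexed at the ground-truth class $u$:

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_scores, bbox_preds, labels, bbox_targets, lam=1.0):
    """Sketch of a multitask loss: cross-entropy for the classifier plus
    smooth L1 on the regression targets, with the regression term applied
    only to non-background (u >= 1) RoIs."""
    loss_cls = F.cross_entropy(cls_scores, labels)

    # [u >= 1] indicator: background RoIs contribute no box loss.
    fg = labels >= 1
    if fg.any():
        loss_loc = F.smooth_l1_loss(bbox_preds[fg], bbox_targets[fg])
    else:
        loss_loc = cls_scores.new_zeros(())

    return loss_cls + lam * loss_loc

# Hypothetical batch of 4 RoIs with 21 classes (label 0 = background):
scores = torch.randn(4, 21)
preds = torch.randn(4, 4)  # offsets already gathered for class u
labels = torch.tensor([0, 3, 5, 0])
targets = torch.randn(4, 4)
print(multitask_loss(scores, preds, labels, targets))
```

Summing the two terms into one scalar is what lets a single backward pass update the classifier branch, the regression branch, and the backbone together.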