ResNet (2015)

Learn the fundamentals of the ResNet image classification architecture and the vanishing and exploding gradient problems.

ResNet is the image classification architecture that won the ILSVRC competition in 2015. This model’s novel take on the basic CNN structure is widely used in later architectures and various state-of-the-art models.

General structure

  • ResNet has different versions, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, where the general structure is the same but the network is deeper. The number stands for the layer count, so ResNet-18 has 18 layers, ResNet-34 has 34 layers, and so on. ResNet-34 is one of the winning models of the competition, so we will review the structure based on it.

  • Its training strategies are similar to those of other architectures. The learning rate is initialized to 0.1 and divided by 10 when the error stops improving, the optimizer is SGD with a momentum of 0.9, L2 regularization is applied with a coefficient of 0.0001, and the batch size is 256 (see the sketch after this list).

  • Batch normalization is used; dropout is not used.

  • The novelty of ResNet lies in the building blocks described in the following sections.
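Before moving on to those blocks, here is a minimal sketch of the training setup listed above, assuming PyTorch; the placeholder model, the scheduler patience, and the `val_loss` name are illustrative assumptions, not the original training code:

```python
import torch
from torch import nn, optim

model = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3))  # placeholder model

# SGD with momentum 0.9 and L2 regularization (weight decay) of 0.0001, learning rate 0.1
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# Divide the learning rate by 10 when the validation error stops improving
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

# Inside the training loop (batch size 256), after each validation pass:
# scheduler.step(val_loss)
```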

Residual blocks

A residual block adds the unchanged (identity) input to the output of the convolution and activation functions.

Figure: Residual block, a combination of the identity and main blocks

Instead of moving forward with standard convolutional layers, where the input of one layer is directly the output of the previous one, we keep the original input on one side and add it to the convolved version of the same input. The connection carrying the original input is also called an identity block or a shortcut connection.

Figure: A deeper look at the residual block
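As a rough sketch of this idea, assuming PyTorch and illustrative layer sizes (not the exact configuration from the paper), a basic residual block could look like this:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x), where F is two 3x3 convolutions."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # shortcut branch keeps the original input
        out = self.relu(self.bn1(self.conv1(x)))  # main branch: conv -> BN -> ReLU
        out = self.bn2(self.conv2(out))           # conv -> BN
        out = out + identity                      # add the untouched input back in
        return self.relu(out)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # same shape as the input: torch.Size([1, 64, 56, 56])
```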

Bottleneck block

While residual blocks keep the feature map dimensions unchanged, the architecture also implements another type of residual block.

Bottleneck residual block: First, it squeezes the feature maps by decreasing the channel size with a 1×1 convolution, then applies an n×n convolution, and finally increases the channel size again (expansion) with another 1×1 convolution so that the output dimensions exactly match those of the identity block.

Figure: Bottleneck residual block
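Under the same assumptions (PyTorch, illustrative channel sizes, and a 3×3 convolution standing in for the n×n one), a bottleneck block could be sketched as:

```python
import torch
from torch import nn

class BottleneckBlock(nn.Module):
    """Bottleneck block: a 1x1 conv squeezes the channels, a 3x3 conv processes them,
    and a final 1x1 conv expands them back so the output matches the identity branch."""

    def __init__(self, channels: int, squeezed: int):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, squeezed, kernel_size=1, bias=False)  # reduce channels
        self.bn1 = nn.BatchNorm2d(squeezed)
        self.conv = nn.Conv2d(squeezed, squeezed, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(squeezed)
        self.expand = nn.Conv2d(squeezed, channels, kernel_size=1, bias=False)   # restore channels
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.squeeze(x)))
        out = self.relu(self.bn2(self.conv(out)))
        out = self.bn3(self.expand(out))
        return self.relu(out + x)                 # dimensions match the identity branch again

x = torch.randn(1, 256, 28, 28)
print(BottleneckBlock(256, 64)(x).shape)  # torch.Size([1, 256, 28, 28])
```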

The image below shows an 18-layer ResNet architecture.

Note: The solid shortcuts indicate that the identity and convolved outputs have the same dimensions, so they are added directly (residual blocks). In contrast, the dotted shortcuts indicate that the identity and convolved outputs have different sizes, so either zero padding or a 1×1 convolution is applied to match the dimensions.

Figure: ResNet-18 architecture
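To make the dotted-shortcut case concrete, here is a small sketch (again assuming PyTorch, with illustrative sizes) of a 1×1 projection that resizes the identity branch so it can be added to a convolved output of different dimensions:

```python
import torch
from torch import nn

# Identity branch: 64 channels, 56x56 feature maps
x = torch.randn(1, 64, 56, 56)

# Main branch halves the spatial size and doubles the channels (stride-2 3x3 convolution)
main = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

# Dotted shortcut: a 1x1 convolution with the same stride so that both branches
# end up with the same shape and can be summed
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

out = main(x) + projection(x)
print(out.shape)  # torch.Size([1, 128, 28, 28])
```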

Vanishing gradient

It’s time to take a step further and learn about the problems that can lead to undesirable or unsuccessful results while training our model. Besides knowing how to create the network, it’s also essential to understand how to deal with our training issues.

Depending on our goal, shallow neural networks might not be enough to learn complex tasks, so we need deeper ones. The deeper our neural network is, the harder it is to train! One of the main reasons for this is the vanishing gradient problem.

We already know that we need the gradients to update our weights while moving backward through the network. Starting from the output layer, the further back we propagate, the smaller the gradient we obtain. By the time we arrive at the initial layers, the gradient has almost disappeared, and since it’s the gradient that should update our weights, the weights barely change and the model doesn’t learn. Vanishing gradients are also known as dead gradients.
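As a minimal numeric sketch of this effect (plain Python with made-up weights and activations, purely for illustration), repeatedly multiplying by the sigmoid derivative, which is at most 0.25, shrinks the gradient layer by layer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # maximum value is 0.25, reached at z = 0

pre_activations = [0.5] * 10          # toy pre-activation values for 10 layers (made-up)
weight = 0.8                          # illustrative weight value shared by all layers

gradient = 1.0                        # gradient arriving from the loss at the output
for layer, z in enumerate(reversed(pre_activations), start=1):
    gradient *= sigmoid_derivative(z) * weight
    print(f"{layer} layer(s) back: gradient = {gradient:.2e}")

# The gradient shrinks by roughly a factor of 5 per layer and is tiny after 10 layers.
```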

The following neural networks show what happens when we carry the signal through sigmoid activation functions in the forward pass and calculate the derivative repeatedly in the backward pass.

Note: The cost function was kept simple to make the calculations more understandable. Also, the true answer is considered to be 1. We don’t have to pay too much attention to the calculations; the point is the decreasing gradient. The further we move backward, the smaller our gradient becomes, and this is with only three simple layers; imagine how big the vanishing gradient problem is for deep neural networks.

Figure: Forward pass with the sigmoid function

Here, a is the input signal, w0 is the weight, b0 is the bias from the first to the second neuron, and the activation function ...
