Prediction Using Pre-trained ResNet50
Explore the fundamental concepts of the ResNet50 model architecture.
This lesson will provide a step-by-step guide to building inference scripts using ResNet50. There will be multiple interactive playgrounds for you to practice with.
Overview of ResNet50
Residual Network, also known as ResNet, is one of the most groundbreaking pieces of computer vision research in recent years. In deep convolutional neural networks, layers are stacked and trained for the task at hand.
Generally, the deeper the neural network architecture, the better the performance in terms of accuracy. However, deeper networks are harder to train, and beyond a certain depth the accuracy can degrade instead of improving.
ResNet solves this by utilizing residual learning. Instead of learning the desired output features directly, each block tries to learn the residual, which is the difference between those features and the block's input. The input is then added back to the learned residual through a skip connection. The network is formed by stacking residual blocks on top of each other. In doing so, it's possible to train hundreds of layers with good performance and less complexity than earlier architectures such as VGG.
Note: Residual blocks use skip connections. Rather than learning unreferenced functions, they learn residual functions that reference the layer's inputs.
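To make the idea of a residual block concrete, here is a minimal sketch in plain Python. It is an illustration only: the real ResNet blocks use convolutions and batch normalization, while the hypothetical `linear`, `relu`, and `residual_block` helpers below use tiny fully connected layers so that the skip connection is easy to see.

```python
def linear(x, weights, bias):
    """Tiny fully connected layer: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two small linear layers."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)      # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])   # skip connection adds x back


x = [1.0, 2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# With all-zero weights the residual branch outputs zeros,
# so the block simply passes its input through.
out = residual_block(x, zero_w, zero_b, zero_w, zero_b)
print(out)  # [1.0, 2.0, 3.0]
```

Because the block only has to learn a correction on top of its input, a block that has nothing useful to add can fall back to the identity mapping, which is one intuition for why very deep stacks remain trainable.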
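ResNet50 is the 50-layer variant of the ResNet architecture. The name counts the layers that carry weights: 49 convolution layers and 1 fully connected layer. The two pooling layers have no weights and are not part of the count. It contains the following layers:
- 1 initial 7×7 Convolution layer
- 48 Convolution layers, grouped into 16 residual blocks of three layers each
- 1 MaxPool layer
- 1 Average Pool layer
- 1 Fully Connected layer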
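The "50" in the name can be checked with a little arithmetic. The sketch below assumes the standard ResNet50 configuration from the ResNet paper (four stages with 3, 4, 6, and 3 bottleneck blocks, three convolutions per block), which is not spelled out in this lesson; the variable names are illustrative.

```python
# Assumed standard ResNet50 layout: bottleneck blocks per stage.
blocks_per_stage = [3, 4, 6, 3]
convs_per_block = 3  # 1x1, 3x3, 1x1 convolutions in each bottleneck block

block_convs = sum(blocks_per_stage) * convs_per_block  # convolutions inside the blocks
stem_convs = 1   # the initial 7x7 convolution
fc_layers = 1    # the final fully connected classifier

# Pooling layers carry no weights, so they are not counted here.
total_weighted_layers = stem_convs + block_convs + fc_layers
print(block_convs, total_weighted_layers)  # 48 50
```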
The table below highlights the differences between ResNet50 and other architectures:
Architecture | Parameters | Top-1 Accuracy (%) | Year Published |
---|---|---|---|
AlexNet | 60M | 63.3 | 2012 |
VGG | 144M | 74.5 | 2014 |
Inception-V2 | 11.2M | 74.8 | 2015 |
ResNet50 | 25M | 75.3 | 2015 |
Note: The benchmark is based on the ImageNet dataset, which is the standard used by researchers for image classification. Among the architectures listed, ResNet50 achieves the highest top-1 accuracy while using far fewer parameters than AlexNet or VGG.
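Top-1 accuracy is the percentage of images for which the highest-scoring class matches the true label. A minimal sketch of that calculation follows; the `top1_accuracy` helper and the toy scores are made up for illustration and are not taken from any benchmark.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # index of the top score
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: 4 samples, 3 classes (illustrative numbers only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 2, 1]  # the last sample is misclassified

print(top1_accuracy(scores, labels))  # 75.0
```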
PyTorch Image Models
The PyTorch Image Models (timm) library provides pre-trained weights for ResNet50. The pre-trained model was trained on millions of ImageNet images and can classify 1,000 different object classes. We have fine-tuned the model with open-source datasets to categorize images into the following classes:
- cloudy
- rain
- shine
- sunrise
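At inference time, the fine-tuned model produces one raw score (logit) per class, and the script has to turn those scores into a label. Here is a minimal sketch of that step in plain Python. The index order in `CLASS_NAMES` is an assumption (the order listed above); the actual order depends on how the training dataset was encoded, and the logits are made-up values.

```python
import math

# Assumed index-to-label mapping; verify it against the training setup.
CLASS_NAMES = ["cloudy", "rain", "shine", "sunrise"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Made-up logits standing in for a model output.
label, confidence = predict_label([0.2, 2.5, -1.0, 0.3])
print(label)  # rain
```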
Import
The torchvision.models module comes with the resnet50 function, which lets us build the model without going through the timm.create_model function. As a result, it reduces dependencies for our inference script. The timm.create_model function, on the other hand, provides more flexibility for custom models.
We can easily load the pre-trained model with the following code snippet: