Prediction Using Pre-trained ResNet50

Explore the fundamental concepts of the ResNet50 model architecture.

This lesson will provide a step-by-step guide to building inference scripts using ResNet50. There will be multiple interactive playgrounds for you to practice with.

Overview of ResNet50

Residual Network, also known as ResNet, is one of the most groundbreaking pieces of computer vision research in recent years. In deep convolutional neural networks, layers are stacked and trained on the task at hand.

Generally, the deeper the neural network architecture, the better the performance in terms of accuracy. However, deeper networks are harder to train, and accuracy can degrade once the depth passes a certain threshold.

ResNet solves this by utilizing residual learning. Instead of learning the desired features directly, each block learns the residual, which is the difference between the desired output features and the block’s input. The network is formed by stacking residual blocks on top of each other. In doing so, it’s possible to train hundreds of layers with good performance and less complexity than other architectures.

Note: Residual blocks use skip connections. Rather than learning unreferenced functions, they learn functions that are defined with reference to the layer’s inputs.
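
The idea can be sketched in plain Python as follows. This is a toy illustration rather than ResNet's actual implementation: the names residual_block, zero_branch, and small_correction are hypothetical, and simple list arithmetic stands in for the convolutional layers.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x, where residual_fn plays the role of the stacked layers F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("the residual branch must preserve the input shape")
    # The skip connection adds the unchanged input back onto the branch output.
    return [f + xi for f, xi in zip(fx, x)]


def zero_branch(x: Vector) -> Vector:
    """Hypothetical branch that has learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def small_correction(x: Vector) -> Vector:
    """Hypothetical branch that has learned a small adjustment: F(x) = 0.1 * x."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_branch)        # block behaves as the identity
corrected_out = residual_block(x, small_correction)  # input plus a learned residual
```

When the branch outputs zeros, the block reduces to the identity mapping, so an extra block only has to learn a correction to its input rather than a whole new transformation.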

ResNet50 is one variant of the ResNet architecture. It contains the following layers:

  • 48 Convolution layers
  • 1 MaxPool layer
  • 1 Average Pool layer
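
The count of 48 can be reproduced with a little arithmetic. The sketch below assumes the stage layout of the original ResNet50 design, with 3, 4, 6, and 3 bottleneck blocks per stage and three convolutions per block; the constant names are illustrative.

```python
# Assumed stage layout of ResNet50: number of bottleneck blocks in each stage.
BLOCKS_PER_STAGE = [3, 4, 6, 3]
# Each bottleneck block stacks a 1x1, a 3x3, and another 1x1 convolution.
CONVS_PER_BOTTLENECK = 3

bottleneck_convs = sum(BLOCKS_PER_STAGE) * CONVS_PER_BOTTLENECK
# Add the MaxPool and Average Pool layers from the list above.
layers_in_list = bottleneck_convs + 1 + 1

print(bottleneck_convs)  # 48
print(layers_in_list)    # 50
```

This tally follows the list above. Another common convention reaches 50 by counting the initial 7x7 convolution and the final fully connected layer instead of the two pooling layers.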

The table below highlights the differences between ResNet50 and other architectures:

Architecture    Parameters    Top-1 Accuracy (%)    Year Published
AlexNet         60M           63.3                  2012
VGG             144M          74.5                  2014
Inception-V2    11.2M         74.8                  2015
ResNet50        25M           75.3                  2015

Note: The benchmark is based on the ImageNet dataset, which is the standard benchmark that researchers use for image classification. Among the architectures listed, ResNet50 achieves the highest top-1 accuracy.

PyTorch Image Models

PyTorch Image Models (timm) provides pre-trained weights for ResNet50. The pre-trained model was trained on millions of ImageNet images and can classify up to 1,000 different object categories. We have fine-tuned the model on open-source datasets to classify images into the following classes:

  • cloudy
  • rain
  • shine
  • sunrise

Import

The torchvision.models module comes with the resnet50 function, which lets us instantiate the model without going through the timm.create_model method. As a result, it reduces dependencies for our inference script. The timm.create_model function, by contrast, provides more flexibility for custom models.

We can easily load the pre-trained model with the following code snippet:
