What are ResNets?

Overview

Deep neural networks become harder to train as their depth increases, and this added depth brings several problems with it. Residual Networks, or ResNets, solve these problems without adding any significant computational overhead to the network. Before we dive into ResNets, let's understand why we need them.

Degradation in deep neural networks

Neural networks consist of neurons stacked in layers. The number of layers forms the depth of the neural network, and the number of neurons in each layer forms its width.

It has been shown that increasing the width of a neural network makes it prone to memorizing the training data, which leads to overfitting. Increasing the depth, on the other hand, allows the network to learn from the training data in a more generalized way, which is our goal.

After addressing problems like vanishing gradients (the gradients of activation functions like sigmoid become very small, which makes it difficult to train bigger models; using activation functions like ReLU can solve this), there is still the problem of degradation. As a neural network's depth increases, its accuracy starts degrading, which shows up as higher training error. Moreover, experiments show that this is not caused by overfitting.

Working of ResNets

ResNets counter degradation with the help of identity mapping. A layer that performs identity mapping does nothing but return the same input that was given to it.

Note: If you are a little familiar with electronics, you can think of it as a buffer amplifier.

To understand the usefulness of identity mapping, let's take a neural network with 50 layers and introduce 20 new layers with identity mapping into it. The network's performance will not degrade, because these 20 layers simply forward their input, applying only small tweaks where they improve performance.
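
To make this concrete, here is a minimal sketch (not part of the article's code) of an identity-mapping layer in Keras: it returns its input untouched, so stacking such layers adds depth without changing what the network computes.

import tensorflow as tf
from tensorflow.keras import layers

# An identity-mapping layer: it simply returns whatever it receives.
identity_layer = layers.Lambda(lambda t: t)

x = tf.random.normal((1, 24, 24, 3))
y = identity_layer(x)

# The output equals the input, so the layer cannot make the network worse.
print(bool(tf.reduce_all(x == y)))  # True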

A residual block

A ResNet consists of residual blocks. A residual block is like the shortcut block shown in the figure above, where a skip connection adds the block's input x to the output F(x) of its stacked weight layers. The working of this block can be represented as:

H(x) = F(x) + x

When learning an identity mapping, the neural network simply needs to learn to make F(x) = 0. This allows the block to forward x as it is.

Note: Usually, there is a combination of convolution, batch normalization, and activation layers in the place of one weighted layer.

Code implementation

Here, we'll implement a residual block in TensorFlow:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # to ignore tf warnings
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(shape=(24, 24, 3)):
    x = layers.Input(shape)
    x_copy = x

    # Layer 1
    conv2d_layer1 = layers.Conv2D(3, (3, 3), padding='same')(x)
    batch_norm_layer1 = layers.BatchNormalization()(conv2d_layer1)
    activation_layer1 = layers.Activation('relu')(batch_norm_layer1)
    # Layer 2
    conv2d_layer2 = layers.Conv2D(3, (3, 3), padding='same')(activation_layer1)
    batch_norm_layer2 = layers.BatchNormalization()(conv2d_layer2)

    # skip connection
    addition_layer = layers.Add()([x_copy, batch_norm_layer2])
    activation_layer2 = layers.Activation('relu')(addition_layer)

    model = tf.keras.Model(inputs=x, outputs=activation_layer2)
    return model

residual_block().summary()

Code explanation

  • Line 2: We suppress TensorFlow warnings.
  • Line 7: We define the input layer with the given shape.
  • Lines 11–13: We define the first group of layers: convolution, batch normalization, and activation.

Note: You can set the kernel size and filters according to your needs.

  • Lines 15–16: We add a convolution layer and a batch normalization layer, but before we apply the activation, we need to add the skip connection.
  • Line 19: We complete the skip connection by adding the original input to the output of the batch normalization layer.
  • Line 20: We apply the activation function using the activation layer, which completes our residual block.
  • Lines 22–23: We create a model using just one residual block and return it from the residual_block function to the caller (see the sanity check below).
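
As a quick sanity check, here is a small sketch (assuming the residual_block function defined above) that passes a random batch through the block and confirms that the skip connection preserves the input shape:

import tensorflow as tf

# Build the block and run a random batch of four 24x24 RGB images through it.
block = residual_block(shape=(24, 24, 3))
dummy_batch = tf.random.normal((4, 24, 24, 3))
output = block(dummy_batch)

# The addition in the skip connection requires matching shapes,
# so the output shape equals the input shape.
print(output.shape)  # (4, 24, 24, 3)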

Import ResNets

In this section, we'll learn to import a pretrained ResNet model, ResNet50. With 50 layers, it is one of the smaller ResNet variants.

Using TensorFlow

We can use a number of models that are already available in TensorFlow. Here, we import ResNet50 using tf.keras.applications. It uses ImageNet weights by default. You can learn more in the Keras Applications documentation.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # to ignore tf warnings
import tensorflow as tf
model = tf.keras.applications.resnet50.ResNet50()
model.summary()
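
As a usage sketch, the pretrained model can classify an image once it is resized to the 224x224 input size ResNet50 expects and preprocessed with the helpers from the same module (the file name cat.jpg below is just a placeholder):

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

model = ResNet50()  # ImageNet weights by default

# Load a placeholder image and resize it to ResNet50's 224x224 input size.
image = tf.keras.utils.load_img('cat.jpg', target_size=(224, 224))
batch = np.expand_dims(tf.keras.utils.img_to_array(image), axis=0)
batch = preprocess_input(batch)  # apply ResNet50's own preprocessing

predictions = model.predict(batch)
print(decode_predictions(predictions, top=3))  # top-3 ImageNet classes with scores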

Using PyTorch

Here, we use torchvision to get the models. To print the summary of a model in PyTorch, we have to install torchsummary.

from torchvision import models
from torchsummary import summary

model = models.resnet50()  # ResNet50 with randomly initialized weights
print(model)  # prints the layer-by-layer structure of the model
summary(model, (3, 512, 512))  # prints a layer summary for a (3, 512, 512) input

We have to provide the input shape explicitly so that torchsummary can use it to print the summary. Here, we pass (3, 512, 512), with the channel dimension first, as PyTorch expects.
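
As a final sketch, here is a minimal forward pass with pretrained weights (assuming a torchvision version that supports the weights argument; older versions use pretrained=True instead):

import torch
from torchvision import models

# Load ResNet50 with pretrained ImageNet weights and switch to inference mode.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

# A dummy batch containing one 3-channel 224x224 image.
dummy_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy_input)

print(logits.shape)  # torch.Size([1, 1000]), one score per ImageNet class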
