Deep neural networks become harder to train as their depth increases, and the added depth brings its own set of problems. Residual Networks, or ResNets, address these problems without adding any significant cost overhead to the regular working of the neural network. Before we dive into ResNets, let's understand why we need them.
A neural network is made up of neurons stacked in layers. The number of layers forms the depth of the neural network, while the number of neurons in each layer forms its width.
It has been shown that increasing the width of a neural network makes it prone to memorizing the training data, leading to overfitting. Increasing the depth, on the other hand, allows the network to learn from the training data in a more generalized way, which is our goal.
However, simply stacking more layers does not keep improving performance. After addressing problems like vanishing and exploding gradients with careful initialization and normalization layers, very deep networks still suffer from a degradation problem: as the depth grows, accuracy saturates and then starts to drop.
ResNets help us counter this degradation with the help of identity mapping. An identity mapping in a neural network does nothing but return the same input that was given to the layer.
Note: If you are a little familiar with electronics, you can think of it as a buffer amplifier.
To understand the use of identity mapping, let's take the example of a neural network with 50 layers. We can introduce 20 new layers with identity mapping into this network. The network's performance will not degrade, because these 20 layers simply forward their input, applying at most small tweaks on top of it that can improve performance.
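As a toy illustration (not from the original text; the Lambda layer here is just a stand-in for an identity mapping), a layer that returns its input unchanged leaves the data exactly as it was:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# An "identity layer": its output is exactly its input
identity = layers.Lambda(lambda t: t)

x = np.arange(6, dtype=np.float32).reshape(2, 3)
y = identity(x)
print(np.allclose(x, y.numpy()))  # True - the layer changes nothing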
A ResNet consists of residual blocks. A residual block is like a shortcut block, as shown in the figure above. The working of this block can be represented as y = F(x) + x, where x is the input to the block, F(x) is the output of the stacked weighted layers, and y is the output of the block. When learning an identity mapping, the neural network simply needs to learn F(x) = 0 so as to make y = x.
Note: Usually, each weighted layer is actually a combination of convolution, batch normalization, and activation layers.
Here, we'll implement a residual block in TensorFlow:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # to ignore tf warnings

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(shape=(24, 24, 3)):
    x = layers.Input(shape)
    x_copy = x

    # Layer 1
    conv2d_layer1 = layers.Conv2D(3, (3, 3), padding='same')(x)
    batch_norm_layer1 = layers.BatchNormalization()(conv2d_layer1)
    activation_layer1 = layers.Activation('relu')(batch_norm_layer1)

    # Layer 2
    conv2d_layer2 = layers.Conv2D(3, (3, 3), padding='same')(activation_layer1)
    batch_norm_layer2 = layers.BatchNormalization()(conv2d_layer2)

    # Skip connection
    addition_layer = layers.Add()([x_copy, batch_norm_layer2])
    activation_layer2 = layers.Activation('relu')(addition_layer)

    model = tf.keras.Model(inputs=x, outputs=activation_layer2)
    return model

residual_block().summary()
The block takes an input of the given shape dimensions. Note: You can set the kernel size and filters according to your needs. You can change the input shape by passing a different shape to the residual_block function, as shown below.
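For instance, here is a quick (hypothetical) usage sketch: build the block for 32x32 RGB inputs and run a dummy batch through it. Because the convolutions use padding='same' and the filter count matches the input channels, the Add() layer can combine the shortcut with the block's output without any reshaping.

import numpy as np

# Hypothetical example: a residual block for 32x32 RGB inputs
block = residual_block(shape=(32, 32, 3))

dummy_batch = np.random.rand(4, 32, 32, 3).astype('float32')
output = block(dummy_batch)
print(output.shape)  # (4, 32, 32, 3) - same shape as the input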
In this section, we'll learn to import a pretrained ResNet model, ResNet50. With 50 layers, it is a relatively small member of the ResNet family.
We can use a number of models that are already available to us through TensorFlow. Here, we import ResNet50 using tf.keras.applications. It uses ImageNet weights by default. You can learn more here.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # to ignore tf warnings

import tensorflow as tf

model = tf.keras.applications.resnet50.ResNet50()
model.summary()
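As a minimal inference sketch (not from the original text), assuming the default ImageNet weights and the model's standard 224x224 input size, we can feed the model a preprocessed image and decode the top predictions. A random array stands in for a real photo here:

import numpy as np
import tensorflow as tf

model = tf.keras.applications.resnet50.ResNet50()

# A random array stands in for a real 224x224 RGB image
image = np.random.rand(1, 224, 224, 3) * 255.0
inputs = tf.keras.applications.resnet50.preprocess_input(image)

predictions = model.predict(inputs)
# decode_predictions maps the 1000 ImageNet scores to human-readable labels
print(tf.keras.applications.resnet50.decode_predictions(predictions, top=3))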
In PyTorch, we use torchvision to get the models. To print the summary of a model in PyTorch, we have to install torchsummary.
from torchvision import models
from torchsummary import summary

model = models.resnet50()
print(model)
summary(model, (3, 512, 512))
We have to provide the input shape explicitly so that torchsummary can use it to print the summary. Here, the input shape passed for ResNet50 is (3, 512, 512), in the channels-first format that PyTorch expects.
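To round this off, here is a small sketch (not from the original text) of a forward pass through the torchvision ResNet50 with a random 512x512 input, just to confirm the expected output shape:

import torch
from torchvision import models

model = models.resnet50()
model.eval()  # switch batch norm to inference mode

# A batch of one random 3-channel 512x512 image, channels-first as PyTorch expects
dummy_input = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    output = model(dummy_input)
print(output.shape)  # torch.Size([1, 1000]) - one score per ImageNet class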