How to use a pre-trained deep learning model

In this article, we will use a pre-trained model for image classification. This is just one of the three strategies for implementing transfer learning mentioned in my previous articles. If you haven’t checked them out yet, I would suggest reading them first to understand the idea and strategies behind transfer learning.

Points to consider before moving on to the implementation

Some questions may arise while you’re implementing any of the transfer learning strategies:

  1. When do you fine-tune?
  2. How do you fine-tune?
  3. How do you decide what type of transfer learning you should perform on a new dataset?

The answers to the above questions depend mostly on two factors: the size of the new dataset and its similarity to the original dataset on which your pre-trained model was trained.

I have listed four common rules that you should keep in mind while using transfer learning strategies. These are:

  • New dataset is large and very different from the original dataset - Since the dataset is large, we could train the ConvNet from scratch, but training a complex network from scratch takes time, so it is common to start with fine-tuning. Because the new dataset is very different from the original one, however, fine-tuning may not produce good results; if it doesn’t, fall back to training the ConvNet from scratch.

  • New dataset is large and similar to the original dataset - Again, the dataset is large, so we can fine-tune the network and save the time it would take to train it from scratch. The model should give good results because the new dataset is very similar to the original one.

  • New dataset is small and similar to the original dataset - As the dataset is small, fine-tuning may lead to over-fitting. So, it is advisable to use the ConvNet as a fixed feature extractor, add your own classifier (the dense layers) on top of it, and train only that classifier (see the sketch after this list).

  • New dataset is small but very different from the original dataset - Since the dataset is small, you could again use the fixed-feature-extractor strategy. But because the dataset is very different from the original one, it is advisable to fine-tune the ConvNet instead; just make sure not to go too deep into the network and adjust the weights of only a small number of layers.
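To make the fixed-feature-extractor case concrete, here is a minimal Keras sketch. The number of classes (10), the size of the dense layer, and the optimizer are hypothetical placeholders for your own dataset, not values taken from this article.

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load the convolutional base without its ImageNet classifier head
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base so that only the new classifier gets trained
for layer in base.layers:
    layer.trainable = False

# Add our own classifier (the dense layers) on top of the frozen base
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)  # hypothetical 10-class dataset

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(...) would then train only the dense layers on your new data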

There are a few additional things to keep in mind while you perform transfer learning:

  • Learning rates: It is advisable to use small learning rates while fine-tuning the ConvNet model. This is because the model is already well trained, and distorting its weights too quickly and too much could result in a very poor model (see the sketch after this list).

  • Constraints from pre-trained models: Take these constraints into account before you actually work with a model. In particular, you should be familiar with the input format the model was trained on, i.e., the format of the original dataset.
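As a minimal sketch of the learning-rate advice, the snippet below builds a pre-trained base with a fresh classifier head, unfreezes only the last few base layers, and compiles with a deliberately small learning rate. The number of unfrozen layers, the 10-class head, and the value 1e-5 are illustrative assumptions, not recommendations from this article.

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Adam

# Pre-trained base plus a new classifier head (hypothetical 10-class problem)
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=base.input, outputs=outputs)

# Unfreeze only the last few base layers for fine-tuning (an illustrative choice)
for layer in base.layers[:-10]:
    layer.trainable = False

# A small learning rate keeps the pre-trained weights from being distorted too quickly
model.compile(optimizer=Adam(1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])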

Now, we are ready to dive into the coding part. In this article, we will implement the first strategy, i.e., using a pre-trained model as-is for an image classification problem.

We will use the ResNet50 model. The pre-trained model can classify images into 1000 object categories such as keyboard, mouse, pencil, and many animals.

Note that the 50 here means that the network is 50 layers deep. Deeper variants with 101 or 152 layers are also available (see the import example below).
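Whether the deeper variants can be imported directly depends on your installed Keras version, so treat the snippet below as an assumption to verify against the keras.applications documentation.

# Deeper ResNet variants (available in newer Keras releases)
from keras.applications.resnet import ResNet101, ResNet152

model_101 = ResNet101(weights='imagenet')
model_152 = ResNet152(weights='imagenet')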

Import the required libraries

We will import the ResNet50 model from the Keras library. There are many other pre-trained models in the keras.applications module as well; check them out in the Keras documentation.

from keras.applications.resnet50 import ResNet50
from keras.applications.resnet50 import preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

Explanation:

  • The preprocess_input function pre-processes the input image into the format that the ResNet50 model expects and was trained on.
  • The decode_predictions function maps the model’s output probabilities back to the 1000 labels on which ResNet50 was trained.

Load the model

The next step is to load the ResNet50 model. The code snippet below does exactly that.

model = ResNet50(weights='imagenet')

Explanation:

  • Here, we passed the value imagenet to the weights parameter to load the weights that the model learned while training on the ImageNet dataset. You can also pass None to load the model with randomly initialized weights.

Load the image

Now it is time to load our image. Here, I have taken an image of an elephant from the internet. You can also use one of your own.

img_path = '/elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

Explanation:

  • load_img is used to load the image with the size your model expects. The ResNet50 model expects a (224,224) sized image.
  • Then, the image is converted to a numpy array.
  • An extra dimension is added to the image array.
  • Finally, the image array is passed to the preprocess_input function so that it becomes compatible with the ResNet50 model.

Make the predictions

That’s all! Now, you can go ahead and get the labels that are predicted by the ResNet50 model.

pred = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
print(decode_predictions(pred, top=3)[0])

Once you run the above code, you will get an output similar to this.

[('n01871265', 'tusker', 0.5334062), ('n02504458', 'African_elephant', 0.28311166), ('n02504013', 'Indian_elephant', 0.18275209)]

Note that you may get a different output based on the image you have used to predict the labels.

The above output shows that the model is about 53% confident that the image is of a tusker. Beyond that, the model also predicted that it may be an African elephant or an Indian elephant.

Note that we printed only the top 3 classes. You can print any number of classes, up to a maximum of 1000, by changing the top argument, as shown below.
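For example, passing a larger value of top (5 here, purely as an illustration) returns more candidate labels:

# print the five most likely labels instead of three
print(decode_predictions(pred, top=5)[0])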

If you instead take an image of, let’s say, a man in a turban and want to see what the labels will be, you may get an output like this:

[('n04350905', 'suit', 0.3715849), ('n10148035', 'groom', 0.14189447), ('n04591157', 'Windsor_tie', 0.090490855)]

The output is not up to the mark, as no label has a probability higher than 50%. This happens because the image we used is likely very different from the original dataset on which the ResNet50 model was trained.

So, to overcome these problems, the other two strategies are used to build a better model. We will look at those techniques in another article, as they are beyond the scope of this one.