Computer vision is a specialized field within artificial intelligence that enables machines to process and extract information from visual data such as images and videos. Image classification is one of its key applications.
Image classification is a technique that groups and labels images according to the pixels or objects detected within them. It is a branch of computer vision and uses predefined classes to categorize images.
In simple words, we assign a label to a previously unseen image by matching it against one of these predefined classes.
Are you interested in the concepts of computer vision but still confused about their implementation and real-life applications? This Answer addresses all such concerns in detail with a hands-on scenario.
Suppose we have been given an unlabelled image, and we aim to create a model that correctly classifies it into the closest possible class from the list of classes it has already been taught.
We can easily accomplish this task using Keras, so let's get straight to it!
To accomplish image classification in Python, we can employ a powerful library named Keras. Keras is a high-level API mainly used in the deep learning domain, and the models it provides can be leveraged to solve image classification tasks.
The goal of a classification script is mainly to fulfill the steps mentioned below.
Defining the training and validation datasets
Defining the model with layers such as convolutional and pooling layers
Training the model
Fitting the model
Using the model for predictions
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt
import base64
We import the required modules for our code, including:
tensorflow and keras for model-oriented and image-processing tasks
numpy for numerical operations
matplotlib for visual representations
base64 for encoding images
imageSize = (250, 250)
batchSize = 20
trainDirectory = 'archive/seg_train/seg_train'
testDirectory = 'archive/seg_test/seg_test'
We specify the image size and batch size to be used in the training process and save the paths to our training and testing data.
Note: It's preferred to use compressed and resized images if the model has to be trained using a lot of data.
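If the raw images are large, a small preprocessing pass can compress and resize them before training. Below is a minimal sketch using the Pillow library; the folder names are hypothetical and would need to match your own dataset layout.

import os
from PIL import Image

sourceDir = 'raw_images'       # hypothetical folder containing the original images
targetDir = 'resized_images'   # hypothetical folder for the smaller copies
os.makedirs(targetDir, exist_ok=True)

for fileName in os.listdir(sourceDir):
    if fileName.lower().endswith(('.jpg', '.jpeg', '.png')):
        img = Image.open(os.path.join(sourceDir, fileName))
        img = img.resize((250, 250))  # match the imageSize used for the model
        img.save(os.path.join(targetDir, fileName), quality=85)  # quality applies to JPEG files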
generateTrainingData = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)
Since we're using a limited number of images to train our model, it's good practice to generate augmented data. Augmented data is artificially generated from the original data by performing different operations such as rotations and flips.
For this purpose, we set augmentation options like rescale, rotation_range, width_shift_range, height_shift_range, shear_range, zoom_range, horizontal_flip, and fill_mode.
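To build an intuition for what these options do, we can optionally push a single image through a generator and look at a few augmented variants. This is a side sketch rather than part of the main pipeline, and the sample file name used here is hypothetical.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

previewGenerator = ImageDataGenerator(
    rescale=1./255, rotation_range=25, width_shift_range=0.1,
    height_shift_range=0.1, shear_range=0.1, zoom_range=0.1,
    horizontal_flip=True, fill_mode='nearest'
)

# Load one sample image (hypothetical path) and add a batch dimension.
sample = image.img_to_array(image.load_img('sample.jpg', target_size=(250, 250)))
sample = np.expand_dims(sample, axis=0)

# Plot four random augmented variants of the same image.
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, batch in zip(axes, previewGenerator.flow(sample, batch_size=1)):
    ax.imshow(batch[0])
    ax.axis('off')
plt.savefig('augmentation_preview.png')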
trainDataset = generateTrainingData.flow_from_directory(
    trainDirectory,
    seed=594,
    target_size=imageSize,
    batch_size=batchSize,
    class_mode='sparse'
)
validationDataset = tf.keras.utils.image_dataset_from_directory(
    testDirectory,
    seed=594,
    image_size=imageSize,
    batch_size=batchSize
)
Our images for training are read from the directory, augmentation is applied, and the images are resized. We then test our model's accuracy using the validation set.
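Both loaders infer the class labels from the folder structure: every class must live in its own subfolder under the train and test directories. For the two categories used in this Answer, the layout would look roughly like this (an assumption based on the Intel Image Classification dataset's folder names):

archive/
  seg_train/seg_train/
    buildings/   (training images of buildings)
    sea/         (training images of seas)
  seg_test/seg_test/
    buildings/
    sea/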
classNames = list(trainDataset.class_indices.keys())
classCount = len(classNames)
As our classNames and their count, classCount, will be used in the calculations ahead, we'll define them first.
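For reference, class_indices is a dictionary that maps each class folder name to an integer index. With only the two categories assumed above, the values would look roughly like this:

print(trainDataset.class_indices)   # e.g. {'buildings': 0, 'sea': 1}
print(classNames, classCount)       # e.g. ['buildings', 'sea'] 2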
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(20, 3, activation='relu', input_shape=(imageSize[0], imageSize[1], 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(40, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(80, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(80, activation='relu'),
    tf.keras.layers.Dense(classCount)
])
This is one of the most crucial steps in our process. We define the architecture of our convolutional neural network model, which consists of multiple convolutional layers (Conv2D), pooling layers (MaxPooling2D), a flatten layer (Flatten), and dense layers (Dense).
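To verify the resulting layer shapes and parameter counts, we can print a summary of the model right after defining it:

model.summary()   # prints each layer's output shape and trainable parameter count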
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
history = model.fit(
    trainDataset,
    validation_data=validationDataset,
    epochs=15
)
Next, we compile the model by specifying the optimizer, loss function, and evaluation metrics. The model is then trained and validated using the trainDataset and validationDataset we defined initially. It runs for a specified number of epochs, 15 in this case.
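The history object returned by model.fit records the accuracy and loss of every epoch, so we can optionally plot how training progressed. A minimal sketch:

plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.savefig('training_curve.png')   # saved to a file, matching the rest of this Answer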
img = image.load_img('../test.png', target_size=imageSize)
imgArray = image.img_to_array(img)
imgArray = np.expand_dims(imgArray, axis=0)
imgArray = imgArray / 255.0
This code loads any image passed to it, converts it to an array, adds another dimension to match the model's input shape using np.expand_dims, and scales the pixel values between 0 and 1 by dividing imgArray by 255.0.
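To confirm that the preprocessing produced the shape the model expects, we can print the array at each step and see the extra batch dimension:

print(image.img_to_array(img).shape)   # (250, 250, 3): height, width, channels
print(imgArray.shape)                  # (1, 250, 250, 3): batch dimension added
print(imgArray.min(), imgArray.max())  # pixel values now lie between 0.0 and 1.0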
predictions = model.predict(imgArray)
predictedClassIndex = np.argmax(predictions)
predictedClass = classNames[predictedClassIndex]
Now is the time for prediction! Our model outputs a score for each class, np.argmax determines the index of the class with the highest score, predictedClassIndex, and we retrieve the corresponding class label in predictedClass.
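Note that because the model was compiled with from_logits=True, the final Dense layer outputs raw scores (logits) rather than probabilities. If actual probabilities are needed, for example to report the model's confidence, a softmax can be applied to the predictions:

probabilities = tf.nn.softmax(predictions[0])            # convert logits to probabilities
confidence = float(probabilities[predictedClassIndex])
print(f'{predictedClass}: {confidence:.2%} confidence')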
That's how we build an image classification model from scratch and use it to predict unseen images. Let's see it in action now.
Note: Since the training and validation data contain just a few pictures, the model will run fast but might not be very accurate. For better accuracy and a more complex task, more images and categories can be added.
Yay, you made it this far! The complete code is given below and can be experimented with by changing the code and pressing "Run".
Our model is trained with a limited number of images of seas and buildings. Therefore, we'll provide it with an unseen image from one of the two categories to see how well it predicts that image.
Note: Our images have been taken from the "Intel Image Classification" dataset.
We save the prediction in an output.png, which is then rendered in output.html and displayed to us.
We will be using the image 19763.jpg as a parameter for our prediction and see what class it is assigned. This image is originally of a building.
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt
import base64

imageSize = (250, 250)
batchSize = 20
trainDirectory = 'archive/seg_train/seg_train'
testDirectory = 'archive/seg_test/seg_test'

generateTrainingData = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

trainDataset = generateTrainingData.flow_from_directory(
    trainDirectory,
    seed=594,
    target_size=imageSize,
    batch_size=batchSize,
    class_mode='sparse'
)

validationDataset = tf.keras.utils.image_dataset_from_directory(
    testDirectory,
    seed=594,
    image_size=imageSize,
    batch_size=batchSize
)

classNames = list(trainDataset.class_indices.keys())
classCount = len(classNames)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(20, 3, activation='relu', input_shape=(imageSize[0], imageSize[1], 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(40, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(80, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(80, activation='relu'),
    tf.keras.layers.Dense(classCount)
])

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

history = model.fit(
    trainDataset,
    validation_data=validationDataset,
    epochs=15
)

img = image.load_img('19763.jpg', target_size=imageSize)
imgArray = image.img_to_array(img)
imgArray = np.expand_dims(imgArray, axis=0)
imgArray = imgArray / 255.0

predictions = model.predict(imgArray)
predictedClassIndex = np.argmax(predictions)
predictedClass = classNames[predictedClassIndex]

plt.imshow(imgArray[0])
plt.title(predictedClass)
plt.savefig('output.png')

html = f'''
<html>
<body>
<h1>Predicted Class: {predictedClass}</h1>
<img src="data:image/png;base64,{base64.b64encode(open('output.png', 'rb').read()).decode('utf-8')}" alt="Output">
</body>
</html>
'''

with open('output.html', 'w') as file:
    file.write(html)
Our trained model uses what it has learned to predict which class the image resembles the most. Since this is a building image and it closely resembles the features of our building training data, the model predicts "buildings".
Now, we will be using the image test.png as a parameter for our prediction and see what class it is assigned. This image is originally of the sea.
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt
import base64

imageSize = (250, 250)
batchSize = 20
trainDirectory = 'archive/seg_train/seg_train'
testDirectory = 'archive/seg_test/seg_test'

generateTrainingData = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

trainDataset = generateTrainingData.flow_from_directory(
    trainDirectory,
    seed=594,
    target_size=imageSize,
    batch_size=batchSize,
    class_mode='sparse'
)

validationDataset = tf.keras.utils.image_dataset_from_directory(
    testDirectory,
    seed=594,
    image_size=imageSize,
    batch_size=batchSize
)

classNames = list(trainDataset.class_indices.keys())
classCount = len(classNames)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(20, 3, activation='relu', input_shape=(imageSize[0], imageSize[1], 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(40, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(80, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(80, activation='relu'),
    tf.keras.layers.Dense(classCount)
])

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

history = model.fit(
    trainDataset,
    validation_data=validationDataset,
    epochs=15
)

img = image.load_img('../test.png', target_size=imageSize)
imgArray = image.img_to_array(img)
imgArray = np.expand_dims(imgArray, axis=0)
imgArray = imgArray / 255.0

predictions = model.predict(imgArray)
predictedClassIndex = np.argmax(predictions)
predictedClass = classNames[predictedClassIndex]

plt.imshow(imgArray[0])
plt.title(predictedClass)
plt.savefig('output.png')

html = f'''
<html>
<body>
<h1>Predicted Class: {predictedClass}</h1>
<img src="data:image/png;base64,{base64.b64encode(open('output.png', 'rb').read()).decode('utf-8')}" alt="Output">
</body>
</html>
'''

with open('output.html', 'w') as file:
    file.write(html)
As this is a sea image and it closely resembles the features of our sea training data, the model predicts "sea".
To recap the key terms:
Training data: teaches the model different scenarios and their outputs
Convolutional layer: a filter passed to the model