Deep learning is a specialized field within machine learning built on neural networks with multiple layers. Its main focus is recognizing patterns in various types of data in order to make intelligent observations and predictions.
One interesting subfield of deep learning is computer vision, which enables machines to process and extract information from visual data depicted in images and videos. This concept allows us to perform emotion detection as well.
By an emotion detection application, we mean an application that can intelligently recognize the primary emotion on a person's face by observing their facial expressions. This is exactly what we'll be building in this answer, so stick with us till the end!
We will be writing Python code that uses various libraries, the most prominent ones being Keras and OpenCV.
Keras is a high-level API mainly utilized in the deep learning domain. The capabilities of the models it provides can be leveraged in various image detection tasks.
OpenCV is an open-source computer vision and machine learning library we can use for various image and video processing tasks.
For emotion detection, we first need a model trained on the emotions we want to classify, which we can then use to make predictions on new images. The overall process of how the model comes together is given below.
| Step | Explanation |
| --- | --- |
| Model architecture | Designing a CNN model for emotion detection |
| Model compilation | Compiling the CNN model with an optimizer, loss function, and metric |
| Model training | Training the model on the training data over multiple epochs |
| Model validation | Evaluating the trained model's performance on the test data |
| Model saving | Saving the model and its weights |
| Model inference | Using the saved model to make predictions on new images |
Before using our model to detect emotions, we will first train it on a dataset so that it can learn to map facial features to the emotion each image is labeled with. The code for training is given below.
```python
import os

import cv2
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam
from keras.utils import to_categorical

emotions = ['angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised']
num_labels = len(emotions)
batch_size = 64
epochs = 50
width, height = 48, 48

def load_images_from_folder(folder_path):
    images = []
    labels = []
    for emotion_idx, emotion in enumerate(emotions):
        emotion_folder = os.path.join(folder_path, emotion)
        for filename in os.listdir(emotion_folder):
            img = cv2.imread(os.path.join(emotion_folder, filename), cv2.IMREAD_GRAYSCALE)
            if img is not None:
                img = cv2.resize(img, (width, height))
                images.append(img)
                labels.append(emotion_idx)
    return np.array(images), np.array(labels)

train_images, train_labels = load_images_from_folder("train")
test_images, test_labels = load_images_from_folder("test")

# Add a channel dimension and scale pixel values to [0, 1].
train_images = train_images.reshape(train_images.shape[0], width, height, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], width, height, 1).astype('float32')
train_images /= 255
test_images /= 255

# One-hot encode the integer labels.
train_labels = to_categorical(train_labels, num_classes=num_labels)
test_labels = to_categorical(test_labels, num_classes=num_labels)

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(width, height, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(num_labels, activation='softmax'))

model.compile(loss=categorical_crossentropy,
              optimizer=Adam(),
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(test_images, test_labels),
          shuffle=True)

# Save the full model plus a standalone copy of the architecture.
model.save("your_model.h5")
with open("your_model.json", "w") as json_file:
    json_file.write(model.to_json())
```
The necessary libraries are imported first so that the code can run smoothly:

- `os`: for directory-related functions
- `keras`: for model-related functions
- `cv2`: for media-related functions
- `numpy`: for numerical operations
Numerous emotions exist, but for now, we will focus on the major emotions present in the dataset. The code defines the list of emotions that we will categorize our images into (the snippet after the list shows how these labels map to class indices):

- angry
- disgusted
- fearful
- happy
- neutral
- sad
- surprised
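The position of each emotion in this list is the integer class the network learns to predict, so the exact same ordering must be reused at inference time; a quick sketch of the index-to-label mapping:

```python
emotions = ['angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

# The integer class index is simply the position in the list.
for index, emotion in enumerate(emotions):
    print(index, emotion)   # 0 angry, 1 disgusted, ..., 6 surprised
```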
The `load_images_from_folder` function loads images from the "train" and "test" folders, along with their corresponding labels. Each image is read in grayscale using `cv2`, resized to 48 x 48 pixels, and stored in a NumPy array so that the data can be processed effectively.
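For reference, the function expects one subfolder per emotion label inside each of the "train" and "test" directories (the 48 x 48 grayscale format matches the well-known FER-2013 dataset). A small, optional sanity check you could run right after the loading step in the script above, with hypothetical shapes in the comments:

```python
# Expected layout (one subfolder per emotion, any number of images in each):
#   train/angry/..., train/disgusted/..., ..., train/surprised/...
#   test/angry/...,  test/disgusted/...,  ..., test/surprised/...
train_images, train_labels = load_images_from_folder("train")
print(train_images.shape)          # (num_images, 48, 48) before reshaping
print(np.bincount(train_labels))   # number of images per emotion class
```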
The image data is then normalized by dividing pixel values by 255 (the `/= 255` operation) to scale them between 0 and 1. We convert the labels to a categorical format using one-hot encoding with the `to_categorical()` function.
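To make the one-hot step concrete, here is what `to_categorical` does to a couple of integer labels; a minimal illustration that is not part of the training script:

```python
from keras.utils import to_categorical

# Label 0 ('angry') and label 3 ('happy') as one-hot vectors over 7 classes.
print(to_categorical([0, 3], num_classes=7))
# [[1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0.]]
```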
Since we're building a model from scratch, we have to define its architecture too. A `Sequential` model is created, and CNN layers are added to it. The model consists of convolutional layers (`Conv2D`), max-pooling layers (`MaxPooling2D`), dropout layers (`Dropout`) for regularization, a flatten layer (`Flatten`) for transitioning from the convolutional layers to the fully connected layers, and finally, fully connected layers (`Dense`). The last layer has a softmax activation function so that the model assigns a probability to each emotion.
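If you'd like to verify the architecture before training, Keras can print a layer-by-layer overview; an optional one-liner to run after building the model above:

```python
# Prints each layer with its output shape and parameter count, which is handy
# for confirming that the 48 x 48 x 1 input flows through the network correctly.
model.summary()
```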
Next, the model is compiled using the categorical cross-entropy loss function, Adam optimizer, and accuracy metric.
Lastly, we train our model using the training images and labels with a batch size of 64 over 50 epochs, validating on the test set after each epoch.
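Once training finishes, you can also report the final test-set performance explicitly; a small, optional follow-up that reuses the variables defined in the script above:

```python
# Evaluate the trained model on the held-out test split.
loss, accuracy = model.evaluate(test_images, test_labels, verbose=0)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")
```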
After training, the trained model is saved as "your_model.h5" and its architecture as "your_model.json" so that we can reuse the model anytime without the training overhead.
Note: Training on large datasets takes considerable time, so we have already trained the model and can use it directly in the next section. On your local machine, you will have to train the model first on your preferred dataset before you can run inference with it.
Our next step is to use our trained model to predict emotions in a given input image.
```python
import cv2
import numpy as np
from keras.models import model_from_json
from keras.utils import img_to_array

# Must match the label order used during training.
emotionList = ('angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised')

# Load the saved architecture and weights.
emotion_model = model_from_json(open("your_model.json", "r").read())
emotion_model.load_weights('your_model.h5')

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

given_image_path = "your_image.png"
given_image = cv2.imread(given_image_path)
gray_image = cv2.cvtColor(given_image, cv2.COLOR_BGR2GRAY)

detections = face_haar_cascade.detectMultiScale(gray_image, 1.32, 5)

for (x, y, w, h) in detections:
    cv2.rectangle(given_image, (x, y), (x + w, y + h), (0, 0, 0), thickness=7)
    face_region_gray = gray_image[y:y + h, x:x + w]
    resized_face = cv2.resize(face_region_gray, (48, 48))
    face_pixels = img_to_array(resized_face)
    face_pixels = np.expand_dims(face_pixels, axis=0)
    face_pixels = face_pixels / 255
    predictions = emotion_model.predict(face_pixels)
    max_index = np.argmax(predictions[0])
    prediction = emotionList[max_index]

text = f'Emotion: {prediction}'
text_position = (10, given_image.shape[0] + 30)

# Add a black strip below the image to hold the predicted label.
output_image = np.zeros((given_image.shape[0] + 50, given_image.shape[1], 3), np.uint8)
output_image[0:given_image.shape[0], 0:given_image.shape[1]] = given_image
cv2.putText(output_image, text, text_position, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

cv2.imshow('Emotion detection - Computer vision', output_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
As always, we first import our needed libraries.
Next, we load our model. To achieve this, we retrieve the model architecture from a JSON file using `model_from_json` and load the model's weights from an H5 file using the `load_weights` method. With this, we have a fully trained model ready, and we don't even need to train it again!
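As an aside, since the training script also called `model.save(...)`, the architecture and weights can be restored in a single call; a minimal alternative sketch, assuming the same "your_model.h5" file:

```python
from keras.models import load_model

# load_model restores the architecture, weights, and compile state from the .h5 file.
emotion_model = load_model("your_model.h5")
```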
Using the `cv2.imread` method, we read the image from the given path and store it in `given_image`. We also convert the input image to grayscale using `cv2.cvtColor`, as our emotion model works only with grayscale images.
We initialize the Haar cascade classifier from the `haarcascade_frontalface_default.xml` file. Haar cascades are a classical machine learning object detection method used for detecting objects, such as faces, in images.
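If the XML file isn't sitting in your working directory, note that OpenCV ships with its own copy; a minimal sketch using the bundled path, assuming a standard opencv-python installation:

```python
import cv2

# cv2.data.haarcascades is the folder where OpenCV installs its cascade files.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_haar_cascade = cv2.CascadeClassifier(cascade_path)
```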
`face_haar_cascade.detectMultiScale` is first used to detect faces in the grayscale image; the arguments `1.32` and `5` are the scale factor and minimum neighbor count used during detection. For the coordinates of each detected face, we perform the following steps:

- We draw a black rectangle around the face using `cv2.rectangle`.
- The face is extracted from the grayscale image and resized to 48 x 48 pixels to match the input size of our model.
- The face pixels are converted to an array and normalized to values between 0 and 1.
- The trained emotion detection model predicts the emotion probabilities for the face.
- The predicted emotion label is retrieved using `argmax`, which finds the index of the highest-probability emotion, that is, the emotion the model considers most likely for that face (see the sketch after this list).
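Here is a tiny sketch of that last step with made-up probabilities:

```python
import numpy as np

emotionList = ('angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised')

# Hypothetical softmax output for one face; the probabilities sum to 1.
predictions = np.array([[0.05, 0.02, 0.03, 0.70, 0.10, 0.06, 0.04]])
max_index = np.argmax(predictions[0])   # 3
print(emotionList[max_index])           # happy
```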
To visually display the result, we create an output image with a black strip beneath the original picture that serves as a text box for the predicted emotion.
Finally, we show the output image with the predicted emotion in a window using `cv2.imshow`. This way, we can visualize the emotion detection result for any given input image. The program waits for a key press (`cv2.waitKey(0)`) before it closes the window (`cv2.destroyAllWindows()`).
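Keep in mind that `cv2.imshow` needs a display. If you're running on a headless machine (for example, a remote server), a common workaround is to write the annotated result to disk instead; a small, optional substitution:

```python
# Save the annotated result to a file instead of opening a GUI window.
cv2.imwrite('emotion_result.png', output_image)
```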
With that, we've set up the complete workflow: training our model and using it to predict emotions in new input.
We've set up a simple Flask server to render the results here for you; you can also run the code on your local machine without integrating it with Flask. Click "Run" to see a live demonstration!
```python
from flask import Flask, render_template, Response
import cv2
import numpy as np
from keras.models import model_from_json
from keras.utils import img_to_array

app = Flask(__name__)

# Must match the label order used during training.
emotionList = ('angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised')

emotion_model = model_from_json(open("model_saved.json", "r").read())
emotion_model.load_weights('model_saved.h5')

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

def detect_emotion_in_image(image_path):
    given_image = cv2.imread(image_path)
    gray_image = cv2.cvtColor(given_image, cv2.COLOR_BGR2GRAY)
    detections = face_haar_cascade.detectMultiScale(gray_image, 1.32, 5)

    for (x, y, w, h) in detections:
        cv2.rectangle(given_image, (x, y), (x + w, y + h), (0, 0, 0), thickness=7)
        face_region_gray = gray_image[y:y + h, x:x + w]
        resized_face = cv2.resize(face_region_gray, (48, 48))
        face_pixels = img_to_array(resized_face)
        face_pixels = np.expand_dims(face_pixels, axis=0)
        face_pixels = face_pixels / 255
        predictions = emotion_model.predict(face_pixels)
        max_index = np.argmax(predictions[0])
        prediction = emotionList[max_index]

    text = f'Emotion: {prediction}'
    text_position = (10, given_image.shape[0] + 30)

    # Add a black strip below the image to hold the predicted label.
    output_image = np.zeros((given_image.shape[0] + 50, given_image.shape[1], 3), np.uint8)
    output_image[0:given_image.shape[0], 0:given_image.shape[1]] = given_image
    cv2.putText(output_image, text, text_position, cv2.FONT_ITALIC, 1, (255, 255, 255), 2)

    # Encode the annotated image as JPEG and stream it as a single MJPEG frame.
    ret, buffer = cv2.imencode('.jpg', output_image)
    frame = buffer.tobytes()
    yield (b'--frame\r\n'
           b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/video_feed')
def video_feed():
    return Response(detect_emotion_in_image('testImg.png'),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0", port=5000)
```
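The `index` route above expects a `templates/index.html` file, which isn't shown here. If you just want something minimal to test with, one option is to return the markup directly from Python using Flask's `render_template_string`; a hypothetical stand-in for the demo's actual template:

```python
from flask import render_template_string

@app.route('/')
def index():
    # Minimal page that embeds the single streamed frame from /video_feed.
    return render_template_string(
        '<html><body>'
        '<h1>Emotion detection</h1>'
        '<img src="/video_feed" alt="Emotion detection result">'
        '</body></html>'
    )
```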
Let's take a few random pictures and test our model on them.
Note: Each result might not be as expected. The model's accuracy depends on quite a few factors, such as whether the input image and the training dataset have similar formats, how extensive the dataset is, and whether the model is overfitting or underfitting.
Which method is used to define the architecture and layers of the CNN model for emotion detection?

1. `model.fit(trainImg, trainLbl, batch_size=20, epochs=15, validation_data=(testImg, testLbl))`
2. `model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', input_shape=(width, height, 1)))`
3. `model.compile(loss=categorical_crossentropy, optimizer=Adam(), metrics=['accuracy'])`