Deep learning is a specialized field within machine learning built on neural networks with multiple layers. Its main focus is recognizing patterns in various types of data in order to make intelligent observations and predictions.
One interesting subfield of deep learning is computer vision, which enables machines to process and extract information from visual data depicted in images and videos. This concept allows us to perform emotion detection as well.
By an emotion detection application, we mean an application that can intelligently recognize the primary emotion on a person's face by observing their facial expressions. This is exactly what we'll be building in this answer, so stick with us till the end!
We will be writing Python code that uses various libraries, the most prominent ones being Keras and OpenCV.
Keras is a high-level API mainly utilized in the deep learning domain. The capabilities of the models it provides can be leveraged in various image detection tasks.
OpenCV is an open-source computer vision and machine learning library we can use for various image and video processing tasks.
For emotion detection, we first need a model trained on the emotions we want to classify, which we can then use to make predictions on new images. The overall process of how the model comes together is given below.
| Step | Explanation |
| --- | --- |
| Model architecture | Designing a CNN model for emotion detection |
| Model compilation | Compiling the CNN model with an optimizer, loss function, and metric |
| Model training | Training the model on the training data over multiple epochs |
| Model validation | Evaluating the trained model's performance on the test data |
| Model saving | Saving the model and its weights |
| Model inference | Using the saved model to make predictions on new images |
Before using our model to detect emotions, we will first train it on a dataset so that it can learn to map facial features to the emotion each image is labeled with. The code for training is given below.
```python
import os

import cv2
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam
from keras.utils import to_categorical

emotions = ['angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised']
num_labels = len(emotions)
batch_size = 64
epochs = 50
width, height = 48, 48

def load_images_from_folder(folder_path):
    images = []
    labels = []
    for emotion_idx, emotion in enumerate(emotions):
        emotion_folder = os.path.join(folder_path, emotion)
        for filename in os.listdir(emotion_folder):
            img = cv2.imread(os.path.join(emotion_folder, filename), cv2.IMREAD_GRAYSCALE)
            if img is not None:
                img = cv2.resize(img, (width, height))
                images.append(img)
                labels.append(emotion_idx)
    return np.array(images), np.array(labels)

train_images, train_labels = load_images_from_folder("train")
test_images, test_labels = load_images_from_folder("test")

# Add a channel dimension and scale pixel values to [0, 1].
train_images = train_images.reshape(train_images.shape[0], width, height, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], width, height, 1).astype('float32')
train_images /= 255
test_images /= 255

# One-hot encode the integer labels.
train_labels = to_categorical(train_labels, num_classes=num_labels)
test_labels = to_categorical(test_labels, num_classes=num_labels)

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(width, height, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(num_labels, activation='softmax'))

model.compile(loss=categorical_crossentropy,
              optimizer=Adam(),
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(test_images, test_labels),
          shuffle=True)

# Save the full model plus a standalone copy of the architecture.
model.save("your_model.h5")
with open("your_model.json", "w") as json_file:
    json_file.write(model.to_json())
```
The necessary libraries are imported first so that the code can run smoothly:

- `os`: for directory-related functions
- `keras`: for model-related functions
- `cv2`: for media-related functions
- `numpy`: for numerical operations
Numerous emotions exist, but for now, we will focus on the major emotions present in the dataset. The code defines the list of emotions that we will categorize our images into (the snippet after the list shows how these labels map to class indices):

- angry
- disgusted
- fearful
- happy
- neutral
- sad
- surprised
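The position of each emotion in this list is the integer class the network learns to predict, so the exact same ordering must be reused at inference time; a quick sketch of the index-to-label mapping:

```python
emotions = ['angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

# The integer class index is simply the position in the list.
for index, emotion in enumerate(emotions):
    print(index, emotion)   # 0 angry, 1 disgusted, ..., 6 surprised
```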
The `load_images_from_folder` function loads images from the "train" and "test" folders, along with their corresponding labels. Each image is read in grayscale using `cv2`, resized to 48 x 48 pixels, and stored in a NumPy array so that the data can be processed effectively.
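For reference, the function expects one subfolder per emotion label inside each of the "train" and "test" directories (the 48 x 48 grayscale format matches the well-known FER-2013 dataset). A small, optional sanity check you could run right after the loading step in the script above, with hypothetical shapes in the comments:

```python
# Expected layout (one subfolder per emotion, any number of images in each):
#   train/angry/..., train/disgusted/..., ..., train/surprised/...
#   test/angry/...,  test/disgusted/...,  ..., test/surprised/...
train_images, train_labels = load_images_from_folder("train")
print(train_images.shape)          # (num_images, 48, 48) before reshaping
print(np.bincount(train_labels))   # number of images per emotion class
```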
The image data is then normalized by dividing pixel values by 255 (the `/= 255` operation) to scale them between 0 and 1. We convert the labels to a categorical format using one-hot encoding with the `to_categorical()` function.
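To make the one-hot step concrete, here is what `to_categorical` does to a couple of integer labels; a minimal illustration that is not part of the training script:

```python
from keras.utils import to_categorical

# Label 0 ('angry') and label 3 ('happy') as one-hot vectors over 7 classes.
print(to_categorical([0, 3], num_classes=7))
# [[1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0.]]
```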
Since we're building a model from scratch, we have to define its architecture too. A `Sequential` model is created, and CNN layers are added to it. The model consists of convolutional layers (`Conv2D`), max-pooling layers (`MaxPooling2D`), dropout layers (`Dropout`) for regularization, a flatten layer (`Flatten`) for transitioning from the convolutional layers to the fully connected layers, and finally, fully connected layers (`Dense`). The last layer has a softmax activation function so that the model assigns a probability to each emotion.
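If you'd like to verify the architecture before training, Keras can print a layer-by-layer overview; an optional one-liner to run after building the model above:

```python
# Prints each layer with its output shape and parameter count, which is handy
# for confirming that the 48 x 48 x 1 input flows through the network correctly.
model.summary()
```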
Next, the model is compiled using the categorical cross-entropy loss function, Adam optimizer, and accuracy metric.
Lastly, we train our model using the training images and labels with a batch size of 64 over 50 epochs, validating on the test set after each epoch.
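Once training finishes, you can also report the final test-set performance explicitly; a small, optional follow-up that reuses the variables defined in the script above:

```python
# Evaluate the trained model on the held-out test split.
loss, accuracy = model.evaluate(test_images, test_labels, verbose=0)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")
```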
After training, the trained model is saved as "your_model.h5" and its architecture as "your_model.json" so that we can reuse the model anytime without the training overhead.
Note: Training on large datasets takes considerable time, so we have already trained the model and can use it directly in the next section. On your local machine, you will have to train the model first on your preferred dataset before you can run inference with it.
Our next step is to use our trained model to predict emotions in a given input image.
```python
import cv2
import numpy as np
from keras.models import model_from_json
from keras.utils import img_to_array

# Must match the label order used during training.
emotionList = ('angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised')

# Load the saved architecture and weights.
emotion_model = model_from_json(open("your_model.json", "r").read())
emotion_model.load_weights('your_model.h5')

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

given_image_path = "your_image.png"
given_image = cv2.imread(given_image_path)
gray_image = cv2.cvtColor(given_image, cv2.COLOR_BGR2GRAY)

detections = face_haar_cascade.detectMultiScale(gray_image, 1.32, 5)

for (x, y, w, h) in detections:
    cv2.rectangle(given_image, (x, y), (x + w, y + h), (0, 0, 0), thickness=7)
    face_region_gray = gray_image[y:y + h, x:x + w]
    resized_face = cv2.resize(face_region_gray, (48, 48))
    face_pixels = img_to_array(resized_face)
    face_pixels = np.expand_dims(face_pixels, axis=0)
    face_pixels = face_pixels / 255
    predictions = emotion_model.predict(face_pixels)
    max_index = np.argmax(predictions[0])
    prediction = emotionList[max_index]

text = f'Emotion: {prediction}'
text_position = (10, given_image.shape[0] + 30)

# Add a black strip below the image to hold the predicted label.
output_image = np.zeros((given_image.shape[0] + 50, given_image.shape[1], 3), np.uint8)
output_image[0:given_image.shape[0], 0:given_image.shape[1]] = given_image
cv2.putText(output_image, text, text_position, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

cv2.imshow('Emotion detection - Computer vision', output_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
As always, we first import our needed libraries.
Next, we load our model. To achieve this, we retrieve the model architecture from a JSON file using `model_from_json` and load the model's weights from an H5 file using the `load_weights` method. With this, we have a fully trained model ready, and we don't even need to train it again!
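As an aside, since the training script also called `model.save(...)`, the architecture and weights can be restored in a single call; a minimal alternative sketch, assuming the same "your_model.h5" file:

```python
from keras.models import load_model

# load_model restores the architecture, weights, and compile state from the .h5 file.
emotion_model = load_model("your_model.h5")
```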
Using the `cv2.imread` method, we read the image from the given path and store it in `given_image`. We also convert the input image to grayscale using `cv2.cvtColor`, as our emotion model works only with grayscale images.
We initialize the Haar cascade classifier from the `haarcascade_frontalface_default.xml` file. Haar cascades are a classical machine learning object detection method used for detecting objects, such as faces, in images.
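If the XML file isn't sitting in your working directory, note that OpenCV ships with its own copy; a minimal sketch using the bundled path, assuming a standard opencv-python installation:

```python
import cv2

# cv2.data.haarcascades is the folder where OpenCV installs its cascade files.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_haar_cascade = cv2.CascadeClassifier(cascade_path)
```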
`face_haar_cascade.detectMultiScale` is first used to detect faces in the grayscale image; the arguments `1.32` and `5` are the scale factor and minimum neighbor count used during detection. For the coordinates of each detected face, we perform the following steps:

- We draw a black rectangle around the face using `cv2.rectangle`.
- The face is extracted from the grayscale image and resized to 48 x 48 pixels to match the input size of our model.
- The face pixels are converted to an array and normalized to values between 0 and 1.
- The trained emotion detection model predicts the emotion probabilities for the face.
- The predicted emotion label is retrieved using `argmax`, which finds the index of the highest-probability emotion, that is, the emotion the model considers most likely for that face (see the sketch after this list).
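Here is a tiny sketch of that last step with made-up probabilities:

```python
import numpy as np

emotionList = ('angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised')

# Hypothetical softmax output for one face; the probabilities sum to 1.
predictions = np.array([[0.05, 0.02, 0.03, 0.70, 0.10, 0.06, 0.04]])
max_index = np.argmax(predictions[0])   # 3
print(emotionList[max_index])           # happy
```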
To visually display the result, we create an output image with a black strip beneath the original picture that serves as a text box for the predicted emotion.
Finally, we show the output image with the predicted emotion in a window using `cv2.imshow`. This way, we can visualize the emotion detection result for any given input image. The program waits for a key press (`cv2.waitKey(0)`) before it closes the window (`cv2.destroyAllWindows()`).
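Keep in mind that `cv2.imshow` needs a display. If you're running on a headless machine (for example, a remote server), a common workaround is to write the annotated result to disk instead; a small, optional substitution:

```python
# Save the annotated result to a file instead of opening a GUI window.
cv2.imwrite('emotion_result.png', output_image)
```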
With that, we've set up the complete workflow: training our model and using it to predict emotions in new input.
We've set up a simple Flask server to render the results here for you; you can also run the code on your local machine without integrating it with Flask. Click "Run" to see a live demonstration!
```python
from flask import Flask, render_template, Response
import cv2
import numpy as np
from keras.models import model_from_json
from keras.utils import img_to_array

app = Flask(__name__)

# Must match the label order used during training.
emotionList = ('angry', 'disgusted', 'fearful', 'happy', 'neutral', 'sad', 'surprised')

emotion_model = model_from_json(open("model_saved.json", "r").read())
emotion_model.load_weights('model_saved.h5')

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

def detect_emotion_in_image(image_path):
    given_image = cv2.imread(image_path)
    gray_image = cv2.cvtColor(given_image, cv2.COLOR_BGR2GRAY)
    detections = face_haar_cascade.detectMultiScale(gray_image, 1.32, 5)

    for (x, y, w, h) in detections:
        cv2.rectangle(given_image, (x, y), (x + w, y + h), (0, 0, 0), thickness=7)
        face_region_gray = gray_image[y:y + h, x:x + w]
        resized_face = cv2.resize(face_region_gray, (48, 48))
        face_pixels = img_to_array(resized_face)
        face_pixels = np.expand_dims(face_pixels, axis=0)
        face_pixels = face_pixels / 255
        predictions = emotion_model.predict(face_pixels)
        max_index = np.argmax(predictions[0])
        prediction = emotionList[max_index]

    text = f'Emotion: {prediction}'
    text_position = (10, given_image.shape[0] + 30)

    # Add a black strip below the image to hold the predicted label.
    output_image = np.zeros((given_image.shape[0] + 50, given_image.shape[1], 3), np.uint8)
    output_image[0:given_image.shape[0], 0:given_image.shape[1]] = given_image
    cv2.putText(output_image, text, text_position, cv2.FONT_ITALIC, 1, (255, 255, 255), 2)

    # Encode the annotated image as JPEG and stream it as a single MJPEG frame.
    ret, buffer = cv2.imencode('.jpg', output_image)
    frame = buffer.tobytes()
    yield (b'--frame\r\n'
           b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/video_feed')
def video_feed():
    return Response(detect_emotion_in_image('testImg.png'),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0", port=5000)
```
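The `index` route above expects a `templates/index.html` file, which isn't shown here. If you just want something minimal to test with, one option is to return the markup directly from Python using Flask's `render_template_string`; a hypothetical stand-in for the demo's actual template:

```python
from flask import render_template_string

@app.route('/')
def index():
    # Minimal page that embeds the single streamed frame from /video_feed.
    return render_template_string(
        '<html><body>'
        '<h1>Emotion detection</h1>'
        '<img src="/video_feed" alt="Emotion detection result">'
        '</body></html>'
    )
```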
Let's take a few random pictures and test our model on them.
Note: Each result might not be as expected. The model's accuracy depends on quite a few factors, such as whether the input image and the training dataset have similar formats, how extensive the dataset is, and whether the model is overfitting or underfitting.
Which method is used to define the architecture and layers of the CNN model for emotion detection?

1. `model.fit(trainImg, trainLbl, batch_size=20, epochs=15, validation_data=(testImg, testLbl))`
2. `model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', input_shape=(width, height, 1)))`
3. `model.compile(loss=categorical_crossentropy, optimizer=Adam(), metrics=['accuracy'])`