Test Cases for Jupyter-based Projects

Overview

To assess a machine learning or deep learning project, we must employ standard evaluation metrics such as accuracy, precision, and F1-score. In the following example, we discuss how to evaluate a project’s results when only limited information is available.

Testing methodology

Let’s take the Face Detection Using Dlib and DNN in OpenCV project on Educative’s platform as an example. In this project, we store the results in a variable, write our test cases, and measure the model’s performance without calling any built-in evaluation functions. To do this, we follow these steps:

  1. Load the image with ground truths
  2. Generate and store model results
  3. Pass obtained results to custom test module
  4. Compare model predictions with ground truths using IOU (Intersection over Union, an evaluation metric used to estimate an object detector’s accuracy on a particular dataset; see the formula after this list)
  5. Compute and display Precision, Recall, and F1-score
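
Here, IOU measures how well a predicted box matches a ground truth box: the area of their overlap divided by the area of their union.

IoU(A, B) = area(A ∩ B) / area(A ∪ B) = area(A ∩ B) / (area(A) + area(B) − area(A ∩ B))

A score of 1 means the boxes coincide exactly, while 0 means they do not overlap at all.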

1. Load image and ground truths

We must place the required image, its ground truths, and our custom module (a .py file) into the /usercode directory using the file editor or Jupyter’s upload option (discussed in Jupyter-based Projects), as shown below:
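
Once the files are uploaded, a quick way to confirm they landed in /usercode is to list the directory from a notebook cell (a minimal check; the exact listing will vary with your notebook’s name):

# list the working directory to verify the uploaded files are present
!ls /usercode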

As annotated above, we upload the following files into our directory:

  1. testResults.py: our customised testing module.
  2. image.jpg: the image under observation; face detection using the dlib module is performed on this image.
  3. ground-truths.txt: the file that contains the image’s ground truth bounding boxes (a sample of its format is shown after this list).
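
For reference, test() reads ground-truths.txt with pandas as a headerless CSV, one bounding box per line in x1, y1, x2, y2 order (top-left and bottom-right corners). A hypothetical sample with made-up coordinates:

108,74,185,152
310,62,388,140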

The contents of the relevant files and the image are given below:

import pandas as pd
import numpy as np

def get_iou(first_rect, second_rect):
    # determine the (x, y)-coordinates of the intersection of the given rectangles
    x_start = max(first_rect[0], second_rect[0])
    y_start = max(first_rect[1], second_rect[1])
    x_end = min(first_rect[2], second_rect[2])
    y_end = min(first_rect[3], second_rect[3])
    # compute the area of intersection
    area_of_intersection = max(0, x_end - x_start + 1) * max(0, y_end - y_start + 1)
    # compute the area of both the prediction and ground-truth rectangles
    area_of_first = (first_rect[2] - first_rect[0] + 1) * (first_rect[3] - first_rect[1] + 1)
    area_of_second = (second_rect[2] - second_rect[0] + 1) * (second_rect[3] - second_rect[1] + 1)
    # return the intersection over union value: the intersection area divided by
    # the sum of the prediction and ground-truth areas minus the intersection area
    return area_of_intersection / float(area_of_first + area_of_second - area_of_intersection)

def test(model_results):
    try:
        gts = pd.read_csv('ground-truths.txt', names=['x1', 'y1', 'x2', 'y2'])
    except Exception:
        print("Can't read/find ground-truths.txt")
        return
    ground_truths = [list(gt) for gt in gts.values]
    # predictions[j] records whether ground truth j was matched by any detection
    predictions = np.zeros(len(ground_truths), dtype=int)
    # valid_results[i] records whether detection i matched any ground truth
    valid_results = np.zeros(len(model_results), dtype=int)
    for i, result in enumerate(model_results):
        for j, truth in enumerate(ground_truths):
            if get_iou(result, truth) > 0.7:
                predictions[j] = 1
                valid_results[i] = 1
    # 1s in valid_results are true positives
    tp = sum(valid_results == 1)
    # 0s in valid_results are false positives
    fp = sum(valid_results == 0)
    # 0s in predictions are false negatives
    fn = sum(predictions == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    print(f'precision: {precision:.2f}', f'recall: {recall:.2f}', f'f1-score: {f1:.2f}', sep=' | ')
    print('-' * 60)
    print(f'false positive: {fp}', f'false negative: {fn}', f'true positive: {tp}', sep=' | ')
[The input image, image.jpg, is displayed here in the original project.]

2. Store model results

Moving forward, we generate results using dlib’s HOG-based frontal face detector and store them in a variable (dlib_results) as follows:

import cv2
import dlib

# load the image and convert it to grayscale for the detector
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# dlib's HOG-based frontal face detector; the second argument upsamples the image once
detector = dlib.get_frontal_face_detector()
faces = detector(gray, 1)

# collect each detection as [x, y, x1, y1] and draw a red box on the image
dlib_results = []
for result in faces:
    x = result.left()
    y = result.top()
    x1 = result.right()
    y1 = result.bottom()
    dlib_results.append([x, y, x1, y1])
    cv2.rectangle(image, (x, y), (x1, y1), (0, 0, 255), 2)
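
Since the loop also draws a red rectangle around each detection, we can optionally save the annotated image for visual inspection (the output filename below is our own choice):

# save the annotated image so the detections can be inspected in the file browser
cv2.imwrite('detections.jpg', image)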

3. Test results

With those model results in hand, we pass them to our custom testResults module as follows:

import testResults
testResults.test(dlib_results)

4. Comparison with ground truths

We call the test() function from our module, passing it the model predictions. The function compares the model results (i.e., predictions) against the ground truths, counting a detection as correct when its IOU with a ground truth box exceeds 0.7. The aim is to take the ground truth bounding boxes marking the actual positions of the faces in the image and evaluate the detector’s performance against them: the higher the IOU score, the more closely the predicted box coincides with the ground truth box, as shown below:
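
To make the threshold concrete, here is a small worked example that calls get_iou() from testResults.py directly; the box coordinates are invented for illustration:

import testResults

# a detection that closely matches a ground truth box
print(testResults.get_iou([10, 10, 110, 110], [12, 12, 112, 112]))  # ~0.92, above the 0.7 threshold
# a detection with only partial overlap
print(testResults.get_iou([0, 0, 100, 100], [50, 50, 150, 150]))    # ~0.15, rejected as a match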

5. Compute evaluation measures

Based on the IOU threshold, we calculate:

  • True positives (the predicted box and a ground truth box coincide, i.e., their IOU exceeds the threshold)
  • False positives (a bounding box was detected where no matching ground truth box exists)
  • False negatives (a ground truth box for which no bounding box was detected)

Lastly, we compute the precision, recall, and F1-score for the model predictions and display the measures, as shown in the following slides along with all the preliminary steps.
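
For reference, the three measures printed by test() are computed from these counts as:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)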