Test Cases for Jupyter-based Projects

Overview

To assess a machine learning or deep learning project, we must employ standard evaluation metrics such as accuracy, precision, and F1-score. In the following example, we discuss how to evaluate a project’s results when only limited information is available.

Testing methodology

Let’s take the Face Detection Using Dlib and DNN in OpenCV project on Educative’s platform as an example. In this project, we store the results in a variable, write our test cases, and measure the model’s performance without calling any built-in evaluation functions. To do this, we follow these steps:

  1. Load the image with ground truths
  2. Generate and store model results
  3. Pass obtained results to custom test module
  4. Compare model predictions with ground truths using IOU (Intersection over Union, an evaluation metric used to estimate an object detector’s accuracy on a particular dataset; see the formula after this list)
  5. Compute and display Precision, Recall, and F1-score
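
Here, IOU measures how well a predicted box matches a ground truth box: the area of their overlap divided by the area of their union.

IoU(A, B) = area(A ∩ B) / area(A ∪ B) = area(A ∩ B) / (area(A) + area(B) − area(A ∩ B))

A score of 1 means the boxes coincide exactly, while 0 means they do not overlap at all.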

1. Load image and ground truths

We must place the required image, its ground truths, and our custom module (a .py file) into the /usercode directory using the file editor or Jupyter’s upload option (discussed in Jupyter-based Projects), as shown below:
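
Once the files are uploaded, a quick way to confirm they landed in /usercode is to list the directory from a notebook cell (a minimal check; the exact listing will vary with your notebook’s name):

# list the working directory to verify the uploaded files are present
!ls /usercode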

As annotated above, we upload the following files into our directory:

  1. testResults.py: our customised testing module.
  2. image.jpg: the image under observation; face detection using the dlib module is performed on this image.
  3. ground-truths.txt: the file that contains the image’s ground truth bounding boxes (a sample of its format is shown after this list).
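
For reference, test() reads ground-truths.txt with pandas as a headerless CSV, one bounding box per line in x1, y1, x2, y2 order (top-left and bottom-right corners). A hypothetical sample with made-up coordinates:

108,74,185,152
310,62,388,140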

The contents of the relevant files and the image are given below:

import pandas as pd
import numpy as np

def get_iou(first_rect, second_rect):
    # determine the (x, y)-coordinates of the intersection of the given rectangles
    x_start = max(first_rect[0], second_rect[0])
    y_start = max(first_rect[1], second_rect[1])
    x_end = min(first_rect[2], second_rect[2])
    y_end = min(first_rect[3], second_rect[3])
    # compute the area of intersection
    area_of_intersection = max(0, x_end - x_start + 1) * max(0, y_end - y_start + 1)
    # compute the area of both the prediction and ground-truth rectangles
    area_of_first = (first_rect[2] - first_rect[0] + 1) * (first_rect[3] - first_rect[1] + 1)
    area_of_second = (second_rect[2] - second_rect[0] + 1) * (second_rect[3] - second_rect[1] + 1)
    # return the intersection over union value: the intersection area divided by
    # the sum of the prediction and ground-truth areas minus the intersection area
    return area_of_intersection / float(area_of_first + area_of_second - area_of_intersection)

def test(model_results):
    try:
        gts = pd.read_csv('ground-truths.txt', names=['x1', 'y1', 'x2', 'y2'])
    except Exception:
        print("Can't read/find ground-truths.txt")
        return
    ground_truths = [list(gt) for gt in gts.values]
    # predictions[j] records whether ground truth j was matched by any detection
    predictions = np.zeros(len(ground_truths), dtype=int)
    # valid_results[i] records whether detection i matched any ground truth
    valid_results = np.zeros(len(model_results), dtype=int)
    for i, result in enumerate(model_results):
        for j, truth in enumerate(ground_truths):
            if get_iou(result, truth) > 0.7:
                predictions[j] = 1
                valid_results[i] = 1
    # 1s in valid_results are true positives
    tp = sum(valid_results == 1)
    # 0s in valid_results are false positives
    fp = sum(valid_results == 0)
    # 0s in predictions are false negatives
    fn = sum(predictions == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    print(f'precision: {precision:.2f}', f'recall: {recall:.2f}', f'f1-score: {f1:.2f}', sep=' | ')
    print('-' * 60)
    print(f'false positive: {fp}', f'false negative: {fn}', f'true positive: {tp}', sep=' | ')
[The input image, image.jpg, is displayed here in the original project.]

2. Store model results

Moving forward, we generate results using dlib’s HOG-based frontal face detector and store them in a variable (dlib_results) as follows:

import cv2
import dlib

# load the image and convert it to grayscale for the detector
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# dlib's HOG-based frontal face detector; the second argument upsamples the image once
detector = dlib.get_frontal_face_detector()
faces = detector(gray, 1)

# collect each detection as [x, y, x1, y1] and draw a red box on the image
dlib_results = []
for result in faces:
    x = result.left()
    y = result.top()
    x1 = result.right()
    y1 = result.bottom()
    dlib_results.append([x, y, x1, y1])
    cv2.rectangle(image, (x, y), (x1, y1), (0, 0, 255), 2)
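
Since the loop also draws a red rectangle around each detection, we can optionally save the annotated image for visual inspection (the output filename below is our own choice):

# save the annotated image so the detections can be inspected in the file browser
cv2.imwrite('detections.jpg', image)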

3. Test results

With those model results in hand, we pass them to our custom testResults module as follows:

import testResults
testResults.test(dlib_results)

4. Comparison with ground truths

We call the test() function from our module, passing it the model predictions. The function compares the model results (i.e., predictions) against the ground truths, counting a detection as correct when its IOU with a ground truth box exceeds 0.7. The aim is to take the ground truth bounding boxes marking the actual positions of the faces in the image and evaluate the detector’s performance against them: the higher the IOU score, the more closely the predicted box coincides with the ground truth box, as shown below:
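
To make the threshold concrete, here is a small worked example that calls get_iou() from testResults.py directly; the box coordinates are invented for illustration:

import testResults

# a detection that closely matches a ground truth box
print(testResults.get_iou([10, 10, 110, 110], [12, 12, 112, 112]))  # ~0.92, above the 0.7 threshold
# a detection with only partial overlap
print(testResults.get_iou([0, 0, 100, 100], [50, 50, 150, 150]))    # ~0.15, rejected as a match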

5. Compute evaluation measures

Based on the IOU threshold, we calculate:

  • True positives (the predicted box and a ground truth box coincide, i.e., their IOU exceeds the threshold)
  • False positives (a bounding box was detected where no matching ground truth box exists)
  • False negatives (a ground truth box for which no bounding box was detected)

Lastly, we compute the precision, recall, and F1-score for the model predictions and display the measures, as shown in the following slides along with all the preliminary steps.
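
For reference, the three measures printed by test() are computed from these counts as:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)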