Test Cases for Jupyter-based Projects
Learn to add test cases in Jupyter-based projects.
Overview
To assess a machine learning or deep learning project, we employ standard evaluation metrics such as accuracy, precision, and F1-score. In the following example, we discuss how to evaluate a project’s results when only limited information is available.
Testing methodology
Let’s take the Face Detection Using Dlib and DNN in OpenCV project on Educative’s platform as an example. In this project, we store the results in a variable, write our test cases, and test the model’s performance without calling any built-in functions. To do this, we follow these steps:
- Load the image with ground truths
- Generate and store model results
- Pass obtained results to custom test module
- Compare model predictions with ground truths using IOU (an evaluation metric used to estimate an object detector’s accuracy on a particular dataset)
- Compute and display precision, recall, and F1-score
1. Load image and ground truths
We must place the required image, its ground truths, and our custom module (a .py file) into the /usercode directory using the file editor or Jupyter’s upload option (discussed in Jupyter-based Projects), as shown below:
As annotated above, we upload the following files into our directory:
- testResults.py: It is our customised testing module.
- image.jpg: The image under observation, on which face detection using the dlib module is performed.
- ground-truths.txt: The file that contains the image’s ground truths.
The contents of the relevant files and the image are given below:
import pandas as pd
import numpy as np

def get_iou(first_rect, second_rect):
    # determine the (x, y)-coordinates of the given rectangles
    x_start = max(first_rect[0], second_rect[0])
    y_start = max(first_rect[1], second_rect[1])
    x_end = min(first_rect[2], second_rect[2])
    y_end = min(first_rect[3], second_rect[3])
    # computing the area of intersection
    area_of_intersection = max(0, x_end - x_start + 1) * max(0, y_end - y_start + 1)
    # computing the area of both the prediction and ground-truth rectangles
    area_of_first = (first_rect[2] - first_rect[0] + 1) * (first_rect[3] - first_rect[1] + 1)
    area_of_second = (second_rect[2] - second_rect[0] + 1) * (second_rect[3] - second_rect[1] + 1)
    # returning the intersection over union value, that is, the intersection area
    # divided by the sum of the prediction + ground-truth areas minus the intersection area
    return area_of_intersection / float(area_of_first + area_of_second - area_of_intersection)

def test(model_results):
    try:
        gts = pd.read_csv('ground-truths.txt', names=['x1', 'y1', 'x2', 'y2'])
    except:
        print("Can't read/find ground-truths.txt")
    ground_truths = []
    for gt in gts.values:
        ground_truths.append(list(gt))
    actuals, predictions = np.ones((len(ground_truths)), dtype=int), np.zeros((len(ground_truths)), dtype=int)
    valid_results = np.zeros((len(model_results)), dtype=int)
    for i, result in enumerate(model_results):
        for j, truth in enumerate(ground_truths):
            if get_iou(result, truth) > 0.7:
                predictions[j] = 1
                valid_results[i] = 1
    # 1s of valid_results are true positives
    tp = sum(valid_results == 1)
    # 0s of valid_results are false positives
    fp = sum(valid_results == 0)
    # 0s of predictions are false negatives
    fn = sum(predictions == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    print(f'precision: {precision:.2f}', f'recall: {recall:.2f}', f'f1-score {f1:.2f}', sep=' | ')
    print('-'*60)
    print(f'false positive: {fp}', f'false negative: {fn}', f'true positive: {tp}', sep=' | ')
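From the pd.read_csv(..., names=['x1', 'y1', 'x2', 'y2']) call above, test() appears to expect ground-truths.txt to contain one comma-separated bounding box per line in x1, y1, x2, y2 order. Before wiring everything up, we can optionally sanity-check the module by calling get_iou() on two made-up boxes; this is only an illustrative sketch, not a required project step:

import testResults

# two hypothetical boxes in (x1, y1, x2, y2) format -- values are illustrative only
box_a = [10, 10, 110, 110]
box_b = [20, 20, 120, 120]

# identical boxes should yield an IOU of exactly 1.0
print(testResults.get_iou(box_a, box_a))

# partially overlapping boxes yield a value between 0 and 1
print(testResults.get_iou(box_a, box_b))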
2. Store model results
Moving forward, we generate results using a HOG-based model and store them in a variable (i.e., dlib_results) as follows:
import cv2
import dlib

# load the image and convert it to grayscale
image = cv2.imread('image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# run dlib's HOG-based frontal face detector
detector = dlib.get_frontal_face_detector()
faces = detector(gray, 1)

# collect the detected bounding boxes and draw them on the image
dlib_results = []
for result in faces:
    x = result.left()
    y = result.top()
    x1 = result.right()
    y1 = result.bottom()
    dlib_results.append([x, y, x1, y1])
    cv2.rectangle(image, (x, y), (x1, y1), (0, 0, 255), 2)
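At this point, dlib_results holds one [x, y, x1, y1] list per detected face. As an optional check (not part of the original project steps), we can print the detections and save the annotated image; the output file name below is just an example:

# report how many faces were detected and their coordinates
print(f'{len(dlib_results)} face(s) detected:', dlib_results)

# save the image with the drawn bounding boxes for visual inspection (hypothetical file name)
cv2.imwrite('detections.jpg', image)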
3. Test results
With those model results in hand, we pass them to our custom module, testResults, as follows:
import testResults
testResults.test(dlib_results)
4. Comparison with ground truths
Here, we call the test() function from our module and provide it with the model predictions. This function compares the model results (i.e., the predictions) with the ground truths, treating an IOU greater than 0.7 as a match. We take the ground-truth bounding boxes marking the actual positions of the faces in the image and then evaluate the model’s performance using IOU. A higher IOU score implies that the ground-truth box and the predicted box overlap more closely, as shown below:
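To make the 0.7 threshold concrete, here is a minimal sketch using the module’s get_iou() on made-up boxes (the coordinates are illustrative and not taken from image.jpg): a slightly shifted prediction passes the threshold, while a badly misplaced one does not.

import testResults

ground_truth = [50, 50, 150, 150]       # hypothetical ground-truth face box
close_prediction = [55, 55, 155, 155]   # small shift -> high IOU
poor_prediction = [120, 120, 220, 220]  # large shift -> low IOU

for box in (close_prediction, poor_prediction):
    iou = testResults.get_iou(ground_truth, box)
    print(f'IOU = {iou:.2f} ->', 'match' if iou > 0.7 else 'no match')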
5. Compute evaluation measures
Based on the IOU threshold, we calculate:
- True positives (the predicted box and a ground-truth box overlap beyond the threshold)
- False positives (a bounding box was detected where no ground-truth box exists)
- False negatives (a ground-truth box that the model failed to detect)
Lastly, we compute the precision, recall, and F1-score for the model predictions and display these measures, as shown in the following slides along with all the preliminary steps.
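For reference, the measures computed inside test() reduce to simple ratios of these counts. Below is a small worked example with hypothetical counts (not the actual output of this project):

# hypothetical counts, for illustration only
tp, fp, fn = 8, 2, 1

precision = tp / (tp + fp)                            # 8 / 10 = 0.80
recall = tp / (tp + fn)                               # 8 / 9  ≈ 0.89
f1 = 2 * (precision * recall) / (precision + recall)  # ≈ 0.84

print(f'precision: {precision:.2f} | recall: {recall:.2f} | f1-score: {f1:.2f}')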