Building Robust Object-Oriented Python Applications and Libraries/

...

Case Study

Learn how to compute different distance computation algorithms using sympy, create unit tests, and analyze coverage for thorough validation.

We'll cover the following...

Unit testing the distance classes
The Euclidean distance computation
The Chebyshev distance computation
Considerations for test case selection

We’ll return to some material from an earlier chapter and apply some careful testing to be sure we’ve got a good, workable implementation. We’ve looked at the distance computations that are part of the $k$ -nearest neighbors classifier and several computations that produced slightly different results:

Euclidean distance: This is the direct line from one sample to another.
Manhattan distance: This follows streets-and-avenues around a grid (like the city of Manhattan), adding up the steps required along a series of straight-line paths.
Chebyshev distance: This is the largest of the streets-and-avenues distances.
Sorensen distance: This is a variation of the Manhattan distance that weights nearby steps more heavily than distant steps. It tends to magnify small distances, making more subtle discriminations.

These algorithms all produce distinct results from the same inputs; they all involve complex-looking math, and they all need to be tested in isolation to ensure we have implemented them correctly. We’ll start with unit tests of the distances.

Unit testing the distance classes

We need to create some test cases for each distance computation algorithm. When we look at the various equations, we can see that there are four pairs of relevant values from two samples: the sepal length and width, and the petal length and width. To be extremely thorough, we could create at least $16$ distinct cases for each algorithm:

Case 0: All four values are the same; the distance should be zero.
Cases 1–4: One of the four values is different between the two samples. For example, a test sample might have measurements of ("sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2), where as a training sample might have measurements of ("sepal_length": 5.2, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2); only one of these values is distinct.
Cases 5–10: A pair of values is different.
Cases 11–14: Three values are different between the two samples.
Case 15: All four values are different.

In addition, the concepts of equivalence partitioning and boundary value analysis suggest that we also need to locate values where ...

Object-Oriented Design

Objects in Python

When Objects Are Alike

Expecting the Unexpected

When to Use Object-Oriented Programming

Abstract Base Classes and Operator Overloading

Python Data Structures

Object-Oriented and Functional Programming Intersection

Strings, Serialization, and File Paths

The Iterator Pattern

Common Design Patterns

Advanced Design Patterns

Testing Object-Oriented Programs

Concurrency

Conclusion

Build a Python Airline Reservation System

Case Study

Unit testing the distance classes