Case Study

Learn how to optimize k-nearest neighbors classification using Strategy and Command patterns for varied distance computations and performance tuning.

Designing distance computations

We talked about the various ways to compute distances, but left part of the design to be filled in later. Now that we’ve seen some of the basic design patterns, we can apply some of them to our evolving case study.

Specifically, we need to put the various kinds of distance computations into the Hyperparameter class definition. In previous lessons, we introduced the idea that the distance computation is not a single definition: there are over 50 commonly used distance computation alternatives, some simple, some rather complex. We showed a few common ones, including Euclidean distance, Manhattan distance, Chebyshev distance, and even a complex-looking Sorensen distance. Each weights the nearness of the neighbors slightly differently.
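The four distances named above can be sketched as interchangeable classes that share a single distance() method. This is a minimal sketch, assuming samples are plain sequences of floats; the class names Euclidean, Manhattan, Chebyshev, and Sorensen are hypothetical and are not taken from the case study's own code.

```python
import math
from typing import Sequence


class Distance:
    """Common interface: each subclass computes a distance between two samples."""
    def distance(self, s1: Sequence[float], s2: Sequence[float]) -> float:
        raise NotImplementedError


class Euclidean(Distance):
    # sqrt(sum((a - b)^2)): straight-line distance
    def distance(self, s1, s2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


class Manhattan(Distance):
    # sum(|a - b|): distance when moving along one axis at a time
    def distance(self, s1, s2):
        return sum(abs(a - b) for a, b in zip(s1, s2))


class Chebyshev(Distance):
    # max(|a - b|): only the largest single-axis difference counts
    def distance(self, s1, s2):
        return max(abs(a - b) for a, b in zip(s1, s2))


class Sorensen(Distance):
    # sum(|a - b|) / sum(a + b): lies in [0, 1] for non-negative features;
    # assumes the features are not all zero (otherwise the denominator is 0)
    def distance(self, s1, s2):
        diff = sum(abs(a - b) for a, b in zip(s1, s2))
        total = sum(a + b for a, b in zip(s1, s2))
        return diff / total


for d in (Euclidean(), Manhattan(), Chebyshev(), Sorensen()):
    print(type(d).__name__, d.distance((1.0, 2.0), (3.0, 4.0)))
```

Because each class answers the same distance() call, a caller can hold any of them without knowing which formula it uses; that shared interface is the property the Strategy design pattern depends on.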

This leads us to look at the Hyperparameter class as containing three important components:

  • A reference to the base TrainingData. This is used to find all of the neighbors, from which the nearest are selected.
  • The k value, which determines how many of the nearest neighbors are used to classify a sample.
  • The distance algorithm. We'd like to be able to plug in any algorithm here. Our research revealed a large number of competing choices, which suggests that hard-coding one or two of them won't adapt well to real-world demands.

Plugging in the distance algorithm is a good application of the Strategy design pattern. For a given Hyperparameter object, h, the h.distance object has a distance() method that does the work of computing a distance. We can plug in any of the subclasses of Distance to do this work.
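A minimal sketch of this Strategy arrangement follows. It assumes a bare-bones Hyperparameter whose constructor takes k, a Distance object, and the training data; that constructor signature and the Euclidean and Manhattan class names are illustrative assumptions. Swapping strategies means assigning a different object to h.distance, while the calling code stays the same.

```python
import math
from typing import Sequence


class Distance:
    """Strategy interface: subclasses supply the distance() computation."""
    def distance(self, s1: Sequence[float], s2: Sequence[float]) -> float:
        raise NotImplementedError


class Euclidean(Distance):
    def distance(self, s1, s2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


class Manhattan(Distance):
    def distance(self, s1, s2):
        return sum(abs(a - b) for a, b in zip(s1, s2))


class Hyperparameter:
    """Hypothetical minimal container: k, a distance strategy, and training data."""
    def __init__(self, k: int, distance: Distance, training_data: list) -> None:
        self.k = k
        self.distance = distance  # any Distance subclass can be plugged in here
        self.training_data = training_data


h = Hyperparameter(k=5, distance=Euclidean(), training_data=[])
print(h.distance.distance((0.0, 0.0), (3.0, 4.0)))  # → 5.0

h.distance = Manhattan()  # swap the strategy; the call site is unchanged
print(h.distance.distance((0.0, 0.0), (3.0, 4.0)))  # → 7.0
```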

This means the Hyperparameter class's classify() method will use the strategy's self.distance.distance() method to compute distances. We can use this to try alternative distance objects as well as alternative k values, looking for the combination that provides the best-quality classification of unknown samples.
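Here is a hedged sketch of how classify() and the search over k and distance might fit together. It assumes a hypothetical KnownSample record with features and species fields, a simple majority vote among the k nearest neighbors, a quality() score defined as the fraction of testing samples classified correctly, and a hypothetical tune() helper; none of these names come from the case study itself.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class KnownSample:
    """Hypothetical labeled sample: a feature vector plus its known species."""
    features: Tuple[float, ...]
    species: str


class Distance:
    """Strategy interface: every distance algorithm provides distance()."""
    def distance(self, s1: Sequence[float], s2: Sequence[float]) -> float:
        raise NotImplementedError


class Euclidean(Distance):
    def distance(self, s1, s2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


class Chebyshev(Distance):
    def distance(self, s1, s2):
        return max(abs(a - b) for a, b in zip(s1, s2))


class Hyperparameter:
    def __init__(self, k: int, distance: Distance, training: List[KnownSample]) -> None:
        self.k = k
        self.distance = distance
        self.training = training

    def classify(self, features: Sequence[float]) -> str:
        # Rank all training samples using the plugged-in distance strategy
        ranked = sorted(
            self.training,
            key=lambda s: self.distance.distance(features, s.features),
        )
        # Majority vote among the k nearest neighbors
        votes = Counter(s.species for s in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def quality(self, testing: List[KnownSample]) -> float:
        # Fraction of testing samples whose known species is reproduced
        correct = sum(self.classify(s.features) == s.species for s in testing)
        return correct / len(testing)


def tune(training, testing, ks, distances):
    """Try every (k, distance) pair; return (quality, k, distance name) of the best."""
    results = [
        (Hyperparameter(k, d, training).quality(testing), k, type(d).__name__)
        for k in ks
        for d in distances
    ]
    # max() keeps the first pair encountered when qualities tie
    return max(results, key=lambda r: r[0])


training = [
    KnownSample((0.0, 0.0), "a"), KnownSample((0.0, 1.0), "a"), KnownSample((1.0, 0.0), "a"),
    KnownSample((5.0, 5.0), "b"), KnownSample((5.0, 6.0), "b"), KnownSample((6.0, 5.0), "b"),
]
testing = [KnownSample((0.5, 0.5), "a"), KnownSample((5.5, 5.5), "b")]
best = tune(training, testing, ks=[1, 3], distances=[Euclidean(), Chebyshev()])
print(best)  # → (1.0, 1, 'Euclidean')
```

Because every distance class exposes the same distance() method, tune() never needs to know which algorithm a Hyperparameter holds; supporting another distance means writing a new subclass, not changing Hyperparameter.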

We can summarize the relationships using a UML diagram like the following:
