Case Study
Learn how to optimize k-nearest neighbors classification using Strategy and Command patterns for varied distance computations and performance tuning.
Designing distance computations
We talked about the various ways to compute distances, but left part of the design to be filled in later. Now that we’ve seen some of the basic design patterns, we can apply some of them to our evolving case study.
Specifically, we need to put the various kinds of distance computations into the Hyperparameter class definition. In previous lessons, we introduced the idea that the distance computation is not a single definition. There are many commonly used distance computation alternatives, some simple, some rather complex. We showed a few common ones, including Euclidean distance, Manhattan distance, Chebyshev distance, and even a complex-looking Sorensen distance. Each weights the nearness of the neighbors slightly differently.
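As a reminder of how these measures differ, here is a minimal sketch of the four named above, written as plain functions. It assumes each sample is just a sequence of numeric measurements; the function names and the Vector alias are illustrative, not the case study's own definitions.

```python
from math import sqrt
from typing import Sequence

# Assumption: a sample is reduced to a plain sequence of numeric measurements.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of the absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Vector, b: Vector) -> float:
    """Largest absolute difference along any single axis."""
    return max(abs(x - y) for x, y in zip(a, b))


def sorensen(a: Vector, b: Vector) -> float:
    """Manhattan distance scaled by the total of all measurements.

    Assumes non-negative measurements whose total is not zero.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))
```

For the same pair of points, the three unscaled measures give different values, which is why the choice of algorithm can change which neighbors count as nearest.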
This leads us to look at the Hyperparameter class as containing three important components:
- A reference to the base TrainingData. This is used to find all of the neighbors, from which the nearest are selected.
- The k value used to determine how many neighbors will be checked.
- The distance algorithm. We’d like to be able to plug in any algorithm here. Our research revealed a large number of competing choices. This suggests that implementing one or two won’t be very adaptable to real-world demands.
Plugging in the distance algorithm is a good application of the Strategy design pattern. For a given Hyperparameter object, h, the h.distance object has a distance() method that does the work of computing a distance. We can plug in any of the subclasses of Distance to do this work.
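The arrangement can be sketched roughly as follows. This is a simplified illustration, not the case study's final code: the Vector alias, the two concrete subclass names, and the reduced Hyperparameter constructor (which leaves out the training data reference) are all assumptions made to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from math import sqrt
from typing import Sequence

# Assumption: a sample is reduced to a plain sequence of numeric measurements.
Vector = Sequence[float]


class Distance(ABC):
    """Strategy interface: every algorithm exposes the same distance() method."""

    @abstractmethod
    def distance(self, s1: Vector, s2: Vector) -> float:
        ...


class Euclidean(Distance):
    def distance(self, s1: Vector, s2: Vector) -> float:
        return sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


class Manhattan(Distance):
    def distance(self, s1: Vector, s2: Vector) -> float:
        return sum(abs(a - b) for a, b in zip(s1, s2))


class Hyperparameter:
    """Reduced to the two tuning values; the training data reference is omitted."""

    def __init__(self, k: int, distance: Distance) -> None:
        self.k = k
        self.distance = distance


h = Hyperparameter(k=3, distance=Euclidean())
straight_line = h.distance.distance((0.0, 0.0), (3.0, 4.0))  # 5.0

# Swapping the strategy changes the result without touching Hyperparameter.
h.distance = Manhattan()
city_block = h.distance.distance((0.0, 0.0), (3.0, 4.0))  # 7.0
```

Because Hyperparameter only ever calls distance() on whatever object it holds, adding another algorithm means writing one more subclass and nothing else.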
This means the Hyperparameter class's classify() method will use the strategy's self.distance.distance() to compute the distances. We can use this to provide alternative distance objects as well as alternative k values to find a combination that provides the best-quality classification of unknown samples.
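One way to picture this search is the sketch below. It is an illustration under several assumptions: the training data is a plain list of (measurements, label) pairs rather than a TrainingData object, classification is a simple majority vote among the k nearest samples, the quality() method and the toy data are invented for the example, and the distance subclasses shown are just two of the possible choices.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumptions: plain tuples stand in for samples and for TrainingData.
Vector = Sequence[float]
Labeled = Tuple[Vector, str]


class Distance:
    def distance(self, s1: Vector, s2: Vector) -> float:
        raise NotImplementedError


class Manhattan(Distance):
    def distance(self, s1: Vector, s2: Vector) -> float:
        return sum(abs(a - b) for a, b in zip(s1, s2))


class Chebyshev(Distance):
    def distance(self, s1: Vector, s2: Vector) -> float:
        return max(abs(a - b) for a, b in zip(s1, s2))


@dataclass
class Hyperparameter:
    k: int
    distance: Distance
    training: Sequence[Labeled]

    def classify(self, unknown: Vector) -> str:
        """Majority vote among the k training samples nearest to unknown."""
        nearest = sorted(
            self.training,
            key=lambda known: self.distance.distance(known[0], unknown),
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def quality(self, testing: Sequence[Labeled]) -> float:
        """Fraction of testing samples that are classified correctly."""
        correct = sum(1 for features, label in testing if self.classify(features) == label)
        return correct / len(testing)


# Hypothetical toy data: two well-separated clusters.
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
testing = [((1.1, 1.0), "a"), ((5.1, 5.0), "b")]

# Try every combination of distance strategy and k, recording its quality.
results = {
    (type(d).__name__, k): Hyperparameter(k, d, training).quality(testing)
    for d in (Manhattan(), Chebyshev())
    for k in (1, 3)
}
best = max(results, key=results.get)
```

With data this cleanly separated every combination scores the same, so the toy run only demonstrates the mechanics; differences between strategies and k values show up on realistic, overlapping data.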
We can summarize the relationships using a UML diagram like the following: