Case Study
Explore how to use Python's iterator pattern, comprehensions, and generator functions to partition datasets into training and testing subsets. Understand the mathematical logic behind these operations and apply them to the k-nearest neighbors algorithm for classification. This lesson helps you master efficient data handling and algorithm implementation using Python iterators.
Overview
Python makes extensive use of iterators and iterable collections, and this underlying mechanism appears in many places. Every for statement uses it implicitly. When we use functional programming techniques, such as generator expressions and the map(), filter(), and reduce() functions, we’re exploiting iterators.
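As a minimal sketch of this idea, the following spells out what a for statement does behind the scenes with iter() and next(), and then runs the same small computation with filter(), map(), and reduce() and with a generator expression. The values list and the threshold are made-up illustrations, not data from the case study.

```python
from functools import reduce  # reduce() lives in functools in Python 3

values = [3, 8, 5, 10]  # hypothetical data, for illustration only

# What a for statement does implicitly: ask the collection for an
# iterator, then call next() until StopIteration is raised.
iterator = iter(values)
seen = []
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break
    seen.append(value)

# The equivalent for statement hides all of that machinery.
looped = []
for value in values:
    looped.append(value)

# filter() and map() return lazy iterators; reduce() consumes them.
large = filter(lambda x: x > 4, values)
doubled = map(lambda x: x * 2, large)
total = reduce(lambda a, b: a + b, doubled)

# The same computation as a single generator expression.
gen_total = sum(x * 2 for x in values if x > 4)

print(seen == looped, total, gen_total)
```

Nothing is computed by filter() or map() until reduce() starts pulling items through the chain, which is the same on-demand behavior the for statement relies on.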
Python has an itertools module full of additional iterator-based design patterns. This is worthy of study because it provides many examples of common operations that are readily available using built-in constructs.
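To give a feel for the module, here is a small sketch using two itertools functions, chain() and islice(). The sample names are placeholders chosen for this illustration and are not part of the case study.

```python
from itertools import chain, islice

# Hypothetical sample identifiers, for illustration only.
training = ["s1", "s2", "s3"]
testing = ["s4", "s5"]

# chain() walks several iterables as if they were one sequence.
all_samples = list(chain(training, testing))

# islice() slices any iterator lazily: here, the first two items.
first_two = list(islice(all_samples, 2))

# With a step, islice() picks every nth item: indexes 0, 5, 10, ...
every_fifth = list(islice(range(12), 0, None, 5))

print(all_samples, first_two, every_fifth)
```

Both functions work on any iterable, including generators that never build a full list in memory.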
We can apply these concepts in a number of places in our case study:
- Partitioning all the original samples into testing and training subsets.
- Testing a particular k and distance hyperparameter set by classifying all the test cases.
- The k-nearest neighbors (k-NN) algorithm itself and how it locates the nearest neighbors from all the training samples.
What these three processing examples have in common is a “for all” aspect: each one applies an operation to every item in a collection. We’ll take a little side-trip into the math behind comprehensions and generator functions. The math isn’t terribly complex, but the following section can be thought of as deep background. After this digression, we’ll dive into partitioning data into training and testing subsets using the iterator concepts.
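The three processing examples listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the case study's implementation: the (features, label) tuples, the every-fourth-sample split, the classify() helper, and the choice of Euclidean distance via math.dist() are all hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy data: (features, label) pairs.
samples = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.9), "a"),
    ((0.8, 1.1), "a"),
    ((1.1, 1.2), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
    ((5.1, 5.2), "b"),
]

# 1. Partitioning: for all samples, assign each one to exactly one subset.
#    Here every fourth sample is held out for testing (an assumed rule).
testing = [s for i, s in enumerate(samples) if i % 4 == 0]
training = [s for i, s in enumerate(samples) if i % 4 != 0]


def classify(k, unknown, training_samples):
    """Return the most common label among the k nearest training samples."""
    # 3. k-NN: for all training samples, measure the distance to the
    #    unknown point, then keep the k closest.
    nearest = sorted(training_samples, key=lambda s: dist(s[0], unknown))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# 2. Testing: for all test cases, classify and count the correct answers.
k = 3
correct = sum(
    1 for features, label in testing
    if classify(k, features, training) == label
)
print(f"{correct} of {len(testing)} test samples classified correctly")
```

Each numbered step is a loop over an entire collection, expressed as a comprehension, a sorted() call over an iterable, or a generator expression fed to sum().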
The set builder background
Formally, we can summarize operations like partitioning, testing, and even locating nearest neighbors with a logic expression. Some developers like the formality of it because it can help describe the processing without forcing a specific Python implementation.
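As a generic illustration of how such a logic expression lines up with Python (not the case study's own rule), set-builder notation maps closely onto a set comprehension, and a “for all” condition maps onto the built-in all() function. The set S and the even/odd predicate below are assumptions chosen only to keep the example small.

```python
# Hypothetical universe of items, for illustration only.
S = {1, 2, 3, 4, 5, 6}

# Set builder: { x | x in S and x is even }
evens = {x for x in S if x % 2 == 0}

# Set builder: { x | x in S and x is odd }
odds = {x for x in S if x % 2 != 0}

# "For all x in S, x is in evens or x is in odds"
every_item_placed = all(x in evens or x in odds for x in S)

# "There is no x that is in both subsets"
no_overlap = evens.isdisjoint(odds)

print(evens, odds, every_item_placed, no_overlap)
```

The formal statement says what must be true of the result; the comprehension is one of several ways to compute it.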
Rules for partitioning
Here’s the essential rule for partitioning, for example. This involves a for all condition that describes the elements of a set of samples, ...