Partition Function

Learn how to perform partitioning with the help of multiple implemented methods.

The partition() function

We can define multiple versions of the training() function to divide our data into an 80/20, 75/25, or 67/33 split:

def training_80(s: KnownSample, i: int) -> bool:
    return i % 5 != 0

def training_75(s: KnownSample, i: int) -> bool:
    return i % 4 != 0

def training_67(s: KnownSample, i: int) -> bool:
    return i % 3 != 0
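To check that these rules really produce the advertised ratios, the following minimal sketch counts how many of 60 samples each rule keeps for training. The KnownSample class here is a hypothetical one-field stand-in for the lesson's real class, and the sample data is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class KnownSample:
    """Hypothetical stand-in for the lesson's KnownSample class."""
    species: str


def training_80(s: KnownSample, i: int) -> bool:
    return i % 5 != 0


def training_75(s: KnownSample, i: int) -> bool:
    return i % 4 != 0


def training_67(s: KnownSample, i: int) -> bool:
    return i % 3 != 0


# Made-up data: 60 divides evenly by 3, 4, and 5, so the ratios come out exact.
samples = [KnownSample(f"sample-{n}") for n in range(60)]

counts = {}
for rule in (training_80, training_75, training_67):
    # A rule returns True for training samples and False for testing samples.
    kept = sum(1 for i, s in enumerate(samples) if rule(s, i))
    counts[rule.__name__] = (kept, len(samples) - kept)
    print(rule.__name__, counts[rule.__name__])
```

Each rule looks only at the position i, not at the sample itself, so every fifth, fourth, or third item (starting with the first) goes to the testing set.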

Here’s a function, partition(), that takes one of the training_xx() functions as an argument. The training_xx() function is applied to a sample to decide if it’s training data or not:

from typing import Callable, Iterable, List, Tuple

TrainingList = List[TrainingKnownSample]
TestingList = List[TestingKnownSample]

def partition(
    samples: Iterable[KnownSample],
    rule: Callable[[KnownSample, int], bool]
) -> Tuple[TrainingList, TestingList]:
    training_samples = [
        TrainingKnownSample(s)
        for i, s in enumerate(samples) if rule(s, i)
    ]
    test_samples = [
        TestingKnownSample(s)
        for i, s in enumerate(samples) if not rule(s, i)
    ]
    return training_samples, test_samples

We’ve built a higher-order function: a function that takes another function as an argument value. This functional programming feature is an integral part of Python, where functions are first-class objects that can be passed around like any other value.

This partition() function builds two lists from a source of data and a function. This covers the simple case, where we don’t care about introducing values into the testing list that are duplicates of values in the training list.
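As a rough illustration of how partition() might be called, here is a self-contained sketch. The three sample classes are hypothetical, simplified stand-ins for the lesson's KnownSample, TrainingKnownSample, and TestingKnownSample, and the data is made up. The source is a list, so it can safely be scanned more than once.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


# Hypothetical, simplified stand-ins for the lesson's sample classes.
@dataclass
class KnownSample:
    species: str


@dataclass
class TrainingKnownSample:
    sample: KnownSample


@dataclass
class TestingKnownSample:
    sample: KnownSample


TrainingList = List[TrainingKnownSample]
TestingList = List[TestingKnownSample]


def partition(
    samples: Iterable[KnownSample],
    rule: Callable[[KnownSample, int], bool]
) -> Tuple[TrainingList, TestingList]:
    training_samples = [
        TrainingKnownSample(s)
        for i, s in enumerate(samples) if rule(s, i)
    ]
    test_samples = [
        TestingKnownSample(s)
        for i, s in enumerate(samples) if not rule(s, i)
    ]
    return training_samples, test_samples


def training_75(s: KnownSample, i: int) -> bool:
    return i % 4 != 0


# Made-up data held in a list.
data = [KnownSample(f"s{n}") for n in range(8)]

# Pass a named rule function...
training, testing = partition(data, training_75)
print(len(training), len(testing))

# ...or any callable with the same signature, such as a lambda for a 50/50 split.
even_train, odd_test = partition(data, lambda s, i: i % 2 == 0)
print(len(even_train), len(odd_test))
```

Because the rule is just a parameter, trying a different split ratio means passing a different function, with no change to partition() itself.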

While this is pleasantly succinct and expressive, it has a hidden cost. We’d like to avoid examining the data twice. For the small set of known samples in this particular problem, the processing is not particularly costly. But we may have a generator expression creating the raw data in the first place. Since we can only consume a ...