A Shuffling Strategy for Partitioning

Learn about the shuffling strategy for partitioning and how to implement it.


Let’s look at some subclasses that provide different partitioning strategies. We’ll start with one that shuffles and cuts, like a deck of cards.

Overview

One alternative is to shuffle and cut a list, precisely the way a deck of cards is shuffled and cut before a game. We can use random.shuffle() to handle the randomized shuffling. The cut is, in a way, a hyperparameter: how large should the training set be compared to the testing set? Suggestions from knowledgeable data scientists include 80% to 20%, 67% to 33%, and an even 50% to 50% split. Because expert opinion varies, we need to provide a way for a scientist to adjust the partition ratio.
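As a rough illustration of the shuffle-and-cut idea, here is a minimal function-level sketch. The name shuffle_and_cut and its training_fraction and seed parameters are assumptions made for this example; they do not come from the lesson. Only random.Random and its shuffle() method are standard-library calls.

```python
import random


def shuffle_and_cut(items, training_fraction=0.80, seed=None):
    """Shuffle a copy of items, then cut it into (training, testing) lists.

    training_fraction is the adjustable "cut" (hypothetical parameter name).
    seed is optional and only makes the shuffle repeatable.
    """
    rng = random.Random(seed)      # private generator; global state is untouched
    shuffled = list(items)         # copy so the caller's data is not reordered
    rng.shuffle(shuffled)          # in-place shuffle of the copy
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# An 80% to 20% split of ten sample values.
training, testing = shuffle_and_cut(range(10), training_fraction=0.80, seed=42)
```

Every item lands in exactly one of the two lists, so the training and testing sets never overlap. Changing training_fraction is all it takes to try a different ratio.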

Implementation

We’ll make the split a feature of the class. We can create separate subclasses to implement alternative splits. Here’s a shuffling implementation:
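What follows is a minimal sketch under stated assumptions, not necessarily the listing this lesson intends. The class names ShufflingPartition, TwoThirdsPartition, and EvenPartition, the training_fraction attribute, and the optional seed argument are all hypothetical. The sketch shows one way the split can be a class-level feature that subclasses override.

```python
import random


class ShufflingPartition:
    """Hypothetical partition class: shuffle once, then cut by a class-level ratio."""

    # The "cut": fraction of items that go to training. Subclasses override this.
    training_fraction = 0.80

    def __init__(self, items, seed=None):
        rng = random.Random(seed)   # seed is an assumption, for repeatable runs
        self._items = list(items)   # copy so the source data keeps its order
        rng.shuffle(self._items)
        self._cut = int(len(self._items) * self.training_fraction)

    @property
    def training(self):
        """Items before the cut."""
        return self._items[:self._cut]

    @property
    def testing(self):
        """Items from the cut onward."""
        return self._items[self._cut:]


class TwoThirdsPartition(ShufflingPartition):
    """Roughly 67% training, 33% testing."""
    training_fraction = 0.67


class EvenPartition(ShufflingPartition):
    """An even 50% to 50% split."""
    training_fraction = 0.50


# Usage: pick the subclass that matches the desired ratio.
default_split = ShufflingPartition(range(10), seed=1)
even_split = EvenPartition(range(10), seed=1)
```

Keeping the ratio as a class attribute means a new split needs only a one-line subclass, and the shuffling logic lives in a single place.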
