A Shuffling Strategy for Partitioning
Learn about the shuffling strategy for partitioning and how to implement it.
Let’s look at some subclasses that provide different partitioning strategies. We’ll start with one that shuffles and cuts, like a deck of cards.
Overview
One alternative is to shuffle and cut a list precisely the way a deck of cards is shuffled and cut before a game. We can use random.shuffle() to handle the randomized shuffling. The cut is, in a way, a hyperparameter: how large should the training set be compared to the testing set? Suggestions from knowledgeable data scientists include 80% to 20%, 67% to 33%, and an even 50% to 50% split. Because expert opinion varies, we need to provide a way for a scientist to adjust the partition ratio.
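The shuffle-and-cut idea can be sketched as a small function. This is only an illustration under stated assumptions: the name shuffle_and_cut and its training_fraction and seed parameters are hypothetical, and a seeded random.Random instance is used in place of the module-level random.shuffle() so that a run can be repeated.

```python
import random


def shuffle_and_cut(items, training_fraction=0.80, seed=None):
    """Shuffle a copy of items, then cut it into (training, testing) lists.

    Hypothetical helper: training_fraction is the adjustable partition
    ratio, and seed makes the shuffle reproducible.
    """
    if not 0.0 <= training_fraction <= 1.0:
        raise ValueError("training_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(items)  # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)   # same algorithm as random.shuffle()
    cut = int(len(shuffled) * training_fraction)  # position of the cut
    return shuffled[:cut], shuffled[cut:]


# An 80% to 20% split of ten samples.
training, testing = shuffle_and_cut(range(10), training_fraction=0.80, seed=42)
```

Changing training_fraction to 0.67 or 0.50 gives the other suggested ratios without touching the shuffling logic.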
Implementation
We’ll make the split a feature of the class. We can create separate subclasses to implement alternative splits. Here’s a shuffling implementation:
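As a minimal sketch of that design, and not the original listing, assume a hypothetical base class ShufflingPartition whose class-level attribute training_fraction holds the split. The class names, the attribute name, and the seed parameter are all assumptions made for illustration. Subclasses supply alternative splits by overriding the attribute.

```python
import random


class ShufflingPartition:
    """Shuffle the samples once, then cut them into training and testing lists."""

    # Fraction of samples assigned to training; subclasses override this.
    training_fraction = 0.80

    def __init__(self, samples, seed=None):
        self._samples = list(samples)  # copy, so the source data is not reordered
        # A seeded generator (an assumption) keeps the shuffle repeatable.
        random.Random(seed).shuffle(self._samples)
        self._cut = int(len(self._samples) * self.training_fraction)

    @property
    def training(self):
        """Samples before the cut."""
        return self._samples[: self._cut]

    @property
    def testing(self):
        """Samples from the cut onward."""
        return self._samples[self._cut :]


class TwoThirdsPartition(ShufflingPartition):
    """A 67% to 33% split."""

    training_fraction = 0.67


class EvenPartition(ShufflingPartition):
    """A 50% to 50% split."""

    training_fraction = 0.50


default_split = ShufflingPartition(range(10), seed=42)
even_split = EvenPartition(range(20), seed=42)
```

Keeping the ratio as a class attribute means each alternative split is a short subclass, and the shuffling code is written only once.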