Case Study

Learn how to optimize machine learning model training by using concurrent.futures for parallel hyperparameter tuning with a grid search evaluation strategy.

One of the problems that often plagues data scientists working on machine learning applications is the amount of time it takes to train a model. In our specific example of the k-nearest neighbors implementation, training means performing the hyperparameter tuning to find an optimal value of k and the right distance algorithm. In the previous chapters of our case study, we've tacitly assumed there will be an optimal set of hyperparameters. We'll look at one way to locate those optimal hyperparameters.

In more complex and less well-defined problems, the time spent training the model can be quite long. If the volume of data is immense, then very expensive compute and storage resources are required to build and train the model.

Hyperparameter tuning and compute-intensive tasks

In our case study, hyperparameter tuning is an example of a compute-intensive application. There's very little I/O; if we use shared memory, there's no I/O at all. This makes a process pool for parallel computation essential. We could wrap the process pool in AsyncIO coroutines, but the extra async and await syntax seems unhelpful for this kind of compute-intensive example. Instead, we'll use the concurrent.futures module to build our hyperparameter tuning function. The design pattern for concurrent.futures is to use a processing pool to farm out the various testing computations to a number of workers, then gather the results to determine which combination is optimal. A process pool means each worker can occupy a separate core, maximizing compute time. We'll want to test as many Hyperparameter instances concurrently as possible.
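The farm-out-and-gather pattern can be sketched as follows. This is a minimal, hedged illustration, not the book's implementation: the score() function is a hypothetical stand-in for testing one combination of k and distance algorithm, with a toy scoring rule so the sketch runs on its own.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import itertools


def score(k: int, algorithm: str) -> float:
    """Hypothetical stand-in for evaluating one hyperparameter combination.

    A real version would classify the testing samples with the given k and
    distance algorithm, returning the fraction classified correctly. Here we
    use a toy rule so the sketch is self-contained and runnable.
    """
    return 1.0 / (1 + abs(k - 5)) * (1.1 if algorithm == "euclidean" else 1.0)


def grid_search() -> tuple[float, tuple[int, str]]:
    """Farm each (k, algorithm) test out to a worker; gather the best result."""
    combinations = list(
        itertools.product(range(1, 10, 2), ["euclidean", "manhattan"])
    )
    best_quality = -1.0
    best_combo: tuple[int, str] | None = None
    with ProcessPoolExecutor() as pool:
        # Submit every combination; each worker can occupy a separate core.
        futures = {
            pool.submit(score, k, alg): (k, alg) for k, alg in combinations
        }
        # Gather results as they complete, keeping the best seen so far.
        for future in as_completed(futures):
            quality = future.result()
            if quality > best_quality:
                best_quality, best_combo = quality, futures[future]
    return best_quality, best_combo


if __name__ == "__main__":
    print(grid_search())
```

The dictionary mapping each Future back to its (k, algorithm) pair is a common idiom with as_completed(): results arrive in completion order, not submission order, so we need a way to recover which combination produced each result.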

Components and summary of model

We’ll be using the TrainingKnownSample and TestingKnownSample class definitions. We’ll need to keep these in a TrainingData instance. And, most importantly, we’ll need Hyperparameter instances.

We can summarize the model like this:
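One possible sketch of these components, assuming deliberately simplified definitions (the real case-study classes carry more detail, such as the distance computation and sample validation):

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class TrainingKnownSample:
    """A sample with a known species, used to train the classifier."""
    features: list[float]
    species: str


@dataclass
class TestingKnownSample:
    """A sample with a known species, held back to measure quality."""
    features: list[float]
    species: str
    classification: str | None = None  # filled in when the model is tested


@dataclass
class TrainingData:
    """Container holding the training and testing partitions."""
    training: list[TrainingKnownSample] = field(default_factory=list)
    testing: list[TestingKnownSample] = field(default_factory=list)


@dataclass
class Hyperparameter:
    """One (k, distance algorithm) combination to be evaluated."""
    k: int
    algorithm: str  # name of the distance computation, e.g. "euclidean"
    data: TrainingData
    quality: float | None = None  # fraction of test samples classified correctly
```

Each Hyperparameter instance bundles a candidate k and distance algorithm with a reference to the shared TrainingData; evaluating it fills in the quality attribute, which the grid search then compares across instances.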
