Finalizing and Serializing the Model

Learn how to train and evaluate a final model, save and load it, and use it to predict on new data.

Let's say that we have found an accurate machine learning model. We have trained it on the complete dataset and are ready for delivery. This is a big milestone, but it's not the end of our project. We still need to save the trained model so that it can be reused without retraining.

Overview

After all our efforts (the whole data science and machine learning pipeline, including cross-validation to test the model's skill), we finally train the model on the complete data and make it available for practical use (deployment). For learning purposes, we'll treat the model trained on (X_train, y_train) as the final model.

Python's standard library includes a module called pickle, which provides a standard way of serializing Python objects. We can use pickle to serialize our trained model and save the serialized bytes to a file with any name. Later, we can load the saved file at any time and deserialize it to make predictions for unseen data. It's also good to know that, in practice, we usually schedule retraining and update the serialized model file when a sufficient amount of new data is available.
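To make the idea of serialization concrete, here is a minimal sketch of a pickle round trip in memory. The `model_state` dictionary is a hypothetical stand-in for a trained model; its keys and values are made up for illustration, and any picklable Python object would behave the same way.

```python
import pickle

# Hypothetical stand-in for a trained model (illustrative values only).
model_state = {"name": "demo_model", "weights": [0.5, -1.25, 3.0], "bias": 0.1}

# Serialize: turn the Python object into a bytes string.
serialized = pickle.dumps(model_state)

# Deserialize: rebuild an equal, but separate, object from those bytes.
restored = pickle.loads(serialized)

print(type(serialized).__name__)   # the serialized form is a bytes object
print(restored == model_state)     # same content after the round trip
print(restored is model_state)     # but not the very same object
```

Writing to a file works the same way with `pickle.dump` and `pickle.load`. Note that unpickling can execute arbitrary code, so we should only load pickle files that come from a source we trust.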

Steps

Let's work through the following steps:

  1. Save the model with a name.

  2. Load the saved model.

  3. Get predictions for X_test using the saved model after loading.

  4. Cross-check that the predictions for X_test match our previous results.
