Exercise: Train and Evaluate ML Model

Test the knowledge you've gained so far by applying it to this hands-on exercise.

Now that we’ve reached the end of this chapter, it’s time for you to write some ML model code. This time, we recommend following a more rigorous data science workflow, for example by using scikit-learn pipelines.

Problem statement

Write a script that trains a model and persists it locally, along with the train/test data used for it. Then upload your artifacts to Amazon S3 and write AWS Lambda code that evaluates your model's performance on the train and test datasets. We have already split the dataset into train and test files. A more detailed list of steps follows:

  • Choose a real-life dataset (the wine quality dataset) and follow a typical data science routine:

    • Write data preprocessing functions.
    • Optimize the parameters, select the best model, and compute evaluation metrics on the training dataset. For this, we recommend using scikit-learn pipelines to combine data preprocessing and model training. The result should be a train.py script that produces a model.joblib file, which you will evaluate in your Lambda function. You can then upload the train/test files to an Amazon S3 bucket, from which your function will read them.
  • Write your own handler.py Lambda function. It should either load your newly trained ML model from Amazon S3 or have the model already available in the container, and it should calculate model performance metrics on the train/test datasets imported from S3. Consider how to pass the files' S3 locations to your function.
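The training step above could be sketched as follows. This is a minimal illustration, not a full solution: it uses scikit-learn's built-in wine dataset as a stand-in for the UCI wine quality CSVs, a deliberately tiny parameter grid, and illustrative variable names; with the real dataset you would read your pre-split train/test files instead.

```python
# train.py -- sketch of a pipeline-based training script.
# NOTE: load_wine is a stand-in; replace with pd.read_csv of your
# pre-split wine quality train/test files.
import joblib
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load and split the data (with the real exercise data, the split
# already exists as separate files).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Combine preprocessing and the model in one pipeline, so the same
# scaling is applied at training and inference time.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

# Small illustrative grid; expand it for a real run.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [50, 100]},
    cv=3,
)
search.fit(X_train, y_train)

train_score = search.score(X_train, y_train)
test_score = search.score(X_test, y_test)
print(f"train accuracy: {train_score:.3f}, test accuracy: {test_score:.3f}")

# Persist the best pipeline; the Lambda function will load this file.
joblib.dump(search.best_estimator_, "model.joblib")
```

Persisting the whole pipeline (rather than just the model) means the Lambda function never has to re-implement the preprocessing logic.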
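One way to structure the Lambda function is to pass the S3 bucket and object keys in the invocation event. The sketch below assumes that convention, along with hypothetical key names (model_key, train_key, test_key) and a target column named quality; the metrics helper is kept separate from the S3 plumbing so it can be tested locally.

```python
# handler.py -- sketch of a Lambda evaluation function. Bucket and key
# names come from the event payload (an assumed convention, not a
# requirement of the exercise).
import json

import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score


def evaluate(model, df, target_col="quality"):
    """Compute performance metrics for one dataset split."""
    X = df.drop(columns=[target_col])
    y = df[target_col]
    preds = model.predict(X)
    return {
        "accuracy": accuracy_score(y, preds),
        "f1_macro": f1_score(y, preds, average="macro"),
    }


def handler(event, context):
    # Expected event shape (illustrative):
    # {"bucket": "my-ml-bucket", "model_key": "model.joblib",
    #  "train_key": "train.csv", "test_key": "test.csv"}
    import boto3  # imported lazily; available in the Lambda runtime

    s3 = boto3.client("s3")
    bucket = event["bucket"]
    # Lambda only allows writes under /tmp.
    for key, path in [
        (event["model_key"], "/tmp/model.joblib"),
        (event["train_key"], "/tmp/train.csv"),
        (event["test_key"], "/tmp/test.csv"),
    ]:
        s3.download_file(bucket, key, path)

    model = joblib.load("/tmp/model.joblib")
    metrics = {
        "train": evaluate(model, pd.read_csv("/tmp/train.csv")),
        "test": evaluate(model, pd.read_csv("/tmp/test.csv")),
    }
    return {"statusCode": 200, "body": json.dumps(metrics)}
```

If you instead bake the model into the container image, you can drop the model download and load model.joblib straight from the image's filesystem.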

Playground

Test yourself by performing the tasks above in the following widget:
