Who Should Take This Course?

Get a brief overview of the course and its intended audience.

Intended audience

The goal of this course is to help you build more robust machine learning systems and become more confident in the results of your work. It is designed for anyone with experience in machine learning and software engineering who wants to build reliable machine learning systems.

What Is Defensive Programming?

One well-established approach to writing reliable software is defensive programming.

The idea is based on defensive driving. In defensive driving, you adopt the mindset that you’re never sure what the other drivers will do. That way, you ensure you won’t be hurt if they do something dangerous. You take responsibility for protecting yourself even when it might be the other driver’s fault. In defensive programming, the main idea is that if a routine is passed bad data, it won’t be hurt, even if the bad data is another routine’s fault.

Steve McConnell, Code Complete

We will extend this awesome metaphor by Steve McConnell to the machine learning domain. As ML practitioners, we want to make the systems we build robust to various potential sources of danger, such as human errors, unexpected usage scenarios, and even changes in external circumstances.
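
To make the idea concrete, here is a minimal sketch of defensive programming applied to an ML prediction routine. The `model` object, the expected feature count, and the function name are illustrative assumptions rather than part of any particular library; the point is that the routine validates what it is passed instead of trusting the caller.

```python
import numpy as np

EXPECTED_FEATURES = 4  # assumption: the number of features the model was trained on


def predict_defensively(model, features) -> np.ndarray:
    """Validate the input before handing it to the model, rather than trusting the caller."""
    features = np.asarray(features, dtype=float)

    # Shape check: refuse anything that doesn't look like a batch of feature rows.
    if features.ndim != 2 or features.shape[1] != EXPECTED_FEATURES:
        raise ValueError(
            f"Expected a 2-D array with {EXPECTED_FEATURES} columns, got shape {features.shape}"
        )

    # Data check: bad values from an upstream routine shouldn't silently reach the model.
    if np.isnan(features).any():
        raise ValueError("Input contains NaN values; refusing to predict on bad data")

    return model.predict(features)
```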

What Is This Course About?

We’ll cover the following broad topics:

  • Non-ML-specific software tests: These are tests commonly used to validate software functionality and ensure it works as intended, including unit tests, integration tests, and smoke tests, along with supporting techniques such as mocking. They are not specific to machine learning systems and can be applied to any software.

  • ML-specific tests: These are tests designed to validate the accuracy and performance of machine learning models. They include behavioral tests, regression tests, and data validation (see the behavioral test sketch after this list).

  • Best practices for testing: We’ll discuss when to add tests, which tests to add, and how to measure whether the existing tests are sufficient. We’ll also cover how to handle flaky tests (tests that produce inconsistent results) and how to automate the testing process using CI/CD.

  • Runtime checks: These are checks performed while a program is running to verify its correctness, including assertions and boundary, type, and performance checks (see the assertion sketch after this list). Runtime checks help developers identify and fix issues in their programs before they cause problems in production.

  • Debugging ML pipelines: Debugging is the process of identifying and fixing errors in machine learning systems. This includes identifying the root cause of an error, implementing a fix, and verifying that the fix has resolved the issue.

  • Monitoring for ML pipelines: Monitoring involves continuously tracking the health of the data, the service, and the model. It covers issues such as model drift (a change in the model’s performance over time) and data quality control (ensuring that the data fed into the pipeline is high quality and free of errors).
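
As a preview of the ML-specific tests above, the following is a hedged sketch of a behavioral (invariance) test written with pytest. The `SentimentModel` class, its `load` and `predict` methods, and the expectation that appending a neutral phrase should not flip the prediction are all assumptions made for illustration.

```python
import pytest

from my_project.models import SentimentModel  # hypothetical model class


@pytest.fixture
def model():
    return SentimentModel.load("latest")  # hypothetical loading API


def test_prediction_invariant_to_neutral_suffix(model):
    """A harmless edit to the input should not change the model's prediction."""
    original = "The delivery was quick and the product works great."
    perturbed = original + " Thanks."

    assert model.predict(original) == model.predict(perturbed)
```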
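
And here is the assertion sketch referenced in the runtime checks item: a few type, schema, and boundary checks guarding a hypothetical preprocessing step. The column names and value ranges are illustrative assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Guard a preprocessing step with runtime type, schema, and boundary checks."""
    # Type check: fail fast if the caller passed something other than a DataFrame.
    assert isinstance(df, pd.DataFrame), f"Expected a DataFrame, got {type(df)}"

    # Schema check: the columns we rely on must be present (hypothetical names).
    required = {"age", "income"}
    missing = required - set(df.columns)
    assert not missing, f"Missing required columns: {missing}"

    # Boundary check: values outside a plausible range usually signal an upstream bug.
    assert df["age"].between(0, 120).all(), "Found ages outside the range [0, 120]"

    return df.dropna(subset=list(required))
```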

We’re not going to cover topics on the pure modeling side of machine learning: model evaluation (assessing the performance of a machine learning model on a given dataset), metrics (the specific measurements used to evaluate a model’s performance on a concrete problem), or data/model version control (tracking the changes made to data and models between experiments). Even though these topics are essential, they are not the focus of this course.