Ensuring the reliability and robustness of machine learning models is essential to building successful ML-powered applications.

This course begins with a thorough introduction to software testing essentials, with particular attention to use cases in machine learning. You’ll learn about unit and integration testing as well as more advanced testing techniques. Next, you’ll learn best practices in software testing and dive into ML-specific testing techniques, such as behavioral and smoke tests. Lastly, you’ll cover aspects of ML software reliability outside of testing, including runtime checks and type hinting.

By the end of this course, you'll be equipped with the knowledge and skills to ensure the reliability and robustness of your machine learning systems. You’ll be able to apply software engineering principles to your ML processes, create and execute efficient testing approaches, and utilize monitoring tools to identify and resolve problems in your ML systems.

Reliable Machine Learning

## Overview

In regular software engineering, we tend to monitor whether the software is at least working (no errors, good response times, and so on), which is usually enough. But what can go wrong with machine learning code at runtime?

Regular software (say, a CRM system) rarely breaks without code changes or significant changes in input data. ML software, by contrast, can be sensitive to minor distribution changes: seasonality, trends, or new cameras and microphones for visual and audio data.
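
To make the idea of a distribution change concrete, here is a minimal sketch that compares one numeric feature between a reference sample and a live sample using the Population Stability Index (PSI). The function name `psi`, the bin count, and the synthetic Gaussian data are illustrative assumptions, not part of the course material.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bucket_shares(reference), bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic data (an assumption for illustration): the "live" feature drifts
# by half a standard deviation relative to the training-time sample.
random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
same = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(0.5, 1.0) for _ in range(5000)]

psi_same = psi(train, same)
psi_shifted = psi(train, shifted)
print(f"no drift: {psi_same:.4f}, drifted: {psi_shifted:.4f}")
```

The score stays close to zero when both samples come from the same distribution and grows as the live data moves away from the reference. Nothing in the code changed between the two runs, which is exactly the kind of failure that error and latency checks alone would not reveal.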

Good monitoring comes with the following benefits:

* We're alerted when things break.
* We can learn what's broken and why.
* We can inspect trends over long time frames.
* We can compare system behavior across different versions and experimental groups (e.g., A/B testing).

### ML-specific monitoring

In ML engineering, we should also monitor the quality of our models and pipelines and watch carefully for things like concept drift and data drift. At the same time, regular software problems are still there and can't be ignored either.

We'll cover three main aspects of machine learning monitoring in this lesson:

1. Service Health
2. Data Health
3. Model Health
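
As a rough illustration of how these three aspects differ, the sketch below computes one or two example metrics for each from a batch of logged requests. The `Request` record, the `health_report` function, and the chosen metrics are hypothetical, picked only to show the split; a real service would usually rely on a dedicated monitoring stack.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Optional


@dataclass
class Request:
    """One logged prediction request (hypothetical schema)."""
    latency_ms: float
    failed: bool
    features: dict                # raw model inputs; None marks a missing value
    prediction: int
    label: Optional[int] = None   # ground truth, if it arrives later


def health_report(requests):
    """Summarize service, data, and model health for a batch of requests."""
    latencies = [r.latency_ms for r in requests]
    values = [v for r in requests for v in r.features.values()]
    labeled = [r for r in requests if r.label is not None]
    accuracy = (
        sum(r.prediction == r.label for r in labeled) / len(labeled)
        if labeled else None
    )
    return {
        # Service health: is the software itself working?
        "service": {
            "error_rate": sum(r.failed for r in requests) / len(requests),
            "p95_latency_ms": quantiles(latencies, n=20)[-1],
        },
        # Data health: do the inputs still look the way we expect?
        "data": {"missing_share": sum(v is None for v in values) / len(values)},
        # Model health: are predictions still correct where labels exist?
        "model": {"accuracy": accuracy},
    }


batch = [
    Request(12.0, False, {"age": 31, "city": "Oslo"}, prediction=1, label=1),
    Request(15.0, False, {"age": None, "city": "Rome"}, prediction=0, label=1),
    Request(240.0, True, {"age": 45, "city": "Lima"}, prediction=1),
    Request(18.0, False, {"age": 27, "city": "Kyiv"}, prediction=0),
]
report = health_report(batch)
print(report)
```

Note that the model-health metric only covers requests whose labels have arrived, which is one reason data-health checks are useful as an earlier warning signal.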

  




Learn about different aspects of monitoring in ML-specific services.
