Overview

Once the code of our machine learning service is written and deployed to production, it should keep running smoothly on its own, like a living organism. We don't always have the option to intervene and fix problems on the fly. Only a handful of practices make such code reliable, and we'll discuss most of them in this section.

Fail fast vs. fail safe

Depending on the domain and application, we should either handle exceptions and bugs quietly (log them or trigger an alert, then keep running) or stop execution entirely. This trade-off is also known as robustness vs. correctness.
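To make the two strategies concrete, here is a minimal sketch of one prediction path handled both ways. It assumes a hypothetical service that expects a fixed number of input features; `validate_features`, `predict`, `InvalidInputError`, and the fallback value are illustrative names, not part of any real library, and the "model" is just a sum of the features.

```python
import logging
from typing import Sequence

logger = logging.getLogger("ml_service")


class InvalidInputError(ValueError):
    """Hypothetical error raised when a request fails validation."""


def validate_features(features: Sequence[float], expected_len: int) -> None:
    # Reject inputs whose shape does not match what the model expects.
    if len(features) != expected_len:
        raise InvalidInputError(
            f"expected {expected_len} features, got {len(features)}"
        )


def predict(features: Sequence[float]) -> float:
    # Stand-in for a real model (hypothetical): returns the sum of the features.
    return float(sum(features))


def handle_fail_fast(features: Sequence[float], expected_len: int = 3) -> float:
    # Fail fast (correctness): let the exception propagate and stop this code path.
    validate_features(features, expected_len)
    return predict(features)


def handle_fail_safe(
    features: Sequence[float], expected_len: int = 3, fallback: float = 0.0
) -> float:
    # Fail safe (robustness): log the problem, return a fallback, keep serving.
    try:
        validate_features(features, expected_len)
        return predict(features)
    except InvalidInputError:
        logger.exception("Invalid input; returning fallback prediction")
        return fallback
```

With a malformed request, `handle_fail_fast` raises `InvalidInputError`, while `handle_fail_safe` logs the error with its traceback and returns the fallback value. Which behavior is appropriate depends on whether a wrong or default answer is more costly for the application than no answer at all.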
