Deployment Strategies

Understand the deployment approaches for successful AI/ML systems.

Production setup

Once we’re happy with the models we’ve chosen, including their performance and error rate, and we have the infrastructure to support our product and our chosen AI model’s use case, we’re ready for the last step of the process: deploying this code into production. Maintaining a deployment strategy that works for our product and organization is part of continuous maintenance. We’ll need to think about how often to retrain our models and refresh our training data to prevent model decay and data drift. We’ll also need a system for continuously monitoring our model’s performance. This process will be specific to our product and business, particularly because these periods of retraining may require some downtime for our system.
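One common way to watch for data drift between retraining cycles is the Population Stability Index (PSI), which compares a feature's distribution at training time against its live distribution. The sketch below is a minimal illustration with simulated data; the 0.2 alert threshold is a common rule of thumb, not a universal standard:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a training-time sample and live data."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Bin proportions, with a small constant to avoid division by zero.
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5000)   # feature distribution at training time
live = rng.normal(0.5, 1.0, 5000)    # shifted live distribution (simulated drift)

score = psi(train, live)
if score > 0.2:                      # rule of thumb: PSI above ~0.2 signals drift
    print(f"PSI={score:.2f}: schedule retraining")
```

A check like this can run on a schedule against recent production inputs and feed an alert or a retraining pipeline, so drift is caught before it degrades predictions.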

Deployment process

Deployment is a dynamic process because our models are, for the most part, trying to make predictions about real-world data, so depending on what’s happening in the world our data describes, deployment may demand more or less of our attention. For instance, when we were working for an ML property-tech company, we were updating, retraining, and redeploying our models almost daily: the real estate data we worked with was heavily skewed by rapid, pandemic-driven changes in migration and housing prices.

If those models had been left unchecked, and there hadn’t been engineers and business leaders on both sides of this product, on the client’s end and internally, we might not have caught some of the egregious predictions the models were making on the basis of unrepresentative data.

Deployment strategies

There are also a number of well-known deployment strategies we should be aware of. We will discuss them in the following subsections.

Shadow

In this deployment strategy, often referred to as shadow mode, we deploy a new model with new features alongside the model that already exists, so the new model runs only as a shadow of the one currently in production. The shadow model handles every request exactly as the existing model does, but its results are never shown to users. This lets us see whether the shadow model performs better on the same real-world data without interrupting the model that’s actually live in production. Once we’ve confirmed that the new model performs better and runs without issues, it becomes the model fully deployed in production, and the original model is retired.
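As a minimal sketch of the routing logic (the two model functions here are hypothetical stand-ins for a real production and candidate model), shadow mode might be wired up like this:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def production_model(features):
    # Stand-in for the live model; a dummy score for illustration.
    return sum(features)

def candidate_model(features):
    # Stand-in for the new model being evaluated in shadow mode.
    return sum(features) * 1.1

def handle_request(features):
    """Serve the production prediction while logging the shadow's output."""
    live_pred = production_model(features)
    try:
        shadow_pred = candidate_model(features)
        log.info("shadow prediction: %s (live: %s)", shadow_pred, live_pred)
    except Exception:
        # The shadow must never affect what the caller sees.
        log.exception("shadow model failed")
    return live_pred  # only the live model's result is ever returned

print(handle_request([1.0, 2.0, 3.0]))
```

The key design point is that the shadow path is wrapped so that a failure in the candidate model is logged but never propagated; the logged predictions can then be compared offline against the live model's before promoting the candidate.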
