Inference

Learn common techniques to scale inference in production environments.

We'll cover the following...

Inference

Inference is the process of using a trained machine learning model to make a prediction. Below are some of the techniques to scale inference in the production environment.

1. Imbalance workload

  • During inference, one common pattern is to split workloads onto multiple inference servers. We use similar architecture in Load Balancers. It is also sometimes called an Aggregator Service.
Dispatcher diagram
Dispatcher diagram
  1. Clients (upstream process) send requests to the ...