Inference
Learn common techniques to scale inference in production environments.
Inference
Inference is the process of using a trained machine learning model to make predictions. Below are some common techniques for scaling inference in a production environment.
1. Imbalanced workload
- During inference, a common pattern is to split the workload across multiple inference servers. This architecture resembles the one used by load balancers, and the component that distributes the requests is sometimes called an Aggregator Service.
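The splitting pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the `Aggregator` class, the `make_worker` helper, and the round-robin policy are all hypothetical, and each "inference server" is simulated by a local function rather than a real network service.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, List, Sequence

# Hypothetical stand-in for a remote inference server:
# takes one input and returns one prediction.
Worker = Callable[[float], float]


class Aggregator:
    """Splits requests across workers round-robin and returns results in input order."""

    def __init__(self, workers: Sequence[Worker]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self.workers = list(workers)

    def predict(self, inputs: Sequence[float]) -> List[float]:
        # Pair each input with the next worker in rotation (assumed policy).
        assignments = zip(cycle(self.workers), inputs)
        with ThreadPoolExecutor(max_workers=len(self.workers)) as pool:
            futures = [pool.submit(worker, x) for worker, x in assignments]
            # Collect in submission order so outputs line up with inputs.
            return [future.result() for future in futures]


def make_worker(scale: float) -> Worker:
    # Toy "model" for illustration: multiplies the input by a constant.
    return lambda x: x * scale


aggregator = Aggregator([make_worker(2.0), make_worker(2.0), make_worker(2.0)])
predictions = aggregator.predict([1.0, 2.0, 3.0, 4.0])
print(predictions)
```

Round-robin is only one possible assignment policy. A real aggregator might instead route by current server load or by request size, which matters when some requests are much heavier than others.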