Endpoints and Deployment

Learn about Azure-supported endpoints and deployment.

How do we integrate our ML models into our product? Let’s talk about deploying the model. Deployment has multiple approaches and multiple aspects to consider. The first aspect is online vs. offline (batch) deployment. Let’s compare the two methods.

Online vs. Batch Deployment

Online deployment

  • Used when we want the model to return results immediately.
  • Examples include serving search results and making real-time predictions.
  • In online deployment mode, the service must respond quickly, with low latency.
  • A lightweight database retrieval system and a lightweight model are preferred in such cases.

Batch deployment

  • Used when the model doesn’t need to run in real time; we invoke the model offline to generate the results.
  • An example use case is processing and indexing documents. We can generate recommendations offline and leverage the output in the online service.
  • Results are delayed: a batch run can take hours or days to complete.
  • This mode is used when we have to process a huge amount of data that requires a lot of processing time.
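To make the contrast concrete, here is a minimal Python sketch of the two calling patterns. The endpoint URL, the helper names, and the input file format are hypothetical placeholders, and the model is assumed to expose a scikit-learn-style `predict`; the point is only that online scoring answers one low-latency request at a time over HTTP, while batch scoring iterates over a stored dataset offline.

```python
import json

import requests  # assumed available; used only to illustrate the calling pattern

# Hypothetical real-time endpoint URL; a deployed service provides its own.
ONLINE_ENDPOINT = "https://my-service.example.com/score"


def score_online(record: dict) -> dict:
    """Online mode: one low-latency HTTP call per incoming request."""
    response = requests.post(ONLINE_ENDPOINT, json=record, timeout=2)
    response.raise_for_status()
    return response.json()


def score_batch(input_path: str, output_path: str, model) -> None:
    """Batch mode: score a whole file offline; results are consumed later."""
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            record = json.loads(line)  # assumes one JSON record per line
            prediction = model.predict([record["features"]])[0]
            dst.write(json.dumps({"id": record["id"], "prediction": prediction}) + "\n")
```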

Components in Service Deployment

What are the essential aspects needed for running a service? Let's break them down below.

An ML model needs the following details for deployment:

  • Environment configuration: Specifications of which software and packages to install.

  • Scoring code: The script that loads the model and uses it to score incoming request data.

  • Inference configuration: Specifications of the scoring script and the environment needed to run it.
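As a concrete illustration, the sketch below wires these three pieces together with the Azure Machine Learning Python SDK (v1). The file names (`score.py`, `conda.yml`), the service and model names, and the use of joblib for a scikit-learn model are assumptions for this example; the `init`/`run` entry points, `Environment`, `InferenceConfig`, and `Model.deploy` are the SDK's standard pattern.

```python
# score.py — the scoring code: loads the model once, then scores each request.
import json
import os

import joblib  # assumes the model was saved with joblib


def init():
    # Azure ML sets AZUREML_MODEL_DIR to the folder containing the registered model.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    # Called for every request; raw_data is the JSON payload sent to the endpoint.
    data = json.loads(raw_data)["data"]
    predictions = model.predict(data)
    return predictions.tolist()
```

```python
# deploy.py — environment configuration + inference configuration + deployment.
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()  # assumes a local config.json for the workspace

# Environment configuration: which software and packages to install.
env = Environment.from_conda_specification(name="sklearn-env", file_path="conda.yml")

# Inference configuration: the scoring script plus the environment to run it in.
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy the registered model as a web service (ACI is used here for simplicity).
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(
    workspace=ws,
    name="my-ml-service",                 # hypothetical service name
    models=[Model(ws, name="my-model")],  # assumes a model registered as "my-model"
    inference_config=inference_config,
    deployment_config=deployment_config,
)
service.wait_for_deployment(show_output=True)
```

Once deployment completes, the service exposes a scoring URI that clients can call over HTTP, which is exactly the online deployment mode described above.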

...