Managing the Deployments

Learn about fixing common deployment errors and monitoring the service.

Common deployment errors

We will take a look at some of the common deployment errors, the reasons they occur, and their resolutions.

Error Analysis

Each entry below lists the error, the reason it occurs, and its resolution, in that order.

OutOfQuota

Reason: This failure occurs when the selected VM can't run the model. The first step of a deployment is to ensure that we have sufficient quota for it. In some cases, the selected VM may not have the memory, CPU, or disk required for the job. Only high-end servers are supported for the deployment.

Resolution: We review the current quota and request quota for the high-end servers accordingly. The choice of server is a trade-off between price and functionality.

ImageBuildFailure

Reason: This error occurs if the image passed to the deployment fails to build on the VM servers.

Resolution: We check the build log at azureml/ImageLogs/<image_id>/build.log to find the root cause. If the issue persists, we try a different build.

ResourceNotReady

Reason: This error occurs if the deployment server is not ready for the deployment. Among several possible reasons, the server might still be preparing the image or running another deployment.

Resolution: If the issue persists, we check the packages and code for problems.

InternalServerError

Reason: This error occurs if there is an internal server issue on Azure.

Resolution: We recommend raising a support ticket.

ResourceNotFound

Reason: This error is thrown if one of the resources referenced in the configuration is not available.

Resolution: We check that every resource referenced in the YAML file (such as the storage server, dataset, or compute) is available.

BadArgument

Reason: One of the following:
  • Resource requests are out of limits.
  • There is an authorization error.
  • The code and model artifacts cannot be downloaded.

Resolution: For each reason above, in the same order:
  1. We check the limits on settings such as instance_type.
  2. We use the same authentication mechanism for the endpoint/compute and the deployment; for example, the same Azure principal.
  3. We try downloading the artifacts from the Azure Machine Learning portal.
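
The error analysis above can be kept as a small lookup so that an error code maps straight to its first troubleshooting step. The sketch below is illustrative only: RESOLUTIONS and suggest_resolution are hypothetical names, not part of any Azure SDK, and the resolution strings simply paraphrase the entries above.

```python
# Hypothetical triage helper; not part of the Azure Machine Learning SDK.
# The entries paraphrase the Error Analysis entries in this lesson.
RESOLUTIONS = {
    "OutOfQuota": "Review the quota and request quota for the required servers.",
    "ImageBuildFailure": "Inspect azureml/ImageLogs/<image_id>/build.log for the root cause.",
    "ResourceNotReady": "If the issue persists, check the packages and code for problems.",
    "InternalServerError": "Raise a support ticket.",
    "ResourceNotFound": "Verify that every resource referenced in the YAML file is available.",
    "BadArgument": "Check resource limits, authentication, and artifact downloads.",
}


def suggest_resolution(error_code: str) -> str:
    """Return the first troubleshooting step for a deployment error code."""
    # Surrounding whitespace is stripped; unknown codes get a generic hint.
    return RESOLUTIONS.get(
        error_code.strip(),
        "Unknown error code; check the deployment logs.",
    )
```

For example, suggest_resolution("InternalServerError") returns the support-ticket advice, while an unlisted code falls back to the generic hint.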

Troubleshooting online deployments

Deployment issues mainly fall into two groups: conda environment issues and container issues.

Conda logs

For any conda packages specified in the conda.yaml file, we recommend trying the installation locally first with the command conda env create -n userenv -f <conda.yml>, where <conda.yml> is the path to that file.
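
The local check can also be scripted, which makes it easier to capture the output for later comparison with the deployment logs. This is a minimal sketch that assumes conda is installed and on the PATH; build_conda_command and try_local_env are hypothetical helper names, and the only conda flags used are the -n and -f flags from the command above.

```python
import subprocess
from typing import List, Tuple


def build_conda_command(env_file: str, env_name: str = "userenv") -> List[str]:
    """Build the `conda env create` command from this lesson as an argument list."""
    return ["conda", "env", "create", "-n", env_name, "-f", env_file]


def try_local_env(env_file: str, env_name: str = "userenv") -> Tuple[bool, str]:
    """Run the command and return (succeeded, combined stdout and stderr)."""
    try:
        result = subprocess.run(
            build_conda_command(env_file, env_name),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # Raised when the conda executable cannot be found on PATH.
        return False, "conda executable not found on PATH"
    # conda exits with a non-zero code when environment creation fails.
    return result.returncode == 0, result.stdout + result.stderr
```

Calling try_local_env with the path to the environment file returns a success flag together with the captured output, so package resolution errors can be read locally before redeploying.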

If the local environment creation succeeds but the deployment still fails, the ...