Managing the Deployments
Learn about fixing common deployment errors and monitoring the service.
Common deployment errors
We will take a look at some of the common deployment errors, the reasons they occur, and their resolutions.
Error Analysis
Error | Reason | Resolution |
OutOfQuota | This failure happens when the VM server can't run the model. The first step of any deployment is to ensure that we have sufficient quota for it. In some cases, the selected VM may not have the memory, CPU, or disk required for the job. Only high-end servers are supported for the deployment. | We review the quota and request more for the high-end servers as needed. It's a trade-off between price and functionality. |
ImageBuildFailure | The error occurs if the image passed to the deployment fails while being built on the VM servers. | We check the build errors in the build log. |
ResourceNotReady | The error happens if the deployment server is not ready for deployment. One of several possible reasons is that the server might still be preparing the image or running another deployment. | If the issue persists, we check the packages and code for issues. |
InternalServerError | The error occurs if there is an internal server issue on Azure. | We recommend raising a support ticket. |
ResourceNotFound | The error is thrown if one of the resources referred to in the configuration is not available. | We check for the availability of all resources referred to in the YAML file (like storage server/dataset/compute). |
BadArgument | | |
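To make the table above easier to use in a script, the following is a minimal sketch of a lookup helper. The names FIRST_STEPS and first_step are hypothetical and are not part of any Azure SDK. The resolutions are copied from the table, and BadArgument is left out because the table lists no resolution for it.

```python
# First troubleshooting step per error code, taken from the table above.
# FIRST_STEPS and first_step are illustrative names, not an Azure API.
# BadArgument is omitted because the table gives no resolution for it.
FIRST_STEPS = {
    "OutOfQuota": "Review the quota and request more for the high-end servers.",
    "ImageBuildFailure": "Check the build errors in the build log.",
    "ResourceNotReady": "If the issue persists, check the packages and code for issues.",
    "InternalServerError": "Raise a support ticket.",
    "ResourceNotFound": "Check that every resource referred to in the YAML file is available.",
}


def first_step(error_code):
    """Return the suggested first step for a deployment error code."""
    return FIRST_STEPS.get(error_code, "No resolution listed in the table.")


if __name__ == "__main__":
    print(first_step("ImageBuildFailure"))
```

A dictionary keeps the mapping in one place, so adding a new error code only needs one extra entry.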
Troubleshooting online deployments
Deployment issues mainly fall into two categories: conda environment issues and container issues.
Conda logs
For any conda installations specified in the conda.yaml file, it's recommended to try the conda installation locally using the command conda env create -n userenv -f <conda.yaml>.
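The local check can also be scripted. The following is a minimal sketch that assumes conda is installed and available on the PATH. The function names conda_create_command and try_local_env are hypothetical, and the environment name userenv is the one used in the command above.

```python
import shutil
import subprocess


def conda_create_command(env_file, env_name="userenv", executable="conda"):
    """Build the argument list for: conda env create -n <env_name> -f <env_file>."""
    return [executable, "env", "create", "-n", env_name, "-f", env_file]


def try_local_env(env_file, env_name="userenv"):
    """Try to create the environment locally; return True if conda exits with code 0."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        # Assumption: without conda on the PATH there is nothing to test locally.
        print("conda was not found on the PATH; skipping the local check.")
        return False
    result = subprocess.run(
        conda_create_command(env_file, env_name, executable=conda_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # conda reports solver and package errors on stderr.
        print(result.stderr)
    return result.returncode == 0


if __name__ == "__main__":
    print(" ".join(conda_create_command("conda.yaml")))
```

Calling try_local_env("conda.yaml") runs the same command as above and prints conda's error output if the environment can't be created.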
If the local environment creation is successful and fails, the ...