Prerequisites for Distributed Deep Learning
Learn about the prerequisites for running distributed models in Azure.
Training and deploying deep learning models are expensive operations. They need a lot of computation capacity and time. Azure Machine Learning offers multiple features and resources for accelerated and efficient deep learning model training and deployment.
Creating a computing cluster
Deep learning jobs need a compute cluster with enough capacity to run work in parallel. Let’s create one and allow more than one instance during compute creation. It’s advisable to set min_instances to 0, so that the cluster scales down to zero nodes when it is idle, and max_instances to the number of instances (the degree of parallelization) we want. This is a trade-off between computation and cost. By default, our subscription may not have enough quota for the number of instances we need. In that case, we raise a support ticket with Azure to request a quota increase for the VM series we plan to use. For example, we request additional capacity for the DS2 series. GPU instances are also available at an additional cost. If they are not available in your region, try looking in other regions (such as East US).
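The scaling settings described above can be sketched in Python. This is a minimal sketch, assuming the Azure Machine Learning Python SDK v2 (the azure-ai-ml and azure-identity packages). The cluster name dl-cluster, the VM size Standard_DS2_v2, the maximum of 4 instances, the 120-second idle timeout, and the helper names cluster_settings and create_cluster are illustrative assumptions, not values from this lesson.

```python
def cluster_settings(name, vm_size, max_instances, min_instances=0, idle_seconds=120):
    """Build keyword arguments for an AmlCompute cluster and check the scaling bounds."""
    if min_instances < 0 or max_instances < 1 or min_instances > max_instances:
        raise ValueError("expected 0 <= min_instances <= max_instances and max_instances >= 1")
    return {
        "name": name,
        "size": vm_size,
        "min_instances": min_instances,  # 0 lets the cluster scale to zero nodes when idle
        "max_instances": max_instances,  # upper bound on parallel nodes (and on cost)
        "idle_time_before_scale_down": idle_seconds,  # assumed value, in seconds
    }


def create_cluster(subscription_id, resource_group, workspace, settings):
    """Create or update the cluster in an existing workspace (needs Azure credentials)."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import AmlCompute
    from azure.identity import DefaultAzureCredential

    client = MLClient(
        DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace,
    )
    # begin_create_or_update returns a poller; result() waits for completion.
    return client.compute.begin_create_or_update(AmlCompute(**settings)).result()


# Hypothetical settings: scale between 0 and 4 nodes of an assumed DS2-series size.
settings = cluster_settings("dl-cluster", "Standard_DS2_v2", max_instances=4)
print(settings)
```

Calling create_cluster with your own subscription ID, resource group, and workspace name would submit the request. If max_instances exceeds the subscription's quota for the chosen VM series, the cluster may not be able to scale up to that size until a quota increase is granted.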