Prerequisites for Distributed Deep Learning

Training and deploying deep learning models are expensive operations. They need a lot of computation capacity and time. Azure Machine Learning offers multiple features and resources for accelerated and efficient deep learning model training and deployment.

Creating a computing cluster

We need a high computing cluster for running deep learning jobs. Let’s create one and select more than one instance during compute creation. It’s advisable to keep the min_instances to 0 and the max_instances to the number of instances or the amount of parallelization we want. It’s a trade-off between computation and cost. By default, we will not have the required capacity to increase the number of instances. We need to raise a support ticket with Azure to get the number of instances allocated. For example, we need to request the DS2 series to get additional capacity. There are also a few GPUs available at an additional expense. If they are not available in your region, try looking for servers in other regions (like East US).

Get hands-on with 1400+ tech skills courses.