Introduction to Azure Machine Learning
Learn about Azure machine learning capabilities and various phases of the ML lifecycle.
Azure Machine Learning is Azure’s offering for managing the ML project lifecycle. ML engineers, data scientists, and engineers can use the ML services and platform in their day-to-day workflows—from model development and deployment to machine learning operations (MLOps).
Target audience
-
Software developers: Azure Machine Learning helps organizations speed up model development and deployment. The developers or engineers can bring machine learning models into production in a secure and scalable production environment. It offers tools with varying degrees of complexity so that anyone from application developers to ML engineers can work in their own way.
-
Data scientists and ML engineers: They leverage tools like Azure SDK, Azure CLI, and Azure PowerShell to accelerate and automate their day-to-day jobs. They do, however, need support in migrating the existing machine learning code to Azure.
-
Application developers: They find tools such as Azure Cognitive Services and Automated ML (AutoML) for integrating models into applications or services. They want to use quick services to encapsulate the machine learning model development.
-
Platform developers: They get a robust set of tools for building advanced ML tools, such as using REST API interfaces for managing Azure resources and services.
Key differentiation of Azure Machine Learning
Collaboration: Azure offers multiple tools for collaboration. Developers can share code using shared Jupyter notebooks. They can share and use shared resources in a resource group or workspace. They can also deploy the code on GitHub.
Security: Built-in Azure cloud platform security features enable security for machine learning projects.
Security integrations include:
- Azure Virtual Networks (VNets): Along with network security groups, they are one of the fundamental building blocks for enabling communications within a private network or on the internet.
- Azure Key Vault: This is where we can save and access security secrets.
- Azure Container Registry: It enables us to build and manage container images and artifacts.
Azure Machine Learning is interoperable. Data scientists can reuse their existing ML code in Python frameworks, such as:
- PyTorch
- TensorFlow
- scikit-learn
- XGBoost
- LightGBM
Azure Machine Learning supports the following languages:
- R
- .NET
- Python
Machine learning development strategies
-
Azure Cognitive Services: Azure offers advanced AI services like translation, speech-to-text conversion, video analysis, and form-recognition. These are exposed as REST API services. Customers who want rapid application development can explore and leverage these services directly. Azure charges per transaction.
-
Azure Machine Learning studio: This interface is helpful for early developers. We can drag and drop using a designer interface. Everything is exposed as modules, from data preparation and model training to evaluation. We can also start with AutoML, which automatically selects the model algorithm and votes for the best model.
-
Azure CLI, SDK, Azure Manager, and REST API: This model is more helpful for advanced developers who can write code for model development in other open-source frameworks.
ML lifecycle
ML is not a sequential process. It’s a cyclical process that involves multiple iterations of experimentation and validation.
Extracting and preparing data
Extracting data is the preliminary step in the ML process. Azure Machine Learning offers various storage mechanisms for fetching and storing data.
Data exploratory analysis and data preparation are critical steps in the ML process. Azure Machine Learning creates good visualizations and generates statistical functions that ease data exploration. Also, Azure Machine Learning provides libraries and modules for data preprocessing and cleaning.
Training and validating models
Engineers and scientists can run the training script (in other frameworks) in the cloud. They can also build a model from scratch from the Azure Machine Learning studio. AutoML can also be used through the studio UI or Python SDK for automating experimentation and selecting the best model.
Azure Machine Learning can also help automate hyperparameter optimization with slight modifications to the job definition. Results are visualized in the Azure Machine Learning studio. This feature is not supported through CLI yet.
Deploying models
Azure Machine Learning supports managed endpoints. It abstracts the infrastructure needed for scoring the model. Azure Machine Learning supports Blue-Green deployments for preview and production deployments. It also supports batch and real-time scoring. In batch scoring, we pass the input data as a batch and get the score. In real-time scoring, we invoke an endpoint and receive a response in near real-time.
Monitoring model performance
It’s essential to audit and log the model lifecycle. Azure Machine Learning simplifies this process by using MLOps capabilities. Some key features enabling MLOps in Azure include:
-
Git integration: While submitting an ML job, repository information is tracked during the training process if the source code is in a local Git repository.
-
MLflow integration: MLflow is an open-source library for managing the ML lifecycle. Azure Machine Learning integrates with MLflow tracking to log the metrics and model artifacts.
-
Monitoring and Alerts: Azure Machine Learning includes features for monitoring the ML lifecycle process.
Evaluating model performance
Often, this involves validating the model performance in the actual application environment. The real input data could be different from the training data. Azure Machine Learning detects data drift and suggests retraining. Also, it generates performance metrics that can be analyzed for making the necessary changes to the model lifecycle.
Supported interfaces
-
Azure Machine Learning studio: This GUI interface is handy for exploration and is meant for machine learning beginners. We can play with the drag-and-drop modules to process the data, train the models, and even deploy the models.
-
Azure SDK: This is more useful for developers if they want to integrate ML into the existing infrastructure or framework. However, using this approach has the burden of manually setting up the environment, installing the packages, and so on. It’s more useful for advanced development scenarios.
-
Azure Manager and the REST API: This is helpful if developers want to manage the entire ML lifecycle using REST API services. The REST API service mode is more valuable when we want to run the training and running of the model online.
-
Azure CLI: This interface is developer-friendly and even helpful for intermediate developers. It’s more useful for offline model development with customization, automation, and scheduling capabilities.
We will use Azure CLI to complete the ML lifecycle in the coming sections.