H2O AutoML Is All You Need

Discover the power of H2O AutoML and learn how to train high-performing machine learning models quickly and easily.

What is H2O AutoML?

No single algorithm works best for every machine learning problem: a model's performance depends heavily on the characteristics of the dataset and the problem being solved. H2O AutoML addresses this by automatically trying many algorithms and configurations instead of relying on just one.

H2O AutoML is an automated machine learning tool that trains and tunes many models, within a user-specified time or model budget, to find the best-performing one for a given predictive modeling problem. It supports binary classification, multiclass classification, and regression tasks.

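H2O AutoML provides a high-level interface for automating the process of building and tuning machine learning models, covering model training, model selection, and hyperparameter tuning. Its automated preprocessing is limited (for example, optional target encoding of high-cardinality categorical columns), so feature engineering largely remains the user's job.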

H2O AutoML parameters

Here are some of the important H2O AutoML parameters that we should get familiar with before training our model:

  • seed: Specifies the random number generator seed for reproducibility of results.

  • exclude_algos: Specifies a list of algorithms to skip during the AutoML run (for example, "DeepLearning" or "StackedEnsemble").

  • include_algos: Specifies the only algorithms to use in the AutoML run. It cannot be combined with exclude_algos.

  • stopping_metric: Specifies the metric used for early stopping of individual models and grid searches. It defaults to AUTO, which uses logloss for classification and deviance for regression.

  • sort_metric: Specifies the metric used to sort the models on the leaderboard. It defaults to AUC for binary classification, mean_per_class_error for multiclass classification, and deviance for regression.

  • nfolds: Specifies the number of folds (a value >= 2) for k-fold cross-validation of the models. Set it to -1 to let AutoML choose, or to 0 to disable cross-validation. The default is -1.

  • validation_frame: Ignored unless nfolds is set to 0, in which case a validation frame can be specified and used to stop models early.

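  • leaderboard_frame: Specifies a frame used only to score and rank models on the leaderboard. If it is not provided, the leaderboard uses cross-validation metrics instead (or, when nfolds is 0, a frame split automatically from the training data).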
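As a sketch of how these options fit together, the hypothetical helper below (`build_automl_kwargs`, which is not part of the h2o package) collects several constructor arguments and checks them against the rules listed above before they would be passed to `H2OAutoML`. The algorithm names are the ones H2O documents for include_algos/exclude_algos; validation_frame and leaderboard_frame are left out because, in the Python API, they are arguments of `train()` rather than of the constructor.

```python
# Hypothetical helper, not part of the h2o package: it collects several
# H2OAutoML constructor arguments and checks the rules described above.
VALID_ALGOS = {"DRF", "GLM", "XGBoost", "GBM", "DeepLearning", "StackedEnsemble"}

# Documented defaults that sort_metric="AUTO" resolves to.
DEFAULT_SORT_METRIC = {
    "binomial": "auc",
    "multinomial": "mean_per_class_error",
    "regression": "deviance",
}


def build_automl_kwargs(problem_type, seed=1, nfolds=-1,
                        include_algos=None, exclude_algos=None,
                        sort_metric=None):
    """Return a dict of H2OAutoML keyword arguments after basic validation."""
    if not isinstance(nfolds, int) or (nfolds not in (-1, 0) and nfolds < 2):
        raise ValueError("nfolds must be -1 (auto), 0 (off), or an int >= 2")
    if include_algos and exclude_algos:
        raise ValueError("set include_algos or exclude_algos, not both")
    for algo in (include_algos or []) + (exclude_algos or []):
        if algo not in VALID_ALGOS:
            raise ValueError(f"unknown algorithm: {algo!r}")

    kwargs = {
        "seed": seed,
        "nfolds": nfolds,
        # Make the problem-type default explicit when no metric is given.
        "sort_metric": sort_metric or DEFAULT_SORT_METRIC[problem_type],
    }
    if include_algos:
        kwargs["include_algos"] = list(include_algos)
    if exclude_algos:
        kwargs["exclude_algos"] = list(exclude_algos)
    return kwargs
```

The result could then be unpacked into the real constructor, for example `H2OAutoML(max_models=20, **build_automl_kwargs("binomial", exclude_algos=["DeepLearning"]))`.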

Set one or both of the following stopping strategies (time or number of models) to bound the training run. When both are set, the H2O AutoML run stops as soon as it hits either limit. If neither is set, AutoML falls back to a one-hour time limit.

  • max_runtime_secs: Sets the maximum time, in seconds, that the AutoML run can take.

  • max_models: Specifies the maximum number of models that AutoML will build, not counting Stacked Ensembles.
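The "whichever limit comes first" rule can be illustrated with a toy simulation. The function `run_with_budget` and its `train_one_model` callback are hypothetical stand-ins for AutoML's internal loop. This sketch only checks the budget between models and does not exclude Stacked Ensembles from the count, so it simplifies what H2O actually does.

```python
import time


def run_with_budget(train_one_model, max_runtime_secs=None, max_models=None,
                    clock=time.monotonic):
    """Call train_one_model() repeatedly until either budget is exhausted.

    When both limits are set, the loop stops at whichever is reached first.
    If neither is set, it falls back to a one-hour limit, mirroring the
    default H2O documents for AutoML.
    """
    if max_runtime_secs is None and max_models is None:
        max_runtime_secs = 3600
    start = clock()
    models = []
    while True:
        # Model-count limit: stop once enough models have been trained.
        if max_models is not None and len(models) >= max_models:
            break
        # Time limit: stop once the elapsed time reaches the budget.
        if max_runtime_secs is not None and clock() - start >= max_runtime_secs:
            break
        models.append(train_one_model())
    return models
```

Passing a fake `clock` makes the behavior easy to check without waiting: with `max_runtime_secs=5` and `max_models=2`, the model limit is hit first, and with `max_models=10` the time limit is.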

Train H2O AutoML

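H2O AutoML generates a diverse range of models, such as GBMs, deep neural networks, and GLMs, by combining models with preset hyperparameters and random grid searches over carefully chosen hyperparameter spaces. By default, each model is evaluated with cross-validation, which gives a more reliable estimate of how well it generalizes than a single train/test split. AutoML then trains Stacked Ensembles on top of these models, and the ensembles often rank near the top of the leaderboard.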
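To make the two core ideas concrete, here is a library-free sketch of (1) drawing candidates at random from a hyperparameter space, the idea behind a random grid search, and (2) splitting rows into k folds for cross-validation. Both functions, and the example space used below, are hypothetical; H2O's actual search spaces and fold assignment are internal to AutoML.

```python
import itertools
import random


def sample_random_grid(space, n_candidates, seed=1):
    """Draw up to n_candidates distinct hyperparameter combinations at random."""
    keys = sorted(space)
    # Enumerate the full Cartesian grid, then sample from it without replacement.
    grid = [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]
    rng = random.Random(seed)  # seeded for reproducible candidate lists
    return rng.sample(grid, min(n_candidates, len(grid)))


def kfold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    start = 0
    for fold in range(k):
        # The first n_rows % k folds get one extra row.
        size = n_rows // k + (1 if fold < n_rows % k else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size
```

With an illustrative space such as `{"max_depth": [3, 5, 7], "learn_rate": [0.05, 0.1]}`, each sampled candidate would be trained once per fold and scored on that fold's held-out rows; the average of those scores is the cross-validated estimate used to compare candidates.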

The trained models are then returned as a leaderboard sorted by sort_metric, which makes it easy to compare them and choose the best-performing model for the task. The models can also be exported as MOJO artifacts for deployment, which simplifies integrating them into production applications.
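How a leaderboard is ordered depends on the direction of the metric: AUC is better when higher, while logloss, deviance, and error metrics are better when lower. The small hypothetical function below, which is not H2O's implementation, ranks a mapping of model IDs to scores accordingly.

```python
# Metrics where a larger value means a better model; the rest are minimized.
HIGHER_IS_BETTER = {"auc", "aucpr"}


def rank_models(scores, sort_metric):
    """Return (model_id, score) pairs ordered best-first for sort_metric."""
    descending = sort_metric.lower() in HIGHER_IS_BETTER
    return sorted(scores.items(), key=lambda item: item[1], reverse=descending)
```

For example, with made-up scores, `rank_models({"gbm_a": 0.31, "glm_b": 0.35}, "logloss")` puts `gbm_a` first, while the same numbers interpreted as AUC would put `glm_b` first.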

This approach reduces the manual effort required for model selection and hyperparameter tuning, allowing data scientists to focus on higher-level tasks, such as feature engineering and data analysis.

With the Lending Club loans dataset at our disposal, we’ll build an H2O AutoML classification model that predicts the loan_status of applicants. This dataset contains information about loan applicants, such as their employment status, credit score, and loan amount, along with their loan status (e.g., fully paid, charged off).
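A minimal sketch of that workflow with the h2o Python package might look like the following. The CSV path is a placeholder, the function name `train_loan_automl` is hypothetical, and the 80/20 split and `max_models=20` are illustrative choices rather than recommendations. Running it requires the `h2o` package and a Java runtime; the imports sit inside the function so the file can be imported without starting an H2O cluster. If loan_status contains more than two classes, AutoML will treat the task as multiclass classification.

```python
def train_loan_automl(csv_path, target="loan_status", max_models=20, seed=1):
    """Train an H2O AutoML classifier on a loans CSV.

    Returns the fitted H2OAutoML object and the held-out test frame.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    loans = h2o.import_file(csv_path)

    # Treat the target as categorical so AutoML runs classification, not regression.
    loans[target] = loans[target].asfactor()
    predictors = [col for col in loans.columns if col != target]

    # Hold out 20% of rows to check the leader on data AutoML never saw.
    train, test = loans.split_frame(ratios=[0.8], seed=seed)

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(x=predictors, y=target, training_frame=train)

    print(aml.leaderboard.head(rows=10))  # top of the sorted leaderboard
    print(aml.leader.model_performance(test_data=test))  # leader on held-out data
    return aml, test
```

A call such as `aml, test = train_loan_automl("lending_club_loans.csv")` (hypothetical file name) would then leave the best model available as `aml.leader`.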
