Using XGBoost with tidymodels

Explore how to implement the XGBoost algorithm using tidymodels in R. Learn to prepare data with one-hot encoding for categorical features, specify an XGBoost model, and train it with default hyperparameters. Understand key steps for transforming data and optimizing Gradient Boosting Trees for classification tasks.

Data preparation

The XGBoost algorithm only supports numeric data; for example, the R xgboost package does not recognize R factors, including ordered factors. One-hot encoding converts each categorical value into a new binary column and assigns a value of 1 or 0 to indicate whether that value is present. When using the recipes package to prepare data for xgboost, we have to follow these steps:

  1. Prepare the training data according to best practices using dplyr (e.g., the mutate() function) and recipes functions (e.g., step_num2factor()).

  2. Transform categorical predictive features into numeric representations using data preparation functions from the recipes package.

Note: This applies to the predictive features only.
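The steps above can be sketched with a small, invented data frame (the column names and values here are illustrative only); it assumes the tidymodels and xgboost packages are installed. The recipe one-hot encodes the categorical predictors while leaving the factor label untouched, and the model is trained with default hyperparameters:

```r
library(tidymodels)

# Toy training data: one categorical predictor, one numeric predictor,
# and a factor label (required for classification).
train_tbl <- tibble(
  color   = factor(c("red", "green", "blue", "red")),
  size    = c(1.2, 3.4, 2.2, 0.8),
  outcome = factor(c("yes", "no", "yes", "no"))
)

# Step 2: one-hot encode the categorical predictors so that xgboost
# receives only numeric columns; the outcome is not transformed.
xgb_recipe <- recipe(outcome ~ ., data = train_tbl) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

# Specify an XGBoost classification model with default hyperparameters.
xgb_spec <- boost_tree() |>
  set_engine("xgboost") |>
  set_mode("classification")

# Bundle the recipe and model into a workflow, then train.
xgb_fit <- workflow() |>
  add_recipe(xgb_recipe) |>
  add_model(xgb_spec) |>
  fit(data = train_tbl)
```

To inspect what the recipe produces, `prep(xgb_recipe) |> bake(new_data = NULL)` returns the transformed data, where the single `color` column has been replaced by one binary column per level (e.g. `color_blue`, `color_green`, `color_red`).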

When performing classification, ensure the label is a factor. Each label ...