Other Important Hyperparameters in XGBoost

Learn about the other relevant hyperparameters in XGBoost.

XGBoost hyperparameters

We’ve seen that overfitting in XGBoost can be mitigated by adjusting the learning rate, as well as by using early stopping. What other hyperparameters might be relevant? XGBoost has many hyperparameters, and we won’t list them all here; you’re encouraged to consult the XGBoost documentation for a full list.
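As a quick refresher, here is a minimal sketch of combining a smaller learning rate with early stopping. This is not the course’s exact code: the synthetic data and parameter values are illustrative assumptions, and in recent versions of the scikit-learn wrapper (1.6 and later) early_stopping_rounds and eval_metric are constructor arguments, whereas older versions pass them to fit().

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import xgboost as xgb

# Synthetic data standing in for the case study dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = xgb.XGBClassifier(
    n_estimators=1000,         # generous upper bound on boosting rounds
    learning_rate=0.05,        # illustrative value only
    eval_metric='auc',         # metric monitored on the validation set
    early_stopping_rounds=10,  # stop when that metric stops improving
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print('Best iteration:', model.best_iteration)
```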

In an upcoming exercise, we’ll do a grid search over ranges of six hyperparameters, including the learning rate. We’ll also include max_depth, which should be familiar from the chapter “Decision Trees and Random Forests”; it controls the depth to which the trees in the ensemble are grown. Aside from these, we’ll also consider the following hyperparameters (a sketch showing where each one is set appears after the list):

  • gamma limits the complexity of trees in the ensemble by only allowing a node to be split if the reduction in the loss function value is greater than a certain amount.

  • min_child_weight also controls the complexity of trees by only splitting nodes if they have at least a certain amount of “sample weight.” If all samples have equal weight (as they do for our exercise), this equates to the minimum number of training samples in a node. This is similar to min_weight_fraction_leaf and min_samples_leaf for decision trees in scikit-learn.

  • colsample_bytree is the fraction of features that will be randomly selected to grow each tree in the ensemble. It is similar to the max_features parameter in scikit-learn, which performs the selection at the node level as opposed to the tree level here. XGBoost also provides colsample_bylevel and colsample_bynode to perform the feature sampling at each level of each tree and at each node, respectively.

  • subsample controls what fraction of samples from the training data is randomly selected prior to growing a new tree for the ensemble. This is similar to the bootstrap option for random forests in scikit-learn. Both this and the colsample parameters limit the information available during model training, increasing the bias of the individual ensemble members, but hopefully also reducing the variance of the overall ensemble and improving out-of-sample model performance.
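To make the roles of these parameters concrete, the following sketch shows where each one is passed to the scikit-learn-style XGBoost classifier. The values are illustrative assumptions only, not tuned settings:

```python
import xgboost as xgb

# Illustrative values only; the exercise searches over ranges of these.
model = xgb.XGBClassifier(
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    max_depth=4,           # maximum depth of each tree in the ensemble
    gamma=1.0,             # minimum loss reduction required to split a node
    min_child_weight=10,   # minimum total sample weight required in a node
    colsample_bytree=0.8,  # fraction of features sampled for each tree
    subsample=0.8,         # fraction of training samples used for each tree
)
```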

As you can see, gradient boosted trees in XGBoost implement several concepts that are familiar from decision trees and random forests. Now, let’s explore how these hyperparameters affect model performance.
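As a preview of the upcoming exercise, here is a hedged sketch of such a grid search using scikit-learn’s GridSearchCV. The synthetic data and the parameter ranges are placeholder assumptions, not the exercise’s actual dataset or grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
import xgboost as xgb

# Synthetic data standing in for the exercise's dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Placeholder ranges; the exercise defines its own grid.
param_grid = {
    'learning_rate': [0.05, 0.3],
    'max_depth': [3, 6],
    'gamma': [0, 1],
    'min_child_weight': [1, 10],
    'colsample_bytree': [0.6, 1.0],
    'subsample': [0.6, 1.0],
}

search = GridSearchCV(
    estimator=xgb.XGBClassifier(n_estimators=100),
    param_grid=param_grid,
    scoring='roc_auc',
    cv=3,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```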
