Other Important Hyperparameters in XGBoost
Learn about the other relevant hyperparameters in XGBoost.
XGBoost hyperparameters
We’ve seen that overfitting in XGBoost can be mitigated by adjusting the learning rate and by using early stopping. What other hyperparameters may be relevant? XGBoost has many hyperparameters, and we won’t list them all here; you’re encouraged to consult the XGBoost documentation for the full list.
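As a quick refresher, here is a minimal sketch of how a smaller learning rate and early stopping can be combined using XGBoost's scikit-learn interface. The data is synthetic and the parameter values are illustrative, not those used in the exercise; in older XGBoost versions, early_stopping_rounds is passed to fit() rather than the constructor.

```python
# Minimal sketch: learning rate plus early stopping with XGBoost's
# scikit-learn API. Synthetic data; values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A small learning rate with a large number of boosting rounds; training
# stops once the validation log loss has not improved for 10 rounds.
# (In XGBoost versions before 1.6, pass early_stopping_rounds to fit() instead.)
model = XGBClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    early_stopping_rounds=10,
    eval_metric='logloss',
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print('Best iteration:', model.best_iteration)
```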
In an upcoming exercise, we’ll do a grid search over ranges of six hyperparameters, including the learning rate. We’ll also include max_depth, which should be familiar from the chapter “Decision Trees and Random Forests” and controls the depth to which the trees in the ensemble are grown. Aside from these, we’ll also consider the following:
- gamma limits the complexity of the trees in the ensemble by only allowing a node to be split if the reduction in the loss function value is greater than a certain amount.
- min_child_weight also controls tree complexity by only splitting nodes that have at least a certain amount of “sample weight.” If all samples have equal weight (as they do in our exercise), this equates to the minimum number of training samples in a node. It is similar to min_weight_fraction_leaf and min_samples_leaf for decision trees in scikit-learn.
- colsample_bytree is the fraction of features randomly selected to grow each tree in the ensemble. It is similar to the max_features parameter in scikit-learn (which does the selection at the node level, as opposed to the tree level here). XGBoost also makes colsample_bylevel and colsample_bynode available, which sample features at each level of each tree and at each node, respectively.
- subsample controls the fraction of samples from the training data that is randomly selected prior to growing a new tree for the ensemble. It is similar to the bootstrap option for random forests in scikit-learn. Both this and the colsample parameters limit the information available during model training, increasing the bias of the individual ensemble members, but hopefully also reducing the variance of the overall ensemble and improving out-of-sample model performance.
As you can see, gradient boosted trees in XGBoost implement several concepts that are familiar from decision trees and random forests. Now, let’s explore how these hyperparameters affect model performance.