Another Way of Growing Trees: XGBoost's grow_policy
Learn about XGBoost's lossguide grow policy and how to set the tree_method hyperparameter.
Controlling tree growth in XGBoost
In addition to limiting the maximum depth of trees using the max_depth hyperparameter, there is another paradigm for controlling tree growth: finding the node where a split would result in the greatest reduction in the loss function, and splitting that node, regardless of how deep it will make the tree. This may result in a tree with one or two very deep branches, while the other branches may not have grown very far. XGBoost offers a hyperparameter called grow_policy, and setting this to lossguide results in this kind of tree growth, while the depthwise option is the default and grows trees to an indicated max_depth, as we've done in the chapter "Decision Trees and Random Forests" and so far in this chapter. The lossguide grow policy is a newer option in XGBoost and mimics the behavior of LightGBM, another popular gradient boosting package.
To use the lossguide policy, it is necessary to set another hyperparameter we haven't discussed yet, tree_method, which must be set to hist or gpu_hist. Without going into too much detail, the hist method uses a faster way of searching for splits. Instead of looking between every sequential pair of sorted feature values for the training samples in a node, the hist method builds a histogram and only considers splits at the edges of the histogram bins. So, for example, if there are 100 samples in a node, their feature values may be binned into 10 groups, meaning there are only 9 possible splits to consider instead of 99.
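To make this concrete, here is a minimal sketch of the idea using NumPy rather than XGBoost's internals; the variable names, the 100 samples, and the choice of 10 bins are illustrative assumptions, not the library's actual implementation:

import numpy as np

# Hypothetical feature values for 100 samples in a single node
rng = np.random.default_rng(seed=1)
feature_values = rng.normal(size=100)

# Exact-style search: a candidate split between every pair of sorted unique values
n_exact_candidates = len(np.unique(feature_values)) - 1  # 99 here

# Histogram-style search: only the interior bin edges are candidates
_, bin_edges = np.histogram(feature_values, bins=10)
n_hist_candidates = len(bin_edges) - 2  # 9 interior edges

print(n_exact_candidates, n_hist_candidates)

With far fewer candidate splits to evaluate at each node, training is substantially faster, at the cost of a coarser search over possible split points.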
Using the lossguide grow policy
We can instantiate an XGBoost model for the lossguide grow policy as follows, using a learning rate of 0.1 based on intuition from our hyperparameter exploration in the previous exercise:
xgb_model_3 = xgb.XGBClassifier(
    n_estimators=1000,
    max_depth=0,
    learning_rate=0.1,
    verbosity=1,
    objective='binary:logistic',
    use_label_encoder=False,
    n_jobs=-1,
    tree_method='hist',
    grow_policy='lossguide')
Notice here that we’ve set max_depth=0, because this hyperparameter is not relevant for the lossguide policy. Instead, we are going to set a hyperparameter called max_leaves, which simply controls the maximum number of leaves in the trees that will be grown. We’ll do a hyperparameter search of values ranging from 5 to 100 leaves:
max_leaves_values = list(range(5,105,5))
print(max_leaves_values[:5])
print(max_leaves_values[-5:])
This should output the following:
[5, 10, 15, 20, 25]
[80, 85, 90, 95, 100]
Now we are ready to repeatedly fit and validate the model across this range of hyperparameter values, similar to what we’ve done previously:
%%time
val_aucs = []
for max_leaves in max_leaves_values:
    # Set parameter and fit model
    xgb_model_3.set_params(**{'max_leaves': max_leaves})
    xgb_model_3.fit(X_train, y_train, eval_set=eval_set,
                    eval_metric='auc', verbose=False,
                    early_stopping_rounds=30)
    # Get validation score
    val_set_pred_proba = xgb_model_3.predict_proba(X_val)[:, 1]
    val_aucs.append(roc_auc_score(y_val, val_set_pred_proba))
The output will include the wall time for all of these fits, which was about 24 seconds in testing. Now let’s put the results in a data frame:
max_leaves_df = pd.DataFrame({'Max leaves':max_leaves_values, 'Validation AUC':val_aucs})
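Before plotting, we can also pull the best-performing row out of this data frame as a quick check; this small step is an addition to the walkthrough, not part of the original exercise:

# Assumed follow-up: locate the max_leaves value with the highest validation AUC
best_idx = max_leaves_df['Validation AUC'].idxmax()
print(max_leaves_df.loc[best_idx])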
We can visualize how the validation AUC changes with the maximum number of leaves, similar to our visualization of the learning rate:
mpl.rcParams['figure.dpi'] = 400
max_leaves_df.set_index('Max leaves').plot()
This will result in a plot like this: