Exercise: Randomized Grid Search to Tune XGBoost Hyperparameters
Learn how to perform a randomized grid search to explore a large hyperparameter space in XGBoost.
XGBoost for randomized grid search
In this exercise, we’ll use a randomized grid search to explore the space of six hyperparameters. A randomized grid search is a good option when you have many values of many hyperparameters you’d like to search over. If, for example, there were five values for each of the six hyperparameters that we’d like to test, we’d need 5^6 = 15,625 searches. Even if each model fit only took a second, we’d still need more than four hours to exhaustively search all possible combinations. A randomized grid search can achieve satisfactory results by only searching a random sample of all these combinations. Here, we’ll show how to do this using scikit-learn and XGBoost.
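The size of the exhaustive grid described above can be checked with a few lines of arithmetic. This is a minimal sketch; the one-second fit time is the assumption stated in the text, and the variable names are illustrative only.

```python
# Size of an exhaustive grid: 5 candidate values for each of 6 hyperparameters
n_values_per_hyperparameter = 5
n_hyperparameters = 6
n_combinations = n_values_per_hyperparameter ** n_hyperparameters

# Assumption from the text: each model fit takes about one second
seconds_per_fit = 1
total_hours = n_combinations * seconds_per_fit / 3600

print(n_combinations)         # 15625
print(round(total_hours, 1))  # 4.3
```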
The first step in a randomized grid search is to specify the range of values you’d like to sample from, for each hyperparameter. This can be done by supplying either a list of values or a distribution object to sample from. In the case of discrete hyperparameters such as max_depth, where there are only a few possible values, it makes sense to specify them as a list. On the other hand, for continuous hyperparameters such as subsample, which can vary anywhere on the interval (0, 1], we don’t need to specify a list of values. Rather, we can ask that the grid search randomly sample values in a uniform way over this interval. We will use a uniform distribution to sample several of the hyperparameters we consider:
- Import the uniform distribution class from scipy and specify ranges for all hyperparameters to be searched, using a dictionary. uniform can take two arguments, loc and scale, specifying the lower bound of the interval to sample from and the width of the interval, respectively:
    from scipy.stats import uniform
    param_grid = {'max_depth': [2, 3, 4, 5, 6, 7],
                  'gamma': uniform(loc=0.0, scale=3),
                  'min_child_weight': list(range(1, 151)),
                  'colsample_bytree': uniform(loc=0.1, scale=0.9),
                  'subsample': uniform(loc=0.5, scale=0.5),
                  'learning_rate': uniform(loc=0.01, scale=0.5)}
Here, we’ve selected parameter ranges based on experimentation and experience. For example, with subsample, the XGBoost documentation recommends choosing values of at least 0.5, so we’ve indicated uniform(loc=0.5, scale=0.5), which means sampling from the interval [0.5, 1].
- Now that we’ve indicated which distributions to sample from, we need to do the sampling. Scikit-learn offers the ParameterSampler class, which will randomly sample the param_grid parameters supplied and return as many samples as requested (n_iter). We also set a RandomState for repeatable results across different runs of the notebook:
    import numpy as np
    from sklearn.model_selection import ParameterSampler
    rng = np.random.RandomState(0)
    n_iter = 1000
    param_list = list(ParameterSampler(param_grid, n_iter=n_iter, random_state=rng))
We have returned the results in a list of dictionaries of specific parameter values, corresponding to locations in the 6-dimensional hyperparameter space.
Note that in this exercise, we are iterating through 1,000 hyperparameter combinations, which will likely take over 5 minutes. You may wish to decrease this number for faster results.
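To see how lists and distributions behave when sampled, here is a minimal sketch on a reduced, two-parameter space. The names demo_grid, demo_rng, and demo_samples are illustrative and not part of the exercise; the sketch only assumes that numpy, scipy, and scikit-learn are installed.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Illustrative two-parameter space (a subset of the exercise's param_grid)
demo_grid = {
    'max_depth': [2, 3, 4, 5, 6, 7],           # discrete: list of values
    'subsample': uniform(loc=0.5, scale=0.5),  # continuous: interval [0.5, 1]
}

# Seeded random state so that repeated runs draw the same samples
demo_rng = np.random.RandomState(0)
demo_samples = list(ParameterSampler(demo_grid, n_iter=5, random_state=demo_rng))

# Each sample is a dictionary with one value per hyperparameter
for sample in demo_samples:
    print(sample)
```

Every sampled max_depth comes from the list, and every sampled subsample lies between loc and loc + scale.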
- Examine the first item of param_list:
    param_list[0]
This should return a combination of six parameter values, from the distributions indicated:
{'colsample_bytree': 0.5939321535345923, 'gamma': 2.1455680991172583, 'learning_rate': 0.31138168803582195, 'max_depth': 5, 'min_child_weight': 104, 'subsample': 0.7118273996694524}
- Observe how you can set multiple XGBoost hyperparameters simultaneously with a dictionary, using the ** syntax. First, create a new XGBoost classifier object for this exercise:
    xgb_model_2 = xgb.XGBClassifier(
        n_estimators=1000, verbosity=1, use_label_encoder=False,
        objective='binary:logistic')
    xgb_model_2.set_params(**param_list[0])
The output should show the indicated hyperparameters being set:
    XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1,
                  colsample_bytree=0.5939321535345923, gamma=2.1455680991172583, gpu_id=-1, importance_type='gain',
                  interaction_constraints='', learning_rate=0.31138168803582195, max_delta_step=0, min_child_weight=104,
                  missing=nan, monotone_constraints='()', n_estimators=1000, n_jobs=4, num_parallel_tree=1,
                  random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=0.7118273996694524,
                  tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=1)
We will use this procedure in a loop to look at all hyperparameter values.
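The ** syntax is ordinary Python dictionary unpacking: each key becomes a keyword argument. The sketch below shows this with ToyModel, a hypothetical stand-in class written for illustration; it is not part of XGBoost and only mimics the set_params pattern.

```python
class ToyModel:
    """Hypothetical stand-in for an estimator; not part of XGBoost."""

    def __init__(self, max_depth=3, learning_rate=0.1):
        self.max_depth = max_depth
        self.learning_rate = learning_rate

    def set_params(self, **params):
        # Each keyword argument is stored as an attribute of the same name
        for name, value in params.items():
            setattr(self, name, value)
        return self


params = {'max_depth': 5, 'learning_rate': 0.3}

# Unpacking the dictionary is equivalent to writing the keywords out by hand
model_a = ToyModel().set_params(**params)
model_b = ToyModel().set_params(max_depth=5, learning_rate=0.3)

print(model_a.max_depth, model_a.learning_rate)  # 5 0.3
```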
- The next several steps will be contained in one cell inside a for loop. First, measure the time it will take to do this, create an empty list to save validation AUCs, and then start a counter:
    %%time
    val_aucs = []
    counter = 1
- Open the for loop, set the hyperparameters, and fit the XGBoost model, similar to the preceding example of tuning the learning rate:
    for params in param_list:
        #Set hyperparameters and fit model
        xgb_model_2.set_params(**params)
        xgb_model_2.fit(X_train, y_train, eval_set=eval_set,
                        eval_metric='auc', verbose=False,
                        early_stopping_rounds=30)
- Within the for loop, get the predicted probability and validation set AUC:
        #Get predicted probabilities and save validation ROC AUC
        val_set_pred_proba = xgb_model_2.predict_proba(X_val)[:,1]
        val_aucs.append(roc_auc_score(y_val, val_set_pred_proba))
- Because this procedure will take a few minutes, it’s nice to print the progress to the Jupyter Notebook output. We use the Python remainder operator, %, to print a message every 50 iterations, that is, whenever the remainder of counter divided by 50 equals zero. Finally, we increment the counter:
        #Print progress
        if counter % 50 == 0:
            print('Done with {counter} of {n_iter}'.format(
                counter=counter, n_iter=n_iter))
        counter += 1
- Assembling steps 5-8 in one cell and running the for loop should give output like this:
    Done with 50 of 1000
    Done with 100 of 1000
    …
    Done with 950 of 1000
    Done with 1000 of 1000
    CPU times: user 24min 20s, sys: 18.9 s, total: 24min 39s
    Wall time: 6min 27s
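The structure of the loop, separate from the model fitting, can be sketched as follows. Here fit_and_score is a hypothetical placeholder that returns a dummy value rather than a real validation AUC, and the names ending in _demo are illustrative. The sketch also uses enumerate in place of a manually incremented counter, which is a stylistic alternative and not the approach taken in the exercise.

```python
def fit_and_score(params):
    """Hypothetical placeholder for fitting a model and scoring it."""
    return 0.5  # dummy value, not a real validation AUC


# Illustrative list of 100 parameter dictionaries
param_list_demo = [{'max_depth': depth} for depth in range(1, 101)]
n_iter_demo = len(param_list_demo)

val_aucs_demo = []
messages = []
# enumerate(..., start=1) plays the role of the manual counter
for counter, params in enumerate(param_list_demo, start=1):
    val_aucs_demo.append(fit_and_score(params))
    # Report progress whenever counter is a multiple of 50
    if counter % 50 == 0:
        message = f'Done with {counter} of {n_iter_demo}'
        messages.append(message)
        print(message)
```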
- Now that we have all the results from our hyperparameter exploration, we need to examine them. We can easily put all the hyperparameter combinations in a data frame, because they are organized as a list of dictionaries. Do this and look at the first few rows:
    xgb_param_search_df = pd.DataFrame(param_list)
    xgb_param_search_df.head()
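The output should be a table showing the first five hyperparameter combinations, with one row per combination and one column per hyperparameter.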
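To make the layout of that data frame concrete, here is a minimal sketch that builds one from a single-item list of dictionaries. The dictionary is the param_list[0] output shown earlier in this exercise; the names first_params and demo_df are illustrative, and pandas is assumed to be installed.

```python
import pandas as pd

# First sampled combination, copied from the param_list[0] output shown earlier
first_params = {'colsample_bytree': 0.5939321535345923,
                'gamma': 2.1455680991172583,
                'learning_rate': 0.31138168803582195,
                'max_depth': 5,
                'min_child_weight': 104,
                'subsample': 0.7118273996694524}

# A list of dictionaries becomes one row per dictionary, one column per key
demo_df = pd.DataFrame([first_params])

print(demo_df.shape)          # (1, 6)
print(list(demo_df.columns))
```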