
Challenge Solution Review

Understand how to apply a gradient boosting classifier to the breast cancer dataset, split the data into training and test sets, and refine model performance by tuning parameters with GridSearchCV, optimizing for the F1-score. Learn to identify the best parameters and the best score for an improved tree-based ensemble model.

Python 3.5
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Load the breast cancer dataset as a feature matrix X and a label vector y.
X, y = datasets.load_breast_cancer(return_X_y=True)

# Hold out 20% of the data as a test set.
train_x, test_x, train_y, test_y = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

gb = GradientBoostingClassifier(random_state=10)

# Candidate values for the hyperparameters we want to tune.
param_grid = [{
    "n_estimators": [1, 2, 4, 16, 32],
    "learning_rate": [0.05, 0.1, 0.2, 0.4],
    "min_samples_leaf": [1, 2, 4, 8],
}]

# Search the grid with cross-validation, scoring each combination by F1.
cv = GridSearchCV(gb, param_grid=param_grid, scoring="f1", n_jobs=4)
cv.fit(train_x, train_y)

print("The best F1-score is {}.".format(cv.best_score_))
print("The parameters of the best estimator are {}.".format(cv.best_params_))

First, we use load_breast_cancer to load the breast cancer dataset. We then split it into training and test sets with train_test_split, where the test set accounts for 20% of the data.
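The held-out 20% is not touched during the grid search. If you also want a score on it (this step is not part of the original solution), you can evaluate the refit best estimator on the test split. Below is a minimal sketch, assuming the variables from the code above are still in scope:

from sklearn.metrics import f1_score

# GridSearchCV refits the best parameter combination on the whole training
# split by default (refit=True), so cv.best_estimator_ can be used directly.
test_pred = cv.best_estimator_.predict(test_x)
print("F1-score on the held-out test set: {:.3f}".format(f1_score(test_y, test_pred)))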

A ...