Ensemble methods in Python: Bagging

Ensemble methods in machine learning leverage the power of combining multiple models to enhance overall performance. This approach is particularly effective when individual models may have limitations or biases. One prominent ensemble technique is Bagging (also known as bootstrap aggregating).

Bagging aims to reduce overfitting and varianceVariance in machine learning refers to the sensitivity of a model to fluctuations in the training data, indicating how much the model's predictions vary with different training sets. by training multiple instances of a base model on different subsets of the training data. Each subset is obtained through bootstrap samplingBootstrap sampling is a statistical technique where subsets of a dataset are repeatedly drawn with replacement, allowing for the estimation of the variability and uncertainty associated with a sample statistic or model parameters., randomly selecting data points with replacement. The final prediction is often an average or a vote from the individual models.

from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
# Load and split the data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=10)  # Changed random_state
# Use RandomForestClassifier with max_features='sqrt' for randomness
base_model = RandomForestClassifier(n_estimators=10, max_depth=3, max_features='sqrt', random_state=42)  # We can adjust hyperparameters
# Implement bagging with different base models for diversity
bagging_model = BaggingClassifier(base_model, n_estimators=50, random_state=20)
bagging_model.fit(X_train, y_train)
# Predict and evaluate
y_pred = bagging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Explanation:

Lines 1–4: These lines import the required libraries.
Line 7: This line loads the dataset from sklearn and stores it in the data variable.
Line 8: This line splits the dataset into train and test.
Line 11: We define RandomForestClassifier as this line’s base model for bagging.
Lines 14--15: Here, we create a BaggingClassifier with 50 base models and fit the bagging model on the training data. The BaggingClassifier handles the bootstrap sampling internally when fitting the model.
Line 18: The trained model is used to make predictions on the test data.
Lines 19–20: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.

Free Resources

Learn in-demand tech skills in half the time

PRODUCTS

Mock Interview

New

Courses

Skill Paths

Projects

Assessments

Ensemble methods in Python: Bagging

How to implement bagging using Python

1. Import the libraries

2. Load the dataset

3. Define the base model

4. Implement bagging

5. Predict and evaluate

Example

Explanation: