Ensemble methods in Python: Max voting
Ensemble methods in machine learning combine the strengths of multiple models for enhanced performance. Max voting, a foundational ensemble technique, involves aggregating predictions from multiple models and selecting the most frequent class as the final prediction. This straightforward yet effective approach leverages the diversity of individual models to enhance overall predictive accuracy.
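Before turning to scikit-learn, the core idea can be sketched in a few lines of plain Python. The `max_vote` helper below is hypothetical (not part of any library); it simply picks the class label predicted by the most models:

```python
from collections import Counter

def max_vote(predictions):
    """Return the most frequent class label among model predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Suppose three models predict labels 1, 0, and 1 for the same sample:
print(max_vote([1, 0, 1]))  # the majority class, 1, wins
```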
How to implement max voting using Python
Let’s look at the steps required to implement the max voting algorithm in Python.
Import the libraries
The first step is to import the required libraries.
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
Load the dataset
The next step is to load the dataset. We will use the breast cancer dataset provided by the sklearn library.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)
Define the base models
The next step is to choose the base models. The max voting classifier uses multiple models. We will use RandomForestClassifier and GradientBoostingClassifier for this example.
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)
Implement max-voting
We will now create an instance of the VotingClassifier with our base models, set voting='hard' for max voting, and fit it on the training data to train the model.
max_voting_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='hard')
max_voting_model.fit(X_train, y_train)
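As an aside, VotingClassifier also supports voting='soft', which averages the predicted class probabilities from each base model instead of counting class votes. A self-contained sketch of that variation (not required for this example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Soft voting averages predict_proba outputs, so every base model
# must implement predict_proba (both of these do).
soft_model = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
                ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))],
    voting='soft')
soft_model.fit(X_train, y_train)
print(soft_model.predict(X_test)[:5])
```

Soft voting can outperform hard voting when the base models produce well-calibrated probabilities, because confident predictions carry more weight than uncertain ones.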
Predict and evaluate
Now, we will make the predictions on the test set and calculate accuracy.
y_pred = max_voting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Code example
The following code shows the steps outlined above to implement the max voting ensemble classifier in Python:
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Load and split the dataset
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# Define base models
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)

# Create an ensemble using max voting
max_voting_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='hard')
max_voting_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = max_voting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Code explanation
Lines 1–5: These lines import the required libraries.
Line 8: This line loads the breast cancer dataset from sklearn and stores it in the cancer variable.
Line 9: This line splits the dataset into train and test sets.
Lines 12–13: We define RandomForestClassifier and GradientBoostingClassifier as the base models for the VotingClassifier.
Lines 16–17: Here, we create a VotingClassifier with the specified base models and fit it on the training data.
Line 20: The trained model is used to make predictions on the test data.
Lines 21–22: The code calculates the accuracy of the model's predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
Unlock your potential: Ensemble learning series, all in one place!
If you've missed any part of the series, you can always go back and check out the previous Answers:
What is ensemble learning?
Understand the concept of combining multiple models to improve predictions.
Ensemble methods in Python: Averaging
Learn how averaging methods can boost model accuracy and stability.
Ensemble methods in Python: Bagging
Discover the power of bagging in reducing variance and enhancing prediction performance.
Ensemble methods in Python: Boosting
Dive into boosting techniques that improve weak models by focusing on mistakes.
Ensemble methods in Python: Stacking
Understand how stacking combines multiple models to make better predictions.
Ensemble methods in Python: Max voting
Explore the max voting method to combine classifier predictions and increase accuracy.