Ensemble methods in machine learning combine the strengths of multiple models for enhanced performance. Max voting, a foundational ensemble technique, involves aggregating predictions from multiple models and selecting the most frequent class as the final prediction. This straightforward yet effective approach leverages the diversity of individual models to enhance overall predictive accuracy.
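Before turning to scikit-learn, here is a minimal sketch of the idea itself. The `max_vote` helper below is purely illustrative (it is not part of any library): it takes the class labels predicted by several models for one sample and returns the most frequent one.

```python
from collections import Counter

def max_vote(predictions):
    """Return the most frequent class label among the models' predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical models predict the class of one sample:
print(max_vote([1, 0, 1]))  # majority of the three votes -> 1
```

With an odd number of models and two classes, a strict majority always exists; `Counter.most_common` also resolves ties deterministically by first occurrence, which is the behavior a hard-voting ensemble needs.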
Let’s look at the steps required to implement the max voting algorithm in Python.
The first step is to import the required libraries.
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
The next step is to load the dataset. We will use the breast cancer dataset provided by the sklearn library.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)
The next step is to choose the base models. The max voting classifier combines the predictions of multiple models. We will use RandomForestClassifier and GradientBoostingClassifier for this example.
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)
We will now create an instance of the VotingClassifier and fit it on the training data to train the model.
max_voting_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='hard')
max_voting_model.fit(X_train, y_train)
Now, we will make the predictions on the test set and calculate accuracy.
y_pred = max_voting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
The following code shows the steps outlined above to implement the max voting ensemble classifier in Python:
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Load and split the dataset
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# Define base models
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)

# Create an ensemble using max voting
max_voting_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='hard')
max_voting_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = max_voting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Lines 1–5: These lines import the required libraries.
Line 8: This line loads the breast cancer dataset from sklearn and stores it in the cancer variable.
Line 9: This line splits the dataset into training and test sets.
Lines 12–13: We define RandomForestClassifier and GradientBoostingClassifier as the base models for the VotingClassifier.
Lines 16–17: Here, we create a VotingClassifier with the specified base models and hard (majority) voting, and fit it on the training data.
Line 20: The trained model is used to make predictions on the test data.
Lines 21–22: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
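As a closely related variation, VotingClassifier also supports voting='soft', which averages the class probabilities predicted by the base models instead of counting their votes; this works whenever every base model implements predict_proba, as both of our base models do. The sketch below reuses the same dataset and base models as above, changing only the voting strategy.

```python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Same dataset and split as in the hard-voting example
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# Same base models, but the ensemble now averages predicted probabilities
soft_voting_model = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
                ('gb', GradientBoostingClassifier(n_estimators=10, random_state=42))],
    voting='soft')
soft_voting_model.fit(X_train, y_train)

y_pred = soft_voting_model.predict(X_test)
print("Soft voting accuracy: {:.2f}%".format(accuracy_score(y_test, y_pred) * 100))
```

Soft voting can outperform hard voting when the base models produce well-calibrated probabilities, since a confident correct model can outweigh an uncertain wrong one.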