Ensemble Learning Part 1

In this lesson, you’ll learn about Ensemble Learning, which combines several models to produce better predictions than any single model.

Ensemble Learning

Ensemble Learning involves methods that combine the decisions of several models to improve the overall performance of the predictive system. Kaggle competition winners commonly use Ensemble Learning techniques.

Family of Ensemble Models

Voting Classifier

In a voting classifier ensemble, multiple models, such as Logistic Regression and Support Vector Machines, predict the label for an instance, and each prediction is taken as a vote for that label. The label predicted by the majority of the classifiers receives the most votes and is used as the final label.

Hard Voting

In hard voting, the final label is decided based on the label predicted by the majority of the models. Suppose we have three models and the following predictions.

  • Model 1 predicts Class 1
  • Model 2 predicts Class 1
  • Model 3 predicts Class 2

Then hard voting would select Class 1 as the final label. In case of a tie, hard voting selects the label based on the ascending sort order of the class labels. For example, if we have two models:

  • Model 1 predicts Class 2
  • Model 2 predicts Class 1

Then hard voting will assign the final class label to be 1, since Class 1 comes first in ascending sort order.
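
Below is a minimal hard-voting sketch using scikit-learn’s VotingClassifier. The synthetic dataset and the particular base models are assumptions made for illustration, not part of the lesson.

```python
# A minimal hard-voting sketch (illustrative; dataset and models are assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each base model casts one vote (its predicted label) per instance;
# voting="hard" picks the majority label.
hard_voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",
)
hard_voter.fit(X_train, y_train)
print("Hard-voting accuracy:", hard_voter.score(X_test, y_test))
```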

Soft Voting

In soft voting, we follow the steps below.

  • We assign a weight to each of the models.
  • We take the predicted class probabilities from each model.
  • Then, we multiply each model’s predicted class probabilities by its weight and take the average of the products across all the models.
  • The class with the highest weighted average probability is chosen as the final label.

Let’s assume we have four three-class classifiers and assign weights to each classifier in the following manner:

$w_1 = 2$, $w_2 = 3$, $w_3 = 4$, $w_4 = 1$. The weighted average probabilities for an instance would then be calculated as follows.

Model              Class 1     Class 2     Class 3
Model 1            w1 * 0.2    w1 * 0.5    w1 * 0.3
Model 2            w2 * 0.6    w2 * 0.3    w2 * 0.1
Model 3            w3 * 0.3    w3 * 0.4    w3 * 0.3
Model 4            w4 * 0.2    w4 * 0.2    w4 * 0.6
Weighted Average   0.9         0.925       0.675

Note that 0.2 in $w_1 * 0.2$ in the first row is the probability predicted by Model 1 for Class 1. The other values can be read in the same way. For Class 1, the weighted average is calculated as below.

$weighted\_average\_for\_class\_1 = \frac{(2 * 0.2) + (3 * 0.6) + (4 * 0.3) + (1 * 0.2)}{4} = 0.9$
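
As a quick sanity check, the short NumPy sketch below reproduces all three weighted averages; the probabilities and weights are copied from the table above.

```python
# Reproduce the weighted averages from the table above with NumPy.
import numpy as np

weights = np.array([2, 3, 4, 1])           # w_1 .. w_4

# Rows are Models 1-4; columns are Class 1, Class 2, Class 3.
probs = np.array([
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])

# Multiply each model's probabilities by its weight, then average over models.
weighted_avg = (weights[:, None] * probs).mean(axis=0)
print(weighted_avg)                         # [0.9   0.925 0.675]
print("Winner: Class", weighted_avg.argmax() + 1)  # Class 2
```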

In this example, the final predicted class label using soft voting is Class 2, since it has the highest weighted average probability (0.925).
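
In scikit-learn, soft voting with per-model weights can be sketched as below; the base models, weights, and synthetic data are assumptions for illustration.

```python
# A minimal soft-voting sketch (illustrative; models, weights, data assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic three-class data, for illustration only.
X, y = make_classification(n_samples=500, n_classes=3, n_informative=4,
                           random_state=42)

soft_voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),  # soft voting needs predict_proba
        ("nb", GaussianNB()),
    ],
    voting="soft",
    weights=[2, 3, 4],  # per-model weights, as in the worked example
)
soft_voter.fit(X, y)
print(soft_voter.predict(X[:5]))
```

Note that scikit-learn normalizes the weighted probabilities by the sum of the weights rather than by the number of models; since that divisor is the same for every class, the winning class is unaffected.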

Voting Regressor

A voting regressor is used for regression problems. In a voting regressor, the real-valued outputs of the individual models are averaged, and this average is returned as the final prediction.
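
Here is a minimal sketch with scikit-learn’s VotingRegressor, assuming synthetic regression data and two illustrative base models.

```python
# A minimal VotingRegressor sketch (illustrative; data and models assumed).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0,
                       random_state=42)

# The final prediction is the average of the base models' real-valued outputs.
voting_reg = VotingRegressor(
    estimators=[
        ("lin", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
    ]
)
voting_reg.fit(X, y)
print(voting_reg.predict(X[:3]))
```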

Stacking or Stacked Generalization

This ensemble technique is used for both regression and classification problems. In stacking, the idea is to train several models on the same training dataset; these models are called the Base Learners. Their predictions are then aggregated by a final model, called the Meta-Learner, which makes the final prediction. The input to the Meta-Learner is the prediction output of the base models.
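
A minimal stacking sketch with scikit-learn’s StackingClassifier is shown below, where two assumed base learners feed their predictions to a Logistic Regression meta-learner.

```python
# A minimal stacking sketch (illustrative; base learners and data assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=500, random_state=42)

stack = StackingClassifier(
    estimators=[                            # base learners
        ("rf", RandomForestClassifier(random_state=42)),
        ("svm", SVC()),
    ],
    final_estimator=LogisticRegression(),   # meta-learner
)
# By default, the meta-learner is trained on out-of-fold (5-fold CV)
# predictions of the base learners, not on their training-set predictions.
stack.fit(X, y)
print("Stacking accuracy:", stack.score(X, y))
```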
