Ensembles: Bagging vs Boosting

This lesson will focus on how to use boosting and bagging machine learning algorithms in Python.

Ensembles

Recall that in the lesson Random Forests, we learned that an ensemble is a collection of different predictive models that collectively decide the predicted output. Ensemble methods are divided into two categories:

  • Bagging
  • Boosting

Bagging

In bagging, each individual model is trained on its own random sample of the training data, drawn with replacement. This means each model is trained on different data and therefore learns something slightly different.

Note that each model is not trained on a smaller random subset of the data; instead, each model's training set is the same size as the original data set, built by sampling from it with replacement (this is called a bootstrap sample). For instance, if our training data consists of the six numbers [1, 2, 3, 4, 5, 6] and we sample six times with replacement, we might get [1, 2, 2, 4, 5, 5]. Some examples appear more than once and others not at all, so each individual model sees a different training set.
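The sampling step above can be sketched in a few lines of Python. This is a minimal illustration of drawing a bootstrap sample, using only the standard library; the variable names are ours, not from any particular library.

```python
import random

random.seed(0)  # fixed seed so the sample is reproducible

data = [1, 2, 3, 4, 5, 6]

# Draw len(data) examples with replacement: a bootstrap sample.
# Duplicates are expected, and some original values may be missing.
bootstrap = [random.choice(data) for _ in range(len(data))]

print(bootstrap)
```

Running this with different seeds (one per model) yields a different training set for each member of the ensemble.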

In bagging, the final prediction is obtained by combining the outputs of the N models: averaging their responses for regression, or taking a majority vote for classification.
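As a hedged sketch of bagging end to end, the example below uses scikit-learn's BaggingClassifier, which by default fits decision trees on bootstrap samples and predicts by majority vote. The synthetic dataset and all parameter values here are illustrative choices, not part of the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# A small synthetic classification problem (illustrative only).
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# N = 10 models, each trained on a bootstrap sample of the training data;
# the ensemble's prediction is the majority vote of the 10 models.
bag = BaggingClassifier(n_estimators=10, random_state=0)
bag.fit(X_train, y_train)

acc = bag.score(X_test, y_test)
print(f"Bagging test accuracy: {acc:.3f}")
```

For a regression problem, BaggingRegressor works the same way but averages the N models' predictions instead of voting.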
