Ensembles: Bagging vs Boosting
This lesson focuses on how to use bagging and boosting ensemble algorithms in Python.
Ensembles
Recall from the Random Forests lesson that an ensemble is a collection of predictive models whose individual predictions are combined to produce the final output. Ensemble methods fall into two broad categories:
- Bagging
- Boosting
Bagging
In bagging (short for bootstrap aggregating), each individual model is trained on its own bootstrap sample of the training data, drawn by randomly sampling with replacement. This means each trained model is different.
Note that the individual models are not trained on small random subsets of the data; each bootstrap sample has the same size as the full training set, but because sampling is done with replacement, some examples appear multiple times while others are left out. For instance, if our training data is the six numbers [1,2,3,4,5,6] and we sample six times with replacement, we might get [1,2,2,4,5,5]. Since each model sees a different bootstrap sample, each trained model is different.
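To make the resampling step concrete, here is a minimal sketch of drawing one bootstrap sample with NumPy (the seed and the six-number array are purely illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed, purely for reproducibility

data = np.array([1, 2, 3, 4, 5, 6])

# A bootstrap sample: same size as the original data, drawn with replacement,
# so some values can appear more than once while others are left out.
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)  # e.g. [1 2 2 4 5 5]; the exact values depend on the seed
```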
In bagging, the final result is obtained by combining the N models: averaging their responses for regression, or taking a majority vote for classification.
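Putting both steps together (bootstrap sampling plus vote/average), here is a minimal bagging sketch using scikit-learn's BaggingClassifier. The synthetic dataset and the choice of 50 decision trees are illustrative assumptions, not part of the lesson; also note that scikit-learn versions before 1.2 name the `estimator` parameter `base_estimator`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic labeled data, standing in for any real training set.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each fit on its own bootstrap sample of the training set.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,   # sample with replacement: the "bagging" step
    random_state=0,
)
bagging.fit(X_train, y_train)

# For classification, the ensemble predicts by majority vote across the trees.
print("Test accuracy:", bagging.score(X_test, y_test))
```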
Boosting
In boosting, each individual model also samples from the training data, but this time the sampling is not uniformly random; it is weighted. In contrast to bagging, where every example in the training set has an equal chance of being chosen, in boosting different examples have different chances of being chosen, and those chances change as training proceeds.
As a result, the individual models are trained on differently weighted versions of the training set, and hence different models are obtained.
In boosting algorithms, the models are trained sequentially: each new model is trained on data weighted so that the examples the previous models misclassified count more, and the final prediction combines all the models.
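AdaBoost is one classic boosting algorithm that implements this reweighting scheme, and scikit-learn ships an implementation. The sketch below reuses the same kind of synthetic data as above; the 50 estimators and the default decision-stump base learner are illustrative defaults, not choices mandated by the lesson:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost fits its base learners (shallow trees by default) one after another;
# after each round, misclassified training examples receive larger weights,
# so the next learner concentrates on the examples that are still getting missed.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting.fit(X_train, y_train)

# The final prediction is a weighted vote over all the fitted learners.
print("Test accuracy:", boosting.score(X_test, y_test))
```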