Bootstrap, Bagging, and Random Forests
Learn about the Bootstrap, bagging, and random forests.
Let's cover the theoretical background and the key concepts behind decision tree learning and random forests.
Bootstrap
Bootstrap is a widely applicable and powerful statistical tool that can quantify the uncertainty associated with a given estimator or statistical learning method. Suppose we have a dataset with 100 values and we want to estimate the sample’s mean. The straightforward estimate is simply the sum of the 100 values divided by the sample size.
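As a minimal sketch of this calculation (the array name data and the randomly generated values below are illustrative assumptions, not part of the original example):

```python
import numpy as np

# A hypothetical dataset of 100 values (generated at random purely for illustration).
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=4.0, scale=1.5, size=100)

# The plain sample mean: the sum of the values divided by the sample size.
sample_mean = data.sum() / len(data)
print(f"Sample mean: {sample_mean:.3f}")
```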
For such a small sample, we expect some error in this estimate of the mean. However, using the Bootstrap procedure, we can improve the estimate of our mean with the following steps:
Create many random samples (say 500) of our dataset, drawn with replacement, so the same value can be selected more than once.
Calculate the mean of each sample.
Calculate the average of the collected means and use that as our estimated mean for the data.
For example, suppose five samples gave mean values of 2.5, 3.5, 5.5, 4.3, and 2.9. The estimated mean in this case would be 3.74. A short code sketch of the whole procedure follows.
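Here is a minimal sketch of the three steps above, assuming NumPy; the function name bootstrap_mean, the 500 resamples, and the synthetic data are illustrative choices rather than a fixed recipe:

```python
import numpy as np

def bootstrap_mean(data, n_resamples=500, seed=0):
    """Bootstrap estimate of the mean: average the means of many resamples."""
    rng = np.random.default_rng(seed)
    resample_means = []
    for _ in range(n_resamples):
        # Step 1: draw a random resample of the same size, with replacement.
        resample = rng.choice(data, size=len(data), replace=True)
        # Step 2: calculate and record the mean of this resample.
        resample_means.append(resample.mean())
    # Step 3: average the collected means to get the bootstrap estimate.
    return float(np.mean(resample_means))

rng = np.random.default_rng(seed=1)
data = rng.normal(loc=4.0, scale=1.5, size=100)
print(f"Bootstrap estimate of the mean: {bootstrap_mean(data):.3f}")
```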
Bagging
Bagging, or Bootstrap aggregating, is an application of the Bootstrap. It is a general-purpose procedure for reducing the variance of high-variance machine learning algorithms, most commonly decision trees. This compelling and straightforward ensemble method combines the predictions from multiple models to make more accurate predictions than any individual model. Let’s say we have a sample with 5,000 instances or values, and we want to use the decision tree (CART) algorithm. The bagging procedure works as follows:
Create many random samples (say 500) of our dataset, drawn with replacement.
Train a model on each sample.
For new data, combine the predictions from all the models and output the average (or, for classification, the most common class).
Note: For example, if we have five bagged decision trees making the class predictions G, G, G, B, and B, the most frequent class (the mode), G, will be their final prediction, as in the sketch below.
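To make these steps concrete, here is a sketch of bagging implemented by hand on a synthetic binary classification problem; the dataset, the 500-tree count, and all variable names are illustrative assumptions, with scikit-learn's DecisionTreeClassifier standing in for the CART learner:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset standing in for the 5,000-instance sample in the text.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees = 500  # number of bootstrap resamples / trees
trees = []
for _ in range(n_trees):
    # Step 1: draw a bootstrap resample of the training data (with replacement).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: train one decision tree (CART) on the resample.
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Step 3: for new data, take a majority vote (the mode) across all trees.
all_preds = np.stack([tree.predict(X_test) for tree in trees])  # shape: (n_trees, n_test)
majority_vote = (all_preds.mean(axis=0) > 0.5).astype(int)      # valid for the two classes 0/1
print(f"Bagged accuracy on held-out data: {(majority_vote == y_test).mean():.3f}")
```

In practice, scikit-learn's BaggingClassifier wraps this same procedure; the manual loop above simply makes the three steps explicit.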
Decision trees are greedy. They ...