Preparing the Test Dataset

Explore the process of preparing test datasets in R for random forest models. Learn to split data using stratified sampling, ensure factor levels match training data, and perform necessary transformations to maintain consistency for accurate model evaluation.

We'll cover the following...

Splitting the data
Transforming the data

Splitting the data

The first step of any machine learning project is splitting the data into training and test datasets. The training dataset is used throughout the model-building process, including exploratory data analysis (EDA), feature engineering, training, and tuning. The test dataset is held back until the end of the project, where it serves as the final check of a machine learning model’s prediction quality.

The rsample package offers the initial_split(), training(), and testing() functions for splitting data. The following code demonstrates using ...
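As a minimal sketch of how these three functions fit together, the example below performs a stratified split. The built-in iris dataset, the 80/20 proportion, the seed value, and the object names (iris_split, iris_train, iris_test) are assumptions chosen for illustration and may not match the lesson's own dataset or settings.

```r
library(rsample)

# Assumption: a fixed seed so the random split is reproducible
set.seed(1234)

# Assumption: the built-in iris data stands in for the project data.
# strata = Species asks for sampling within each class so that class
# proportions are similar in the training and test datasets.
iris_split <- initial_split(iris, prop = 0.8, strata = Species)

# Extract the two datasets from the split object
iris_train <- training(iris_split)
iris_test <- testing(iris_split)

# Check the sizes and the class balance of each dataset
nrow(iris_train)
nrow(iris_test)
table(iris_train$Species)
table(iris_test$Species)
```

Stratifying on the outcome keeps rare classes from being under-represented in either dataset, which matters most when the classes are imbalanced.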
