Tuning a Classification Tree

Build on cross-validation by learning to tune a CART classification tree using tidymodels.

Preparing the data

The following code prepares the Titanic training data as part of a tidymodels workflow.

#================================================================================================
# Load libraries - suppress messages
#
suppressMessages(library(tidyverse))
suppressMessages(library(tidymodels))
suppressMessages(library(rattle))
#================================================================================================
# Load the Titanic training data and transform Sex and Embarked to factors
#
titanic_train <- read_csv("titanic_train.csv", show_col_types = FALSE) %>%
  mutate(Sex = factor(Sex),
         Embarked = factor(case_when(
           Embarked == "C" ~ "Cherbourg",
           Embarked == "Q" ~ "Queenstown",
           Embarked == "S" ~ "Southampton",
           is.na(Embarked) ~ "missing")))
#================================================================================================
# Craft the recipe - recipes package
#
titanic_recipe <- recipe(Survived ~ Sex + Pclass + SibSp + Parch + Fare + Embarked, data = titanic_train) %>%
  step_num2factor(Survived,
                  transform = function(x) x + 1,
                  levels = c("perished", "survived")) %>%
  step_num2factor(Pclass,
                  levels = c("first", "second", "third"))

Configuring the model for tuning

The parsnip package supports a variety of methods for tuning hyperparameters (a hyperparameter is a parameter whose value is used to control the learning process). A tidymodels workflow is configured for hyperparameter tuning at the point where the algorithm is specified. The parsnip package’s decision_tree() function supports tuning of the following hyperparameters, as illustrated in the sketch after this list:

  • cost_complexity: A positive number for the cost/complexity parameter (aka cp) used by the rpart package.

  • tree_depth: A positive integer for the maximum depth of the tree. ...
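
Putting this together, a minimal sketch of a tuning-ready specification marks each hyperparameter with tune() when the model is declared, then bundles the recipe and model into a workflow. The object names (titanic_model, titanic_workflow) are illustrative, not taken from the lesson.

#================================================================================================
# Configure the CART model for tuning - parsnip package (illustrative sketch)
#
titanic_model <- decision_tree(cost_complexity = tune(),
                               tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")
#================================================================================================
# Bundle the recipe and tunable model into a workflow - workflows package
#
titanic_workflow <- workflow() %>%
  add_recipe(titanic_recipe) %>%
  add_model(titanic_model)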