Testing the Titanic Dataset

Explore how to carefully profile the Titanic test dataset to identify missing data and factor level mismatches. Learn to train a tuned random forest model with imputed missing values, prepare the test data for predictions, and generate submission-ready results for Kaggle competitions.

We'll cover the following:

Profiling the Titanic test dataset
Training a model
Preparing the test dataset
Titanic test dataset predictions

#================================================================================================
# Load libraries (suppress the tidyverse startup messages)
#
suppressMessages(library(tidyverse))
library(skimr)
#================================================================================================
# Load the Titanic test data
#
titanic_test <- read_csv("titanic_test.csv", show_col_types = FALSE)
#================================================================================================
# Use the skimr package for a first-pass profile: column types, missing counts, and summary statistics
#
skim(titanic_test)
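The skim() output reports missing values per column, but a factor level mismatch only shows up when the test data is compared against the training data. The sketch below does both checks in base R. It is an illustration under assumptions, not the lesson's own code: train_df, test_df, and the helper unseen_levels() are hypothetical names, and the tiny data frames are made-up stand-ins for the real Titanic files (the "X" port is deliberately invented to trigger a mismatch). In practice you would pass your own training and test data frames.

```r
# Hypothetical stand-ins for the Titanic training and test data (NOT the real files)
train_df <- data.frame(
  Pclass   = c(1, 3, 3, 2),
  Sex      = c("male", "female", "female", "male"),
  Embarked = c("S", "C", "S", "Q"),
  Age      = c(22, 38, NA, 35),
  Fare     = c(7.25, 71.28, 7.92, 8.05),
  stringsAsFactors = FALSE
)

test_df <- data.frame(
  Pclass   = c(3, 3, 1),
  Sex      = c("male", "female", "male"),
  Embarked = c("S", "Q", "X"),   # "X" is an invented level never seen in training
  Age      = c(34.5, NA, 62),
  Fare     = c(7.83, NA, 9.69),
  stringsAsFactors = FALSE
)

# 1. Count missing values in every test column and keep only the non-zero counts
missing_counts <- colSums(is.na(test_df))
print(missing_counts[missing_counts > 0])

# 2. Hypothetical helper: for each character column present in both data frames,
#    return the non-missing test values that never occur in the training data
unseen_levels <- function(train, test) {
  is_char  <- vapply(train, is.character, logical(1))
  cat_cols <- intersect(names(train)[is_char], names(test))
  out <- lapply(cat_cols, function(col) {
    test_vals <- test[[col]][!is.na(test[[col]])]
    setdiff(unique(test_vals), unique(train[[col]]))
  })
  names(out) <- cat_cols
  # Drop columns with no unseen values
  Filter(length, out)
}

print(unseen_levels(train_df, test_df))
```

Checking for unseen levels before prediction matters because many model functions, random forest implementations included, cannot score a categorical value they never saw during training and may fail or behave unexpectedly.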


Profiling the Titanic test dataset