Causes of missing data

Missing data is a common occurrence when applying machine learning to business data. While there are many reasons for missing data, the following are the most common:

The data is collected via a manual process and is prone to errors (e.g., data being tracked in a spreadsheet).
Multiple datasets are joined together (e.g., joining database tables can produce missing values).
A particular feature is considered optional in the data source (e.g., an IT system).
Datasets are acquired from external sources (e.g., datasets acquired from governments often have missing values).
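
To make the join case concrete, here is a minimal sketch in plain Python of how a left join introduces missing values. The tables, the column names, and the `left_join` helper are all hypothetical and exist only for illustration; a real project would more likely use a database or a data frame library.

```python
# Hypothetical example data: every customer is known, but only some
# customers have a row in the loyalty table.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cho"},
]
loyalty = [
    {"customer_id": 1, "tier": "gold"},
    {"customer_id": 3, "tier": "silver"},
]


def left_join(left, right, key, right_columns):
    """Keep every row of `left`; fill unmatched right-hand columns with None."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in right_columns:
            merged[col] = match.get(col)  # None when no matching row exists
        joined.append(merged)
    return joined


joined = left_join(customers, loyalty, "customer_id", ["tier"])

# Count the missing values in each column of the joined result.
missing_counts = {
    col: sum(1 for row in joined if row[col] is None)
    for col in joined[0]
}

print(joined)
print(missing_counts)
```

Neither input table contains a missing value, yet the joined result does: the customer with no loyalty row ends up with an empty `tier`. Counting missing values per column right after a join is a cheap way to spot this early.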

Because missing data is so common, strategies for dealing with it are critical for crafting the most valuable machine learning models.

Dealing with missing data

When dealing with missing data, there are six basic strategies:

Fix ...
