Converting to Tidy Data

Learn how to convert data into tidy format.


So far, we’ve only seen data frames that were already in tidy format, and most of the data frames we’ll work with from here on will be tidy as well. However, not every dataset in the world comes in this form. If our original data frame is in wide (non-tidy) format and we want to use the ggplot2 or dplyr packages, we first have to convert it to tidy format. We recommend using the pivot_longer() function from the tidyr package (Wickham and Henry, 2019).
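To see what the conversion does before applying it to real data, here is a minimal sketch on a tiny hypothetical data frame. The names wide_df and tidy_df and all of the numbers are made up for illustration. In the wide version, the alcohol type is stored in the column headers, so each row holds three observations. After pivot_longer(), each row holds exactly one observation.

```r
library(tibble)  # for tribble()
library(tidyr)   # for pivot_longer()

# Hypothetical wide data: the "type" variable is hidden in the column
# headers beer/spirit/wine. All values are made up.
wide_df <- tribble(
  ~country, ~beer, ~spirit, ~wine,
  "A",          1,       2,     3,
  "B",          4,       5,     6
)

# Stack every column except country: headers go into "type",
# cell values go into "servings"
tidy_df <- pivot_longer(wide_df, cols = -country,
                        names_to = "type", values_to = "servings")

# 2 countries x 3 alcohol types = 6 rows, one observation per row
nrow(tidy_df)
tidy_df
```

The row count follows directly from the reshaping: every country row is split into one row per stacked column, so the number of rows grows by a factor equal to the number of columns being pivoted.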

Going back to our drinks_smaller data frame from earlier:

drinks_smaller

We convert it to tidy format by using the pivot_longer() function from the tidyr package, as follows:

# tidyr provides pivot_longer() and re-exports the %>% pipe
library(tidyr)
drinks_smaller_tidy <- drinks_smaller %>%
  pivot_longer(names_to = "type",
               values_to = "servings",
               cols = -country)
drinks_smaller_tidy
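The cols argument accepts any tidyselect specification, so excluding the identifier column (as cols = -country does) is only one way to pick the columns to stack. The sketch below uses a hypothetical stand-in, drinks_demo, with the same column layout as drinks_smaller but made-up values, and checks that three common selection styles produce identical results.

```r
library(tibble)  # for tribble()
library(tidyr)   # for pivot_longer()

# Hypothetical stand-in with drinks_smaller's column layout;
# the serving values are made up
drinks_demo <- tribble(
  ~country, ~beer, ~spirit, ~wine,
  "A",         10,      20,    30,
  "B",         40,      50,    60
)

# 1. Stack everything except the identifier column
by_exclusion <- pivot_longer(drinks_demo, cols = -country,
                             names_to = "type", values_to = "servings")

# 2. Name the columns to stack explicitly
by_name <- pivot_longer(drinks_demo, cols = c(beer, spirit, wine),
                        names_to = "type", values_to = "servings")

# 3. Select a contiguous range of columns
by_range <- pivot_longer(drinks_demo, cols = beer:wine,
                         names_to = "type", values_to = "servings")

identical(by_exclusion, by_name)
identical(by_name, by_range)
```

Excluding the identifier column is convenient when every other column should be stacked; naming the columns explicitly is safer if the data frame might later gain extra columns that should not be pivoted.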

We set the arguments to pivot_longer() as follows:

  1. names_to: This is the name of the variable in the new tidy data frame that will contain the column names of the original data. Observe how we set names_to = "type". In the resulting drinks_smaller_tidy, the type column contains the three types of alcohol: beer, spirit, and wine. Since type is not a variable in drinks_smaller, we put quotation marks around it; we’ll receive an error if we use names_to = type here.

  2. values_to: This is the variable’s name in the new tidy data frame that will contain the ...