Converting to Tidy Data
Learn how to convert data into tidy format.
So far, we’ve only seen data frames that were already in tidy format, and most of the data frames we’ll see from here on will be tidy as well. However, not every dataset comes this way. If our original data frame is in wide (non-tidy) format and we’d like to use the ggplot2 or dplyr packages, we’ll first have to convert it to tidy format. We recommend using the pivot_longer() function in the tidyr package (Wickham and Henry, 2019).
Going back to our drinks_smaller data frame from earlier:
drinks_smaller
We convert it to tidy format by using the pivot_longer() function from the tidyr package, as follows:
drinks_smaller_tidy <- drinks_smaller %>%
  pivot_longer(names_to = "type",
               values_to = "servings",
               cols = -country)

drinks_smaller_tidy
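For readers who don't have drinks_smaller loaded, the same conversion can be sketched on a small hand-built data frame. This is only an illustration: the names drinks_wide and drinks_long are hypothetical, the serving counts are stand-in values rather than the course's data, and pivot_longer() is called directly instead of through the pipe so that only tibble and tidyr need to be attached.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for drinks_smaller: one row per country and one
# column per drink type (wide format). The counts are illustrative values.
drinks_wide <- tribble(
  ~country,       ~beer, ~spirit, ~wine,
  "China",           79,     192,     8,
  "Italy",           85,      42,   237,
  "Saudi Arabia",     0,       5,     0,
  "USA",            249,     158,    84
)

# Pivot every column except country into name/value pairs:
# the old column names go into "type", the cell values into "servings".
drinks_long <- pivot_longer(
  drinks_wide,
  cols = -country,
  names_to = "type",
  values_to = "servings"
)

drinks_long
```

With four countries and three drink columns, the long version has 4 x 3 = 12 rows and the three columns country, type, and servings.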
We set the arguments to pivot_longer() as follows:
names_to: This is the name of the variable in the new tidy data frame that will contain the column names of the original data. Observe how we set names_to = "type". In the resulting drinks_smaller_tidy, the type column contains the three types of alcohol: beer, spirit, and wine. Since type is a variable name that doesn’t appear in drinks_smaller, we use quotation marks around it. We’ll receive an error if we use names_to = type here.
values_to: This is the variable’s name in the new tidy data frame that will contain the ...