Checking Data—The skimr Package

Learn to use skimr to efficiently check input data quality.

It’s critical to run quick checks on a dataset before using it. In practice, it’s common to pull a dataset from a database and only later discover that the data is of poor quality: there might be missing values, some fields might be mislabelled, or some values might not make sense. Whether we make that discovery early or late in a project can enormously affect the project’s outcome. As a result, we need to check datasets before we rely on them. That is, we need to make sure that the datasets we pull are:

  • What we think they are: We haven’t grabbed the wrong database table, and the data we’re looking at isn’t mislabelled.
  • High quality: Fields are populated appropriately, with no unexpected missing data, and without unexpected values.
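The two checks above can be sketched in base R before reaching for any package. This is only a minimal illustration: the `orders` data frame, its column names, and the rule that amounts should not be negative are all hypothetical assumptions, not part of the lesson.

```r
# Hypothetical data standing in for a table pulled from a database.
# All names and values here are illustrative assumptions.
orders <- data.frame(
  order_id = c(1L, 2L, 3L, 4L),
  amount   = c(19.99, NA, -5.00, 42.50),
  status   = c("shipped", "shipped", "unknown", NA),
  stringsAsFactors = FALSE
)

# Is it what we think it is? Compare column names to what we expected,
# and record the first class of each column.
expected_cols <- c("order_id", "amount", "status")
has_expected_cols <- identical(names(orders), expected_cols)
col_types <- vapply(orders, function(x) class(x)[1], character(1))

# Is it high quality? Count missing values per column and
# count values that break an assumed rule (amounts must be >= 0).
missing_per_col <- colSums(is.na(orders))
negative_amounts <- sum(orders$amount < 0, na.rm = TRUE)

print(has_expected_cols)
print(col_types)
print(missing_per_col)
print(negative_amounts)
```

Writing checks like these by hand works for a few columns, but it becomes tedious as tables grow wider.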

The skimr logo

Because of those needs, it’s crucial to have an efficient way to get an overview of the data before diving into analysis. This is where the skimr package comes in. It is a powerful and flexible tool for quickly summarizing and visualizing datasets in R. It’s similar to the summary function from base R, but provides extra detail and ...
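As a rough sketch of that comparison, the snippet below runs both functions on the built-in `iris` dataset. It assumes skimr is already installed. The choice of `iris` is ours, for illustration only.

```r
# Assumes the skimr package is installed, e.g. via install.packages("skimr").
library(skimr)

# Base R overview: per-column quartiles and means, or counts for factors.
print(summary(iris))

# skimr overview: columns grouped by type, with missing-value counts,
# completeness rates, and type-specific summary statistics.
skimmed <- skim(iris)
print(skimmed)
```

The result of `skim()` is itself a data frame with one row per column of the input, so it can be filtered or inspected like any other table.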