Checking Data—The skimr Package

Learn to use skimr to efficiently check input data quality.

It’s critical to run quick checks on a dataset before using it. In practice, it’s common to pull a dataset from a database and only later discover that the data is of poor quality: values might be missing, fields might be mislabelled, or some values might not make sense. Whether we make that discovery early or late can enormously affect a project’s outcome, so we need to check datasets before we rely on them. That is, we need to make sure that the datasets we pull are:

  • What we think they are: We haven’t grabbed the wrong database table, and the data we’re looking at isn’t mislabelled.
  • High quality: Fields are populated appropriately, with no unexpected missing data, and without unexpected values.
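
As a rough illustration of these two checks, the sketch below runs skimr's skim() on a small made-up data frame. The orders table and its columns are hypothetical stand-ins for data pulled from a database, and the "negative amounts are unexpected" rule is an assumption made only for this example.

```r
library(skimr)

# Hypothetical data standing in for a table pulled from a database
orders <- data.frame(
  order_id = 1:5,
  region   = c("north", "south", NA, "south", "east"),
  amount   = c(120.5, 80, 95.25, -10, NA),
  stringsAsFactors = FALSE
)

# "What we think they are": confirm the column names and types we expect
str(orders)

# "High quality": one summary row per column, including missing-value counts
summary_tbl <- skim(orders)
print(summary_tbl)

# Columns that contain missing values
missing_cols <- summary_tbl$skim_variable[summary_tbl$n_missing > 0]

# Numeric columns whose minimum (numeric.p0) is negative; treating negatives
# as unexpected is an assumption for this example
negative_cols <- summary_tbl$skim_variable[which(summary_tbl$numeric.p0 < 0)]
```

Here, missing_cols and negative_cols give a short list of columns to investigate before the data is used any further.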
