Exercise: Verifying Basic Data Integrity
Learn how to verify the integrity of the data using basic pandas functions.
In this exercise, we will perform a basic check of whether our dataset contains what we expect and whether it has the correct number of samples.
Data consistency assessment
The data is supposed to contain observations for 30,000 credit accounts. Although there are 30,000 rows, we should also check whether there are 30,000 unique account IDs. If the SQL query used to generate the data was run on an unfamiliar schema, values that are supposed to be unique may in fact be duplicated. To examine this, we can check whether the number of unique account IDs equals the number of rows. Perform the following steps to complete the exercise:
1. Import pandas, load the data, and examine the column names by running the following commands in a cell (press Shift + Enter to run it):
import pandas as pd
df =
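The uniqueness check described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the exercise's own solution: it builds a small hypothetical DataFrame instead of loading the real file, and it assumes the account IDs live in a column named `ID`, which is our assumption rather than something stated here. `Series.nunique()` counts distinct values, so comparing it with the row count reveals whether any ID repeats, and `value_counts()` then identifies which IDs occur more than once.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
# The column name "ID" is an assumption, not taken from the exercise.
df = pd.DataFrame({
    "ID": ["a1", "b2", "c3", "c3", "d4"],
    "LIMIT_BAL": [20000, 120000, 90000, 90000, 50000],
})

# Number of rows (samples) in the DataFrame
n_rows = df.shape[0]

# Number of distinct account IDs
n_unique_ids = df["ID"].nunique()
print(n_rows, n_unique_ids)  # 5 4

# If every ID were unique, the two counts would match
ids_are_unique = n_rows == n_unique_ids
print(ids_are_unique)  # False

# Count occurrences of each ID and keep only those seen more than once
id_counts = df["ID"].value_counts()
dupe_ids = id_counts[id_counts > 1].index.tolist()
print(dupe_ids)  # ['c3']
```

On the real dataset, with the column name adjusted to match, both counts should be 30,000 if every account ID is unique; pandas also exposes the same yes/no answer directly as `df["ID"].is_unique`.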