Exercise: Data Cleaning
Practice handling missing and duplicate values in customer-related data.
In this exercise, we’re given a dataset about the company’s customers. Unfortunately, the data is not clean. There are a few missing and duplicate values. Our task is to clean the data and prepare it for further analysis. Let’s take a look at the data.
+---------+----+---+------+------+------------+
|id_number|name|age|gender|income|credit_score|
+---------+----+---+------+------+------------+
|   135245|John| 25|     M| 50000|        null|
|   223452|Jane| 30|     F| 60000|         700|
|   341412| Joe| 35|     M| 70000|         800|
|   341412| Joe| 35|     M| 70000|         800|
|   574355|Jill| 40|     F| 80000|         900|
|   253774|Jack| 45|     M| 90000|        1000|
|   856585|Jack| 75|     M|  null|         300|
+---------+----+---+------+------+------------+
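Before cleaning anything, it can help to count the issues rather than spot them by eye. The following is a minimal sketch in plain Python, not tied to any particular DataFrame library. The rows are copied from the table above with `None` standing in for `null`, and the helper names `count_missing` and `find_duplicates` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Rows copied from the table above; None stands in for the null cells.
COLUMNS = ("id_number", "name", "age", "gender", "income", "credit_score")
ROWS = [
    (135245, "John", 25, "M", 50000, None),
    (223452, "Jane", 30, "F", 60000, 700),
    (341412, "Joe", 35, "M", 70000, 800),
    (341412, "Joe", 35, "M", 70000, 800),
    (574355, "Jill", 40, "F", 80000, 900),
    (253774, "Jack", 45, "M", 90000, 1000),
    (856585, "Jack", 75, "M", None, 300),
]


def count_missing(rows, columns):
    """Return {column: number of None cells} for columns with a missing value."""
    counts = Counter()
    for row in rows:
        for column, value in zip(columns, row):
            if value is None:
                counts[column] += 1
    return dict(counts)


def find_duplicates(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    return {row: n for row, n in Counter(rows).items() if n > 1}


missing = count_missing(ROWS, COLUMNS)
duplicates = find_duplicates(ROWS)
print(missing)
print(duplicates)
```

This sketch only reports the issues and leaves the data untouched. Whether to drop or fill the affected rows is a separate decision.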
We can see that there are a few problems with the data: