Solution Review: Cleaning NYC Property Sales
This lesson provides the solutions to the data cleaning exercise in the previous lesson.
1. Change values
In this task, we had to change the values in the BOROUGH column according to the following rule:
1 --> Manhattan
2 --> Bronx
3 --> Brooklyn
4 --> Queens
5 --> Staten Island
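The rule above is a lookup table from numeric code to borough name. As an aside, pandas can apply such a table in one step with Series.map and a dict. This is a different technique from the solution below. The sketch uses a small made-up DataFrame instead of nyc_property_sales.csv, and the dict name borough_names is our own choice. One caveat: map returns NaN for any code that is missing from the dict, whereas the .loc approach in the solution leaves unmatched values unchanged.

```python
import pandas as pd

# Hypothetical sample of BOROUGH codes (not the real dataset)
df = pd.DataFrame({'BOROUGH': [1, 3, 5, 2, 4]})

# The exercise's rule expressed as a dict: code -> borough name
borough_names = {
    1: 'Manhattan',
    2: 'Bronx',
    3: 'Brooklyn',
    4: 'Queens',
    5: 'Staten Island',
}

# map() looks up each code in the dict; codes absent from the dict become NaN
df['BOROUGH'] = df['BOROUGH'].map(borough_names)
print(df['BOROUGH'].tolist())
```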
import pandas as pd
df = pd.read_csv('nyc_property_sales.csv')
# 1 --> Manhattan
condition = df['BOROUGH'] == 1
df.loc[condition, 'BOROUGH'] = 'Manhattan'
# 2 --> Bronx
condition = df['BOROUGH'] == 2
df.loc[condition, 'BOROUGH'] = 'Bronx'
# 3 --> Brooklyn
condition = df['BOROUGH'] == 3
df.loc[condition, 'BOROUGH'] = 'Brooklyn'
# 4 --> Queens
condition = df['BOROUGH'] == 4
df.loc[condition, 'BOROUGH'] = 'Queens'
# 5 --> Staten Island
condition = df['BOROUGH'] == 5
df.loc[condition, 'BOROUGH'] = 'Staten Island'
print(df['BOROUGH'].unique())
The problem statement shows that the same code works for every category; only the numeric code and the borough name change. So we handle the categories one at a time.
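Because the two lines per borough differ only in the code and the name, they can also be driven by a loop over the mapping. This minimal sketch keeps the solution's boolean-mask-plus-.loc pattern. The toy data and the borough_names dict are illustrative assumptions, not part of the exercise. The column is cast to object first because recent pandas versions warn when a string is assigned into an integer column.

```python
import pandas as pd

# Hypothetical toy data standing in for nyc_property_sales.csv
df = pd.DataFrame({'BOROUGH': [1, 2, 3, 4, 5, 3, 1]})

# The mapping rule from the exercise, stored once as a dict
borough_names = {
    1: 'Manhattan',
    2: 'Bronx',
    3: 'Brooklyn',
    4: 'Queens',
    5: 'Staten Island',
}

# Cast to object so string labels can be stored without an
# incompatible-dtype warning (int64 column receiving str values)
df['BOROUGH'] = df['BOROUGH'].astype(object)

# Same pattern as the solution: build a boolean mask, then assign via .loc
for code, name in borough_names.items():
    condition = df['BOROUGH'] == code
    df.loc[condition, 'BOROUGH'] = name

print(df['BOROUGH'].unique())
```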
To change all instances of a specific value in a column, we first need to find the rows where that value is present. We do this with the condition on line 4: df['BOROUGH'] == 1 returns a Series of True/False values, one for each row. It is True for rows where the value of the BOROUGH column is ...