Solution Review: Cleaning NYC Property Sales

This lesson provides the solutions to the data cleaning exercise in the previous lesson.

1. Change values

In this task, we had to change the values in the BOROUGH column according to the following rule:

1 --> Manhattan

2 --> Bronx

3 --> Brooklyn

4 --> Queens

5 --> Staten Island
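The rule above is a lookup table from numeric code to borough name. As an illustration only (this is not the approach the solution below takes), it can be written as a Python dictionary and applied in one call with pandas' `Series.map`. The sample values are hypothetical and stand in for the real BOROUGH column.

```python
import pandas as pd

# The borough rule from the task, written as a code -> name lookup table
BOROUGH_NAMES = {
    1: 'Manhattan',
    2: 'Bronx',
    3: 'Brooklyn',
    4: 'Queens',
    5: 'Staten Island',
}

# Hypothetical sample standing in for df['BOROUGH']
boroughs = pd.Series([1, 3, 5, 2, 4, 1])

# Series.map looks up each value in the dictionary;
# any code missing from the dictionary would become NaN
names = boroughs.map(BOROUGH_NAMES)
print(names.tolist())
# ['Manhattan', 'Brooklyn', 'Staten Island', 'Bronx', 'Queens', 'Manhattan']
```

If unexpected codes should be left untouched rather than turned into NaN, `Series.replace` with the same dictionary keeps values that have no entry.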

```python
import pandas as pd
df = pd.read_csv('nyc_property_sales.csv').astype({'BOROUGH': object})  # object dtype can hold both codes and names
# 1 --> Manhattan
condition = df['BOROUGH'] == 1
df.loc[condition,'BOROUGH'] = 'Manhattan'
# 2 --> Bronx
condition = df['BOROUGH'] == 2
df.loc[condition,'BOROUGH'] = 'Bronx'
# 3 --> Brooklyn
condition = df['BOROUGH'] == 3
df.loc[condition,'BOROUGH'] = 'Brooklyn'
# 4 --> Queens
condition = df['BOROUGH'] == 4
df.loc[condition,'BOROUGH'] = 'Queens'
# 5 --> Staten Island
condition = df['BOROUGH'] == 5
df.loc[condition,'BOROUGH'] = 'Staten Island'
print(df['BOROUGH'].unique())
```

Looking at the problem statement, we can see that we need to write similar code for all 5 categories, so we handle each category one by one.
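Because the five steps differ only in the code and the name, they can also be driven by a loop over a dictionary. This is a minimal sketch, assuming a small hypothetical DataFrame in place of `nyc_property_sales.csv`; the name `borough_names` is illustrative. Each iteration performs the same two steps as the solution: build the condition, then assign with `.loc`.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; the real file is not needed here
df = pd.DataFrame({'BOROUGH': [1, 3, 5, 2, 4, 1]})

# Let the column hold both the original integers and the new names
df['BOROUGH'] = df['BOROUGH'].astype(object)

# Hypothetical lookup table mirroring the task's rule
borough_names = {1: 'Manhattan', 2: 'Bronx', 3: 'Brooklyn',
                 4: 'Queens', 5: 'Staten Island'}

# Same two steps as the solution, repeated once per category
for code, name in borough_names.items():
    condition = df['BOROUGH'] == code    # Boolean mask of matching rows
    df.loc[condition, 'BOROUGH'] = name  # overwrite only those rows

print(df['BOROUGH'].unique())
```

Keeping the rule in one dictionary means adding or correcting a category touches a single line instead of a copied block.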

To change all instances of a specific value in a column, we first need to find the rows where that value is present. To do this, we write our condition on line 4. The expression df['BOROUGH'] == 1 returns a Boolean Series holding a True/False value for each row. It is True for rows where the value of the BOROUGH column is 1 ...
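To make the True/False mask concrete, the sketch below builds it on a hypothetical five-row sample (not the lesson's dataset) and shows which rows `.loc` selects with it.

```python
import pandas as pd

# Hypothetical five-row sample of the BOROUGH column
df = pd.DataFrame({'BOROUGH': [1, 3, 1, 2, 5]})

# One True/False value per row: True where BOROUGH equals 1
condition = df['BOROUGH'] == 1
print(condition.tolist())                 # [True, False, True, False, False]

# .loc keeps only the rows where the mask is True
print(df.loc[condition].index.tolist())   # [0, 2]
```

Passing the same mask together with a column name, as in `df.loc[condition, 'BOROUGH']`, restricts the assignment to exactly those rows and that column.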