Filling In Data

Fill in missing data for features that only have a few missing values.

Chapter Goals:

  • Find the rows that contain missing values for 'CPI' and 'Unemployment'
  • Fill in the missing values using previous row values
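
As a rough preview of the second goal, here is a minimal sketch of filling a missing value with the value from the previous row. The demo DataFrame and its numbers are made up for illustration, and pandas' ffill is only one possible way to do this; it is not necessarily the method used later in the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; the values are invented for illustration only.
demo = pd.DataFrame({'CPI': [211.1, np.nan, 211.3, np.nan]})

# Forward fill: each missing value takes the last valid value above it.
filled = demo.ffill()

print(filled['CPI'].tolist())  # [211.1, 211.1, 211.3, 211.3]
```

Note that a missing value in the very first row would stay missing, since there is no previous row to copy from.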

A. Finding the missing values

We previously noted that both the 'CPI' and 'Unemployment' features contain 585 missing values. To find the row indexes of these missing values, we first convert the corresponding columns of the na_values boolean DataFrame to integers, so that False becomes 0 and True becomes 1.

We then use the nonzero function to find the positions of the 1s, which correspond to the True values.

import numpy as np  # NumPy library

# Cast the boolean missing-value flags to 0/1, then locate the 1s
na_cpi_int = na_values['CPI'].astype(int)
na_indexes_cpi = na_cpi_int.to_numpy().nonzero()[0]
na_une_int = na_values['Unemployment'].astype(int)
na_indexes_une = na_une_int.to_numpy().nonzero()[0]

# Check whether both features are missing in exactly the same rows
print(np.array_equal(na_indexes_cpi, na_indexes_une))

The row indexes are stored in the na_indexes_cpi and na_indexes_une NumPy arrays. As the output of np.array_equal shows, the two arrays contain exactly the same row indexes (sorted in ascending order). Now let’s take a closer look at the exact rows that contain ...
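
To see the index-finding step in isolation, here is a small self-contained sketch on a toy DataFrame. The names toy_df, toy_na, toy_idx_cpi, and toy_idx_une are hypothetical, the values are made up, and the sketch assumes that na_values was built with something like isna(), which may differ from how the course constructed it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names mirror the lesson, values are invented.
toy_df = pd.DataFrame({
    'CPI': [211.1, np.nan, 211.3, np.nan],
    'Unemployment': [8.1, np.nan, 7.9, np.nan],
})

# Assumed stand-in for na_values: True wherever a value is missing.
toy_na = toy_df.isna()

# Convert True/False to 1/0, then take the positions of the nonzero entries.
toy_idx_cpi = toy_na['CPI'].astype(int).to_numpy().nonzero()[0]
toy_idx_une = toy_na['Unemployment'].astype(int).to_numpy().nonzero()[0]

print(toy_idx_cpi)                               # [1 3]
print(np.array_equal(toy_idx_cpi, toy_idx_une))  # True
```

The positions returned by nonzero are row positions counted from 0, so they match the DataFrame's index labels only when the DataFrame uses the default integer index.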
