Filling In Data
Fill in missing data for features that only have a few missing values.
Chapter Goals:
- Find the rows that contain missing values for 'CPI' and 'Unemployment'
- Fill in the missing values using previous row values
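Filling a gap with the value from the previous row is usually called forward filling. The following is a minimal sketch that uses pandas' `DataFrame.ffill` on a small made-up DataFrame. It reuses the 'CPI' and 'Unemployment' column names, but all of the values are invented for illustration. The course's own filling code may use a different call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the numbers are made up for illustration only.
df = pd.DataFrame({
    'CPI': [211.1, np.nan, np.nan, 212.4],
    'Unemployment': [8.1, np.nan, np.nan, 7.9],
})

# Forward fill: each NaN takes the most recent non-missing value above it.
filled = df.ffill()
print(filled)
```

A NaN in the very first row has no previous value, so `ffill` leaves it missing.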
A. Finding the missing values
We previously noted that the 'CPI' and 'Unemployment' features each contain 585 missing values. To find the row indexes of these missing values, we first convert the feature columns in the na_values boolean DataFrame to integers (i.e., 0 and 1). We then call NumPy's nonzero method on each column's underlying array to find the locations of the 1’s, which correspond to the True values.
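import numpy as np  # NumPy library
# na_values: boolean DataFrame (True = missing) from the earlier missing-value check
# Convert the 'CPI' column's True/False values to 1/0
na_cpi_int = na_values['CPI'].astype(int)
# nonzero() returns a tuple of index arrays; [0] holds the positions of the 1's
na_indexes_cpi = na_cpi_int.to_numpy().nonzero()[0]
# Repeat the same steps for 'Unemployment'
na_une_int = na_values['Unemployment'].astype(int)
na_indexes_une = na_une_int.to_numpy().nonzero()[0]
# True if both columns have missing values in exactly the same rows
print(np.array_equal(na_indexes_cpi, na_indexes_une))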
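The following sketch shows the intermediate values that the conversion and `nonzero` steps in the code above produce. It runs them on a small made-up DataFrame. Both `df` and its values are hypothetical. It also assumes that `na_values` was built with `isna()`, although the earlier chapter's exact code may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    'CPI': [211.1, np.nan, 212.4, np.nan],
    'Unemployment': [8.1, np.nan, 7.9, np.nan],
})

# Assumption: na_values is the boolean DataFrame returned by isna().
na_values = df.isna()

# True/False become 1/0, so nonzero() can pick out the True positions.
na_cpi_int = na_values['CPI'].astype(int)
print(na_cpi_int.tolist())  # [0, 1, 0, 1]

# ndarray.nonzero() returns one index array per dimension;
# the column is 1-D, so [0] takes the only array.
na_indexes_cpi = na_cpi_int.to_numpy().nonzero()[0]
print(na_indexes_cpi)  # [1 3]
```

Calling `nonzero()` directly on the boolean array gives the same positions, because False counts as zero. The integer conversion simply makes the 0/1 encoding explicit.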
The row indexes are stored in the na_indexes_cpi and na_indexes_une NumPy arrays. Both arrays contain exactly the same row positions (sorted in ascending order), which is why the code prints True. Now let’s take a closer look at the exact rows that contain ...
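`nonzero` returns integer positions, not index labels, so `iloc` is a natural way to select the corresponding rows. This sketch uses a made-up DataFrame with hypothetical string labels, which makes positions and labels visibly different.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with string labels, so positions and labels differ.
df = pd.DataFrame(
    {'CPI': [211.1, np.nan, 212.4, np.nan],
     'Unemployment': [8.1, np.nan, 7.9, np.nan]},
    index=['a', 'b', 'c', 'd'],
)
na_values = df.isna()  # assumption: same construction as na_values

na_indexes_cpi = na_values['CPI'].astype(int).to_numpy().nonzero()[0]
na_indexes_une = na_values['Unemployment'].astype(int).to_numpy().nonzero()[0]

# Both columns are missing at the same positions -> True
print(np.array_equal(na_indexes_cpi, na_indexes_une))

# nonzero() gives positions, so select the rows by position with iloc.
missing_rows = df.iloc[na_indexes_cpi]
print(missing_rows)
```

With the default integer index, positions and labels coincide, so `loc` would also work. After filtering or re-indexing they can diverge, and `iloc` remains correct.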