Missing Values

In this lesson, let's see how to deal with missing values in sklearn.

Why missing values are important

Missing values are very common in real datasets. For various reasons, a dataset may record them as blanks, nan, inf, or some other designated value. In some cases, an otherwise ordinary value such as 0 or 1 is used to mark a missing entry. Why do we care about missing values?
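Because missing entries can be encoded in several ways, a common first step is to map them all to a single marker. The snippet below is a minimal sketch of that idea using NumPy. The helper `to_nan` and the choice of 0 as a placeholder are assumptions made for illustration; which values count as missing depends on the dataset.

```python
import numpy as np

def to_nan(values, placeholders=(0.0,)):
    """Return a float copy where inf and the given placeholders become NaN.

    `placeholders` is a hypothetical list of ordinary values that this
    particular dataset uses to mean "missing".
    """
    out = np.asarray(values, dtype=float).copy()
    out[~np.isfinite(out)] = np.nan            # inf, -inf, and nan all become nan
    out[np.isin(out, placeholders)] = np.nan   # dataset-specific placeholders
    return out

# Toy column: one nan, one inf, and one 0 used as a placeholder
raw = [3.0, np.nan, np.inf, 0.0, 7.0]
clean = to_nan(raw)

# Count how many entries are now marked as missing
n_missing = int(np.isnan(clean).sum())
print(clean, n_missing)
```

Once every missing entry is nan, counting or filling them becomes a single, uniform operation.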

  1. Some algorithms, or some implementations of them, can’t handle missing values. They assume the dataset is complete.
  2. Missing values can hurt the performance of our model.

In most cases, the first is the main reason.
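To illustrate the first reason, the sketch below fits sklearn's `LinearRegression` on a toy array that contains one nan. The data is made up for illustration. This estimator validates its input and raises a `ValueError` rather than fitting on incomplete data; other estimators may behave differently.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up): the third sample has a missing feature value
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

try:
    LinearRegression().fit(X, y)
    fit_failed = False
except ValueError as err:
    # Input validation rejects the nan before any fitting happens
    fit_failed = True
    print("fit failed:", err)
```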

In some cases, you may think about simply dropping the rows or columns that have too many missing values. This is a good idea if only a small part of the data is dropped. However, when a large part of the data is dropped, it may cause other issues. For example, dropping a whole column discards all the information in that column, including the values that were present. The alternative is to impute the missing values, that is, to fill them in with estimates. sklearn provides some functions for missing value imputation.
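The two options can be compared on a small example. The sketch below uses a made-up array and sklearn's `SimpleImputer` with the mean strategy. Mean imputation is only one possible choice, picked here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up): two of the four rows have one missing value each
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [np.nan, 30.0],
              [4.0, 40.0]])

# Option 1: drop every row that contains a nan; half of the rows are lost
rows_kept = X[~np.isnan(X).any(axis=1)]

# Option 2: replace each nan with the mean of the observed values in its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(rows_kept.shape)   # shape after dropping rows
print(X_imputed)         # same shape as X, with no nan left
```

Dropping keeps only the complete rows, while imputing keeps every row at the cost of introducing estimated values.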
