Managing Missing Data

Learn the common techniques for managing missing data effectively in pandas.

Introduction

Managing missing data effectively is a fundamental part of data preprocessing in data science. We learned earlier that NaN values propagate through calculations on pandas objects. While this propagation can be desirable, we often need to handle these NaN values explicitly to keep an analysis accurate and meaningful.
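As a quick reminder of what propagation looks like, here is a minimal sketch. The two Series `a` and `b` are made-up values for illustration only and are not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values, used only to illustrate NaN propagation.
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([10.0, 20.0, 30.0])

# Element-wise arithmetic propagates NaN: the middle result is NaN.
total = a + b
print(total)

# Reductions such as sum() skip NaN by default (skipna=True).
print(a.sum())

# With skipna=False, the NaN propagates into the reduction as well.
print(a.sum(skipna=False))
```

Note the difference between element-wise operations, which carry NaN into the result, and reductions, which skip it unless told otherwise.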

In this lesson, we’ll look at four techniques for managing and remedying missing data:

  • Filling

  • Replacing

  • Interpolating

  • Dropping
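As a rough preview, each of the four techniques maps onto a pandas method. The sketch below uses a small, hypothetical DataFrame (the column names `order_id`, `amount`, and `status` are assumptions chosen for illustration, not the lesson's dataset), and treating the string "unknown" as a missing-value placeholder is likewise an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [20.0, np.nan, 40.0, 50.0],
    "status": ["paid", "unknown", "paid", "refunded"],
})

# Filling: substitute a fixed value for each NaN.
filled = df["amount"].fillna(0.0)

# Replacing: turn a placeholder value into a proper missing value.
replaced = df["status"].replace("unknown", np.nan)

# Interpolating: estimate the NaN from neighboring values (linear by default).
interpolated = df["amount"].interpolate()

# Dropping: remove rows that have a NaN in the given column.
dropped = df.dropna(subset=["amount"])

print(filled.tolist())
print(replaced.isna().tolist())
print(interpolated.tolist())
print(dropped["order_id"].tolist())
```

All four calls return new objects and leave `df` unchanged, so the results can be compared side by side before choosing an approach.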

For this lesson, we’ll be using a mock transaction dataset of an e-commerce business, as shown below:
