How to handle missing values in a pandas DataFrame

Overview

In this shot, we’ll learn how to handle missing values in a dataset. Real-world data is rarely clean and often contains missing values, which can arise for many reasons.

Reasons for missing values

Some common reasons are:

Human errors made while joining multiple tables.
Some data values are skipped during collection.
The data-generating source sometimes behaves abnormally.

Hence, missing values need to be handled before building machine learning models or performing data analysis.

Different approaches

We’ll discuss two approaches to handle the missing values:

Delete the records with missing data

In this approach, we delete every record (row) that contains at least one missing value. Let’s understand this with an example:
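A minimal sketch of this approach follows. It assumes a small, hypothetical dataset: the Name column and all sample values are made up for illustration, and the CSV text is read from an in-memory io.StringIO buffer instead of a file on disk so the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV data; empty fields are parsed as NaN (missing values)
csv_data = """Name,Salary,Company,Country
Alice,50000,Amazon,USA
Bob,,Google,UK
Carol,62000,,India
Dave,58000,Microsoft,
Eve,47000,Apple,Canada
"""

# Read the dataset (a file path such as "data.csv" would work the same way)
df = pd.read_csv(io.StringIO(csv_data))
print(df)

# Drop every row that contains at least one missing value, modifying df in place
df.dropna(inplace=True)
print(df)
```

Only the rows for Alice and Eve survive, because every other row has at least one empty field.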

Explanation

First, we import the required package, pandas.
Next, we read the dataset using the read_csv() method.
We print the dataset. The output shows that a few records contain NaN values, that is, missing values.
We call the dropna() method on the dataset, which deletes the records that have missing values. We also pass inplace=True so that the rows are removed from the existing DataFrame and the updated dataset stays in the same variable.
Finally, we print the dataset again. No records have missing values now.
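Rather than spotting NaN values in the printed output, we can count them: isna() marks each missing entry, and sum() totals them per column. The sketch below, using a hypothetical DataFrame built in code, also shows what inplace=True changes: without it, dropna() returns a new DataFrame and leaves the original untouched; with it, the original is modified and the method returns None. The variable names cleaned, result, and missing_per_column are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing entry in each of three rows
df = pd.DataFrame({
    "Salary": [50000, np.nan, 62000, 58000],
    "Company": ["Amazon", "Google", None, "Microsoft"],
    "Country": ["USA", "UK", "India", None],
})

# Count missing values per column (None and NaN both count as missing)
missing_per_column = df.isna().sum()
print(missing_per_column)

# Default behavior: a new DataFrame is returned and df is unchanged
cleaned = df.dropna()
print(len(df), len(cleaned))

# With inplace=True, df itself is modified and the method returns None
result = df.dropna(inplace=True)
print(result, len(df))
```

Because of the None return value, writing df = df.dropna(inplace=True) would replace the DataFrame with None; use either reassignment or inplace=True, not both.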

Let’s now explore the second way to handle the missing values:

Replace the missing values with a default value

In this approach, we replace the missing values with a default value, so no records are deleted. Let’s understand this with an example:
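A minimal sketch of this approach, using the same kind of hypothetical dataset (the Name column and sample values are made up, and the CSV text is read from an in-memory io.StringIO buffer) together with the default values described in the explanation: 45000 for Salary, Google for Company, and Australia for Country. Here, a dictionary passed to fillna() maps each column name to its own default value.

```python
import io

import pandas as pd

# Hypothetical CSV data; empty fields become NaN
csv_data = """Name,Salary,Company,Country
Alice,50000,Amazon,USA
Bob,,Google,UK
Carol,62000,,India
Dave,58000,Microsoft,
Eve,47000,Apple,Canada
"""

df = pd.read_csv(io.StringIO(csv_data))
print(df)

# Map each column to its own default value and fill the gaps in place
df.fillna({"Salary": 45000, "Company": "Google", "Country": "Australia"}, inplace=True)
print(df)
```

All five rows are kept; only the empty cells change.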

Explanation

The code is almost the same as in the previous example, with one difference.

Instead of dropna(), we use the fillna() method to fill the missing values with a default value.

Here, we fill the missing values in the Salary, Company, and Country columns with 45000, Google, and Australia, respectively.

We also pass inplace=True so that the operation modifies the existing dataset and the result stays in the same variable.

Finally, we print the dataset and observe that the missing values have been replaced with the values specified in fillna().
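A related pattern fills one column at a time, for example df["Salary"].fillna(45000, inplace=True). That is chained assignment: recent pandas versions may warn about it, and under Copy-on-Write (the default behavior in pandas 3.0) it no longer updates the DataFrame. A sketch of a version-independent alternative, assuming a small hypothetical DataFrame, assigns each filled column back instead:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one gap per column
df = pd.DataFrame({
    "Salary": [50000, np.nan, 62000],
    "Company": ["Amazon", None, "Apple"],
    "Country": [None, "UK", "India"],
})

# Assign each filled column back instead of calling fillna(..., inplace=True) on df["col"]
df["Salary"] = df["Salary"].fillna(45000)
df["Company"] = df["Company"].fillna("Google")
df["Country"] = df["Country"].fillna("Australia")

# Confirm that no missing values remain and no rows were removed
print(df.isna().sum().sum(), len(df))
print(df)
```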


License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)