Outliers

This lesson explains what outliers are, why they occur, and how to remove them.

What is an outlier?

Another part of data cleaning is dealing with outliers: values that lie unusually far from the rest of the data. First, how do you decide what counts as an outlier? The answer can require domain knowledge as well as other information, but a simple way to start is to look at a box plot:

Box Plot of Hours Per Week

The plot above was generated with this command:

bbox = train_df['hoursperweek'].plot(kind="box")  # returns the matplotlib Axes holding the box plot
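By default, the box plot that pandas draws through matplotlib marks a point as an outlier (a "flier") when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below computes the same fences numerically so you can list the flagged values instead of reading them off the chart. The helper name `iqr_outlier_mask` and the sample values are hypothetical stand-ins for `train_df['hoursperweek']`, not part of the course's code.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True for values outside the IQR fences.

    With k=1.5 this matches matplotlib's default box-plot rule (whis=1.5):
    values below Q1 - k*IQR or above Q3 + k*IQR are drawn as outliers.
    """
    q1 = series.quantile(0.25)   # first quartile (bottom of the box)
    q3 = series.quantile(0.75)   # third quartile (top of the box)
    iqr = q3 - q1                # height of the box
    lower = q1 - k * iqr         # lower fence
    upper = q3 + k * iqr         # upper fence
    return (series < lower) | (series > upper)


# Hypothetical sample standing in for train_df['hoursperweek'].
hours = pd.Series([40, 40, 38, 45, 50, 40, 35, 42, 99, 1], name="hoursperweek")
mask = iqr_outlier_mask(hours)
print(hours[mask].tolist())  # → [99, 1]
```

On the real data you would call `iqr_outlier_mask(train_df['hoursperweek'])`. The multiplier `k` is a convention rather than a law; domain knowledge may justify a wider or narrower range.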

Detecting outliers

...