Outliers
This lesson explains what outliers are, why they occur, and how to remove them.
What is an outlier?
Another part of data cleaning is dealing with outliers. First, how do you define an outlier? A precise definition can require domain knowledge and other context, but a simple way to start is to look at box plots:
Box Plot of Hours Per Week
The plot above was generated with this command:
# Box plot of the hoursperweek column; pandas draws it with matplotlib and returns the Axes
bbox = train_df['hoursperweek'].plot(kind="box")
Detecting outliers
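Here, any point outside the “whiskers” could be considered an outlier. As a refresher, the whiskers are the lines extending from either end of the box. By default, each whisker reaches the most extreme data point that lies within 1.5 times the interquartile range (IQR) beyond the box, that is, below the first quartile or above the third quartile; points farther out are drawn individually. ...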
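The whisker rule can also be applied numerically instead of read off the plot. The following is a minimal sketch, assuming pandas and the default multiplier of 1.5: it computes the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR and flags values outside them. The helper name `iqr_outlier_mask` and the toy `hours` series are hypothetical stand-ins for illustration, not part of the lesson's dataset.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    the fences [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = series.quantile(0.25)  # first quartile (linear interpolation)
    q3 = series.quantile(0.75)  # third quartile
    iqr = q3 - q1               # interquartile range
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (series < lower) | (series > upper)


# Hypothetical toy data standing in for train_df['hoursperweek']
hours = pd.Series([40, 40, 38, 45, 50, 40, 35, 99, 2, 42])

mask = iqr_outlier_mask(hours)
print(hours[mask].tolist())  # → [99, 2]

# Keep only the rows inside the fences
filtered = hours[~mask]
```

On a real DataFrame, the same mask can filter whole rows, e.g. `train_df[~iqr_outlier_mask(train_df['hoursperweek'])]`. The parameter `k` controls how strict the rule is; a larger value such as 3.0 flags only the most extreme points.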