Predictive Data Analysis with Python

Finding Outliers in Data

This lesson explains what outliers are, why they occur, and how to detect them.

We'll cover the following:

- What is an outlier?
  - Why do outliers exist?
  - Identifying outliers
    - Interquartile range (IQR) method
- Dealing with outliers

Why do outliers exist?

Outliers in a dataset usually arise for one of two reasons:

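Natural variability: Real-world data can contain genuine anomalies, values that lie far from the rest of the distribution even though they were recorded correctly.
Entry errors: Values are recorded incorrectly, mostly through human error while preparing the dataset or entering values (for example, typing 1750 instead of 175 for a height in centimeters).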

Identifying outliers

There are two main ways to identify outliers in a dataset:

Visualization plots: Outliers are often easy to spot when the data is drawn as a scatter plot, box plot, or histogram, because they lie far from the center of the data. More about this will be

...
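As a rough illustration of why outliers stand out in a box plot, the sketch below computes which points a box plot would draw individually rather than inside its whiskers. It assumes the common convention (Matplotlib's default `whis=1.5`, for example) that whiskers reach at most 1.5 times the interquartile range beyond the first and third quartiles. The function name `box_plot_fliers` and the height values are hypothetical, and only Python's standard `statistics` module (Python 3.8+) is used.

```python
import statistics


def box_plot_fliers(values, whis=1.5):
    """Return the points a box plot would draw individually as outliers.

    Assumption: whiskers extend at most `whis` * IQR beyond the quartiles,
    the convention used by Matplotlib's default boxplot.
    """
    # Quartiles via linear interpolation between sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    # Anything outside the whisker fences is plotted as a separate point
    return [v for v in values if v < lower or v > upper]


# Hypothetical heights in cm; 1750 mimics a data-entry error
heights_cm = [162, 168, 170, 171, 173, 175, 176, 178, 181, 1750]
print(box_plot_fliers(heights_cm))  # → [1750]
```

Here the quartiles are 170.25 and 177.5, so any value outside roughly 159.4 to 188.4 would be drawn on its own; only the mistyped 1750 falls outside that range.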