Outlier Detection and Removal
This lesson covers how to detect outliers in a dataset and how to decide what to do with them.
Outlier detection
Detecting outliers is an important step in data cleaning and exploration. It shows us where the anomalies in the data are, and those anomalies can themselves give us valuable insights. So, how can we detect outliers?
Outliers can be detected both visually and mathematically. Plots such as box plots and scatter plots are especially helpful for visualizing them. Deciding whether to remove an outlier is trickier. We should remove outliers when we are certain that they are the result of errors.
We will discuss some methods for detecting and removing outliers, using the Sample Sales Data. The data is in the file sales_data.csv.
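To make "mathematical" detection concrete before turning to the real data, here is a minimal sketch using only the Python standard library. It assumes one common convention, Tukey's 1.5 × IQR fences, which may not be the exact criterion this lesson uses. The function name iqr_fences, the multiplier k, and the sales list are hypothetical and are not taken from sales_data.csv.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper, outliers) using IQR fences.

    Assumption: k=1.5 is the conventional Tukey multiplier; it is a
    rule of thumb, not a value taken from this lesson.
    """
    # Quartiles with linear interpolation over the observed data range.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Anything outside the fences is flagged, not automatically deleted.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical sales amounts with one suspicious entry.
sales = [120, 135, 128, 142, 131, 138, 125, 900, 133, 129]

lower, upper, outliers = iqr_fences(sales)
print(lower, upper, outliers)  # 114.75 150.75 [900]

# Remove flagged values only if they are known to be errors.
cleaned = [v for v in sales if lower <= v <= upper]
```

The function only flags values. Whether to drop them remains a judgment call that depends on whether they come from errors, as discussed above.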
Box plots and quantile ranges
Box plots, by definition, plot outliers as individual points and group the rest of the observations. A box plot classifies a point as an outlier if the point is greater than ...