Dealing With Outliers in R
Learn how to identify and deal with outliers using z-score and interquartile range tests.
Outliers and outlier detection
Outliers are observations that are significantly different from the rest of the data. They can occur for various reasons, including errors in data entry and measurement, as well as the incidence of rare or unusual events. Identifying and investigating outliers as part of our data analysis is generally a good idea since they can impact the results significantly. However, we should be careful not to just blindly remove or ignore outliers since they may represent important information that we should take into consideration.
There are various ways to detect outliers in data. Some of them are listed below:
- Visualizing the data and checking for points that lie far from the rest
- Z-score test to detect values that are unusually far from the mean
- Interquartile range (IQR) test
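As a rough illustration of the last item, here is a minimal sketch of an IQR check in base R. The sample vector is made up, and the 1.5 multiplier is the commonly used convention rather than something this lesson specifies, so treat both as assumptions.

```r
# Hypothetical sample with one obviously extreme value (made up for illustration).
x <- c(10, 12, 12, 13, 12, 11, 14, 13, 15, 102)

# First and third quartiles (R's default quantile type), returned without names.
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles; the 1.5 multiplier is an assumed convention.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
outliers <- x[x < lower | x > upper]
print(outliers)
```

Only the value 102 falls outside the fences in this sample. Flagged values are candidates for investigation, not automatic removal.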
Z-score test
As we saw earlier, the z-score transformation expresses each value as the number of standard deviations it lies above or below the mean. We use this standardization technique in outlier detection as well.
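The transformation can be sketched in a few lines of base R. The vector below is a made-up sample, and the variable names are only illustrative.

```r
# Hypothetical sample; the values are made up for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Z-score: distance from the mean in units of the sample standard deviation.
z <- (x - mean(x)) / sd(x)

# scale() applies the same centering and scaling; it returns a one-column matrix,
# so we convert it back to a plain numeric vector.
z_scaled <- as.numeric(scale(x))

print(round(z, 2))
```

After the transformation, the values have a mean of 0 and a standard deviation of 1, which makes a single cutoff usable regardless of the original units.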
According to the empirical rule of statistics, about 99.7% of normally distributed data lies within 3 standard deviations ...
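A minimal sketch of how the three-standard-deviation figure can be turned into a test, assuming the rule is applied by flagging any value whose absolute z-score exceeds 3. The data and the `threshold` name are hypothetical.

```r
# Hypothetical sample: 20 values near 50 plus one extreme value (made up for illustration).
x <- c(48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
       50, 48, 52, 49, 51, 50, 47, 53, 50, 50, 120)

# Standardize the values.
z <- (x - mean(x)) / sd(x)

# Assumed cutoff based on the empirical rule: more than 3 standard deviations from the mean.
threshold <- 3
outliers <- x[abs(z) > threshold]
print(outliers)
```

Note that in very small samples no value can reach a z-score of 3, because an extreme value also inflates the standard deviation it is measured against, so this check is more useful on larger datasets.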