Anomaly Detection with PyCaret
Learn how to import the necessary libraries and load datasets for anomaly detection tasks in PyCaret.
Anomaly detection is one of the main tasks in unsupervised machine learning. Its goal is to identify dataset instances that differ significantly from the majority. Those instances are known as outliers. The motivation for detecting them depends on the context and domain of each application. There are also semi-supervised and fully supervised methods for anomaly detection, but we’ll focus on the unsupervised approach. Local outlier factor is one of the main anomaly detection models, and it is defined by the following equation.

$$\mathrm{LOF}_k(A) = \frac{\sum_{B \in N_k(A)} \mathrm{lrd}_k(B)}{|N_k(A)| \cdot \mathrm{lrd}_k(A)}$$

- $\mathrm{LOF}_k(A)$ is the local outlier factor of the dataset instance $A$.
- $\mathrm{lrd}_k(A)$ is the local reachability density of $A$.
- $|N_k(A)|$ is the number of nearest neighbors to $A$.
- $N_k(A)$ is the set of nearest neighbors.
The local outlier factor of an instance is the average local reachability density of its neighbors divided by the local reachability density of the instance itself. Values significantly larger than 1 indicate that the instance is an outlier, while values close to or below 1 suggest an inlier.
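To make the definition concrete, here is a minimal NumPy sketch of the local outlier factor. It assumes Euclidean distance and the usual reachability distance (the larger of the neighbor's k-distance and the actual distance between the two points), ignores ties so that every instance has exactly k neighbors, and assumes there are no duplicate points. The function name `local_outlier_factor` and the toy data are illustrative and are not part of PyCaret.

```python
import numpy as np


def local_outlier_factor(X, k):
    """Return the LOF score of every row in X (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances; an instance is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors of each instance (ties ignored).
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # k-distance: distance from each instance to its k-th nearest neighbor.
    k_dist = dist[np.arange(n), neighbors[:, -1]]

    # Reachability distance from A to each neighbor B:
    # max(k-distance of B, distance between A and B).
    d_to_neighbors = dist[np.arange(n)[:, None], neighbors]
    reach = np.maximum(k_dist[neighbors], d_to_neighbors)

    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: mean lrd of the neighbors divided by the instance's own lrd.
    return lrd[neighbors].mean(axis=1) / lrd


# Toy data: five points in a tight cluster and one point far away.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
scores = local_outlier_factor(X, k=3)
print(scores.round(2))
```

In this toy dataset the last point should receive a score well above 1, while the clustered points stay close to 1.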
Anomaly detection using PyCaret
As we mentioned earlier, the local outlier factor is a popular anomaly detection model, but numerous others are available as well: isolation forest, k-nearest neighbors detector, subspace outlier detection, and clustering-based local outlier factor. In the rest of this chapter, we’ll see how we can train and plot an anomaly detection model using the PyCaret library.
Importing the necessary libraries
We’ll import the libraries necessary for this project: pandas, Matplotlib, Seaborn, and the PyCaret Anomaly Detection module. We’ll also import the get_data() function to load the dataset of our choice. Finally, we’ll set the Matplotlib figure DPI to 300 to get high-quality images for this course.