DBSCAN
Learn about DBSCAN clustering, density, dense regions, point types, and algorithm.
After exploring K-means, a form of partitional clustering that relies on distance to a central point, we now move to density-based clustering. This approach groups together data points that are closely packed (high density) while marking points that lie alone in low-density regions as outliers. The most popular algorithm in this category is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). A key advantage of DBSCAN is that it does not require prior knowledge of the number of clusters and is robust against noise.
DBSCAN clustering
In density-based clustering, the dataset is partitioned into dense regions separated by areas of low density. Density is quantified using two crucial hyperparameters:
- Epsilon (): Specifies the maximum distance between two points to be considered neighbors. If the distance between two points is , they are considered to be in each other’s neighborhood.
- MinPoints (): Specifies the minimum number of neighbors a point must have (including itself) within the distance to be considered a core point.
Density at a point
The density at any data point is defined as the number of data points in the dataset within a circle of radius centered at .
The image below illustrates the concept of density by showing the points enclosed within a circle of a specified radius centered at point . ...