DBSCAN

Learn about DBSCAN clustering: density, dense regions, point types, and the algorithm.

After exploring K-means, a form of partitional clustering that relies on distance to a central point, we now move to density-based clustering. This approach groups together data points that are closely packed (high density) while marking points that lie alone in low-density regions as outliers. The most popular algorithm in this category is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). A key advantage of DBSCAN is that it does not require prior knowledge of the number of clusters and is robust against noise.

DBSCAN clustering

In density-based clustering, the dataset is partitioned into dense regions separated by areas of low density. Density is quantified using two crucial hyperparameters:

  • Epsilon ($\epsilon$): Specifies the maximum distance between two points for them to be considered neighbors. If the distance between two points is $\le \epsilon$, they are considered to be in each other’s neighborhood.
  • MinPoints ($m$): Specifies the minimum number of points (counting the point itself) that must lie within distance $\epsilon$ of a point for it to be considered a core point.
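The two hyperparameters can be illustrated with a minimal sketch. It assumes Euclidean distance and a tiny made-up 2D dataset; the helper names `neighbors` and `is_core` are hypothetical and exist only for illustration, not as part of any library.

```python
import math

def neighbors(points, i, eps):
    """Return indices of all points within distance eps of points[i].

    The point itself is included, matching the MinPoints convention above.
    Euclidean distance is an assumption of this sketch.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_points):
    """A point is a core point if its eps-neighborhood holds at least min_points points."""
    return len(neighbors(points, i, eps)) >= min_points

# Illustrative data: three points packed closely together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
eps, min_points = 1.0, 3

core_flags = [is_core(points, i, eps, min_points) for i in range(len(points))]
print(core_flags)  # [True, True, True, False]
```

The three packed points each have three points (themselves included) within distance 1.0, so they meet the MinPoints threshold. The isolated point has only itself in its neighborhood and does not.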

Density at a point

The density at any data point $\mathbf{x}$ is defined as the number of data points in the dataset $D$ that lie within distance $\epsilon$ of $\mathbf{x}$. In two dimensions, these are the points inside a circle of radius $\epsilon$ centered at $\mathbf{x}$.
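One way to write this definition symbolically is sketched below. The symbols $N_\epsilon$ (the neighborhood), $\rho_\epsilon$ (the density), and $d$ (the distance function, commonly Euclidean) are notation assumed here for illustration; they do not come from the lesson itself.

```latex
N_\epsilon(\mathbf{x}) = \{\, \mathbf{p} \in D : d(\mathbf{x}, \mathbf{p}) \le \epsilon \,\},
\qquad
\rho_\epsilon(\mathbf{x}) = \lvert N_\epsilon(\mathbf{x}) \rvert
```

Under this notation, a point $\mathbf{x} \in D$ meets the MinPoints requirement when $\rho_\epsilon(\mathbf{x}) \ge m$, since $\mathbf{x}$ itself belongs to $N_\epsilon(\mathbf{x})$.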

The image below illustrates the concept of density by showing the points enclosed within a circle of a specified radius centered at point $C$. ...