DBSCAN
Learn about DBSCAN clustering, density, dense regions, point types, and algorithm.
After learning the famous -means clustering algorithm, a type of partitional clustering, we’ll move towards a more robust approach still used in the industry. This approach is called density-based clustering.
DBSCAN clustering
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. In density-based clustering, the set of data points is partitioned into dense regions separated by regions of low density. DBSCAN is one popular density-based clustering algorithm. It’s a robust algorithm that doesn’t require prior knowledge of the number of clusters in the data, unlike the -means algorithm.
How to define density
Density is defined by two parameters, epsilon () and minPoints (), which quantify density for individual points and a set of points.
Epsilon specifies the maximum distance between two points to be considered neighbors. If the distance between two points is less than or equal to , they’re considered to be in each other’s neighborhood. MinPoints specifies the minimum number of neighbors a point must have within the distance to be considered a core point. Any point with fewer than neighbors within the distance is considered a border or noise point.
Density at a point
The density at any data point is defined as the number of data points in within a circle of radius centered at .
The code below computes the density of a point x
given the array of points D
and the radius eps
using Euclidean distance as dissimilarity. Density at a point with a radius of is shown below:
Get hands-on with 1400+ tech skills courses.