DBSCAN Clustering and Customer Segmentation

Density-based clustering is one of the most widely used approaches to clustering, and it also helps detect outliers in the data. In this lesson, you will learn how it works.

DBSCAN clustering

The acronym DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is built on the idea that clusters are areas of high density separated by areas of low density. Because of this, it can find clusters of any shape, unlike K-means clustering, which assumes that clusters are spherical, equally dense, and not contaminated by outliers.
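To make the contrast with K-means concrete, here is a minimal sketch using scikit-learn. The two-moons toy dataset and the parameter values (eps=0.2, min_samples=5) are assumptions chosen for illustration; they are not part of the lesson's own data.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: clusters that are clearly not spherical.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps and min_samples are illustrative values picked for this toy data.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)

print(f"DBSCAN ARI:  {dbscan_ari:.2f}")
print(f"K-means ARI: {kmeans_ari:.2f}")
```

On data like this, DBSCAN typically follows each moon along its curve, while K-means tends to split the plane with a straight boundary and mixes the two shapes.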

DBSCAN marks points that lie alone in low-density regions (points whose nearest neighbors are too far away) as outliers, or noise. In other words, it assumes that the dataset may contain noise. Clusters in density-based clustering satisfy the following properties:

  1. All points in a cluster are mutually density-connected.

  2. If a point is density-reachable from some point of the cluster, it is also part of the cluster.
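The noise-marking behavior described above can be seen in a small sketch with scikit-learn, where noise points receive the label -1. The toy coordinates and the values eps=0.5 and min_samples=3 are hypothetical and chosen only so the result is easy to read.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups plus one isolated point.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],  # dense group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],  # dense group B
    [9.0, 0.0],                                      # isolated point
])

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# scikit-learn uses the label -1 for samples it considers noise.
noise_mask = labels == -1
n_clusters = len(set(labels.tolist()) - {-1})

print("labels:", labels)
print("number of clusters:", n_clusters)
print("noise points:", X[noise_mask])
```

The isolated point has no neighbors within eps, so it cannot be reached from either dense group and is left out of both clusters.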

Working of DBSCAN clustering

DBSCAN works in the following way.

  • It starts by identifying core samples (core points) in the dataset. A core sample is a point that has at least min_samples (MinPts) points within a distance of eps (ϵ) of it.

  • Once we identify a core sample, we examine its neighbors and add those that are themselves core samples to the same cluster, repeating the process for each newly added core sample.

  • Then, the cluster is expanded to include non-core samples. These are samples that lie within a distance of eps (ϵ) of a core sample but are not core samples themselves. They are also called border points in some literature.

  • Once we have identified all the clusters, along with their core and non-core samples, the remaining samples are considered noise or outliers.
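The steps above can be sketched from scratch in a few lines of NumPy. This is a simplified illustration, not the implementation used by any library: the function name dbscan_sketch and the toy data are hypothetical, the point itself is counted among its own neighbors (an assumption that matches scikit-learn's convention), and the pairwise distance matrix only suits small datasets.

```python
import numpy as np

def dbscan_sketch(X, eps, min_samples):
    """Return (labels, is_core); labels are 0, 1, ... or -1 for noise."""
    n = len(X)
    # All pairwise Euclidean distances (O(n^2) memory, fine for a demo).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Neighborhood of each point within eps; includes the point itself.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)  # every sample starts out as noise
    cluster_id = 0
    for i in range(n):
        # Only a core sample that is still unassigned starts a new cluster.
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        stack = [i]  # core samples whose neighborhoods still need visiting
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id  # core or border sample joins
                    if is_core[q]:
                        stack.append(q)     # only core samples expand further
        cluster_id += 1
    # Samples never reached from a core sample keep the label -1 (noise).
    return labels, is_core

# Hypothetical toy data: four dense points, one border point, one outlier.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0],
                   [0.6, 0.0], [5.0, 5.0]])
labels, is_core = dbscan_sketch(X_demo, eps=0.35, min_samples=4)
print("labels: ", labels)
print("is_core:", is_core)
```

In this toy run, the point at (0.6, 0.0) lies within eps of only one core sample, so it joins the cluster as a border point without being a core sample, while the point at (5.0, 5.0) is never reached and stays noise.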
