Challenges in Applying the Clustering Process

Learn the challenges in the clustering process.

When applying clustering, practitioners usually face a few key challenges.

Cluster definition

First, how do we define a suitable cluster for a given dataset? This is difficult because clustering is typically applied while we’re still exploring the data, so we may not know what a cluster looks like or how the points should be grouped. When we discuss the clustering algorithms below, we’ll see that each algorithm makes certain assumptions about the kinds of clusters it looks for. For example, k-means minimizes the squared distances between points and their cluster centroids, which favors convex, blob-like clusters. Accepting such assumptions when applying a specific clustering algorithm means that we may not find all the clusters present in the data or, worse, that the clusters we do find may be wrong.

Specifically, clusters may take vastly different forms in real life, as illustrated in the figure below. Besides being blob-like, round, or elliptical, clusters may also be elongated, or one cluster may enclose another. No single clustering algorithm can detect all of these kinds of clusters, so knowing beforehand which types of clusters we are looking for is critical to the success of our analyses. When we don’t know this in advance, we can apply several different clustering algorithms to the same dataset to reduce the risk of missing a particular type of cluster.
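
To make the point about algorithm assumptions concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the toy dataset (two concentric rings), the helper names (ring, kmeans, threshold_components, recovers_rings), the starting centers, and the distance threshold eps=0.5 are all hypothetical choices made for this example. It runs a basic k-means (Lloyd's algorithm) and a simple connectivity-based grouping (connected components under a distance threshold, used here as a simplified stand-in for density-based methods) on the same points.

```python
import math

def ring(radius, n):
    """Return n evenly spaced 2-D points on a circle centered at the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def kmeans(points, centers, iters=50):
    """Basic Lloyd's algorithm from the given starting centers; returns labels."""
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        centers = new_centers
    return labels

def threshold_components(points, eps):
    """Group points that are chained together by hops shorter than eps."""
    labels = [-1] * len(points)  # -1 marks a point not yet assigned
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) < eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def recovers_rings(labels, n_inner):
    """True if each ring received exactly one label and the two labels differ."""
    inner_labels, outer_labels = set(labels[:n_inner]), set(labels[n_inner:])
    return (len(inner_labels) == 1 and len(outer_labels) == 1
            and inner_labels != outer_labels)

# Hypothetical toy data: a small ring enclosed by a larger one.
inner, outer = ring(1.0, 40), ring(4.0, 80)
points = inner + outer

km_labels = kmeans(points, centers=[points[0], points[-1]])
cc_labels = threshold_components(points, eps=0.5)

print("k-means recovers the rings:", recovers_rings(km_labels, len(inner)))
print("threshold grouping recovers the rings:", recovers_rings(cc_labels, len(inner)))
```

On this toy data, k-means cuts across the rings because it separates points by their distance to two centroids, while the threshold-based grouping follows each ring because neighboring points on a ring are closer together than the gap between the rings. The roles would reverse on noisy, overlapping blobs, where chaining by distance can merge clusters that k-means keeps apart, which is why trying more than one algorithm on the same dataset is worthwhile.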
