In k-means, the boundaries between clusters are hard or crisp (that is, a data point either belongs to a cluster or not). Fuzzy c-means, also known as fuzzy k-means, extend k-means by fuzzifying the boundaries, that is, allowing data points to simultaneously belong to many clusters with varying degrees of membership. As such, a data point, for example, may have 40% membership with Cluster 1 and 60% membership with Cluster 2. Such clusters are called soft or fuzzy, as there are no concrete boundary lines between them.

Assumptions and target cluster type

The key extension of fuzzy c-means over k-means is that it utilizes fuzzy clusters instead of crisp ones. Its assumptions and implications are the same as those of k-means (that is, it still looks for spherical-shaped clusters).

Method

We will adopt the description of this algorithm from Yen and Langari (1999) on Fuzzy logic. Similar to k-means, the inputs of FCM are:

  • The input dataset, DD
  • The distance function d(x,y)d(x, y) with x and y being data points in D (optional).
  • The number of clusters, KK
  • A convergence threshold εε.
  • The fuzziness parameter m>1m > 1, which controls how fuzzy the clusters will be. The higher mm is, the fuzzier the clusters (more details follow).

Usually, dd is set to be the Euclidean distance function.

FCM follows the following steps mentioned below.

Step 1: Initialization

Select K initial centroids, represented as mean vectors c1,c2,...,cKc_1 , c_2 , . . . , c_K . These centroids can be selected randomly or with prior knowledge.

Step 2: Cluster membership computation

For all clusters cjc_j (j=1K)( j = 1 \dots K), the memberships μcj(x)μ_{c_j} (x) of each data point xDx ∈ D are computed as follows:

μcj(x)=1i=1K(d(x,cj)d(x,ci))1m1μ_{c_j}(x)=\frac {1} {\sum^K_{i=1} \bigg(\frac {d(x,c_j)} {d(x,c_i)}\bigg)^\frac{1}{m-1} }

where

  • jj is between 11 and KK
  • d(x,ci)d(x, c_i) and d(x,cj)d(x, c_j) denote the distance between xx and clusters ci,cjc_i, c_j

Note here that mm is the exponent fuzzy membership and is a parameter in the objective function, and m>1m > 1 denotes the fuzziness of the cluster. As mm approaches 11, the membership values degrade to either 00 or 11 (i.e., the membership to the closest cluster will approach 11 and others will approach 00).

In the above equation μ(x)μ (x) is proportional to 1d(x,cj)11m\frac{1}{d(x,c_j)^{\frac {1}{1-m}}} ,with

i=1Kd(x,ci)1m1{\sum^K_{i=1} {d(x,c_i)}^\frac{1}{m-1} }

acting as a normalization term.

When mm goes to infinity, the exponent approaches 00. Thus, the term d(x,cj)11md(x,c_j)^ {\frac {1}{1-m}} approaches 11. This means it doesn’t matter which cluster is closer to xx—all memberships will be almost the same. This is what extreme fuzziness looks like: every point has similar membership to every cluster, regardless of their proximity. On the other hand, as mm goes to 11, the term grows exponentially with the inverse of the distance toward infinity. Through normalization, the distance to the closest cluster will greatly outweigh the rest, thus making for very crisp boundaries.

Step 3: Centroid update

Cluster centroids are recomputed as weighted means of their members, taking into account the membership ww:

cj=xD(μCj(x))mxxD(μCj(x))mc_j=\frac{\sum_{x∈D}(μ_{C_j} (x))^mx}{{\sum_{x∈D}(μ_{C_j} (x))^m}}

where

  • xx are points in dataset DD,
  • cjc_j is the centroid of cluster j, and
  • $μ_{c_j} (x) is the membership degree of xx in cluster jj.

Step 4: Cluster membership re-computation

The cluster membership values of each data point in D are recomputed after the centroid update using the cluster membership equation.

Step 5: Stability checking and termination

T=i=1CcjPreviousciεT=\sum_{i=1}^C |c_j^{Previous} -c_i| ≤ ε

This equation checks whether the difference between the centroid of a cluster over all clusters doesn’t change more than a specific threshold εε set as an input. If it’s below that threshold, then it’s deemed converged and the algorithm terminates. Otherwise, go back to Step 3.

Hyperparameter tuning

The key parameter to tune for fuzzy c-means is the number of clusters KK. While the fuzziness parameter m could be a target for tuning, it usually suffices to fix m=2m = 2, unless there’s some domain knowledge to set it otherwise. The distance metric dd is often set to Euclidean distance.

A practical example of applying FCM

In the lab Fuzzy c-means of the chapter, we show how to use FCM to cluster the player data in the DOTAlicous dataset. The usage of FCM here is almost identical to k-means, except with a different function: fanny instead of k-means. In first, we also clean and normalize the data. Then, we will call the fanny method. Please follow the second lab to understand the practical aspect of applying the appropriate methods.

After applying fuzzy c-means, we can inspect the memberships by printing the membership degrees for each centroid. The following table shows the result, where each row corresponds to one data point labeled by its unique row name (B, B.1, N, B.2, etc.), and each column indicates the membership to the corresponding centroids (that is, centroid 1, 2, 3, or 4). The membership values on each row should sum up to 1.

Get hands-on with 1400+ tech skills courses.