What is the k-medoids algorithm?


K-medoids

K-medoids, also known as partitioning around medoids (PAM), is a popular clustering algorithm that groups data points into k clusters by selecting k representative objects within a dataset. Clustering is a robust unsupervised machine-learning technique that establishes patterns by identifying groups of data points with similar characteristics within a specific dataset.

The k-means clustering algorithm uses centroids. K-medoids is an alternative clustering algorithm that uses medoids instead.

Medoids

A medoid can be defined as the point in a cluster from which the sum of distances to the other points is minimal. It is the data point in a cluster characterized by the lowest total dissimilarity with the other data points.
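For instance, here is a minimal NumPy sketch (our own illustration; the helper name find_medoid is not from any library) that finds the medoid of a small cluster by picking the point with the smallest total Manhattan distance to the others:

import numpy as np

def find_medoid(points):
    # Pairwise Manhattan distances between all points in the cluster
    dists = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=2)
    # The medoid is the point whose summed distance to the others is minimal
    return points[dists.sum(axis=1).argmin()]

cluster = np.array([[2, 5], [3, 8], [5, 8], [4, 4], [4, 6]])
print(find_medoid(cluster))  # [4 6]; always an actual data point, unlike a mean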

The k-means algorithm is sensitive to outliers. The k-medoids algorithm, on the other hand, mitigates that sensitivity by eliminating the reliance on centroids. It aims to group data points into k clusters, where each data point is assigned to a medoid and the sum of distances between data points and their assigned medoids is minimized. The algorithm iteratively assigns each data point to the closest medoid and swaps medoids until convergence.

Medoid vs mean in a 2D space

Manhattan distance

The distance between each data point and each medoid is calculated using the Manhattan distance formula: for two points (x1, y1) and (x2, y2), the distance is |x1 - x2| + |y1 - y2|. The sum of these distances over all points, each taken to its nearest medoid, is also known as the cost.
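As a quick illustration, a small helper (ours, for this article's examples) implementing the formula:

def manhattan(p, q):
    # Sum of absolute coordinate differences (L1, or taxicab, distance)
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

print(manhattan((8, 4), (4, 6)))  # 6, as in the distance table below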

How the k-medoids algorithm works

  1. Select k random points from the dataset. First, the algorithm selects k random points from the dataset as the initial medoids. These chosen medoids define the initial k clusters.

  2. Assign data points to the cluster of the nearest medoid. It then assigns each non-medoid point to the cluster of its closest medoid.

  3. Calculate the total sum of distances of data points from their assigned medoids. It then calculates the cost, i.e., the total sum of distances (dissimilarities) of the data points from their assigned medoids.

  4. Swap a medoid with a non-medoid point and recalculate the cost. It then swaps one of the medoids with a non-medoid point and recalculates the total sum of distances.

  5. Undo the swap if the recalculated cost with the new medoid exceeds the previous cost. Finally, it checks whether the recalculated cost is higher than the previously calculated cost. If it is, it undoes the swap and the algorithm converges; if the recalculated cost is lower, it keeps the swap and repeats step 4.

After the algorithm completes, we will have k medoids and their clusters.
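Putting the five steps together, the following is a minimal sketch in Python/NumPy. It is our own illustration under the assumptions above (Manhattan distance as the cost, random initial medoids), not a library implementation, and the names k_medoids, cost, and seed are ours:

import numpy as np

def k_medoids(data, k, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k random points from the dataset as the initial medoids
    medoids = data[rng.choice(len(data), size=k, replace=False)]

    def cost(m):
        # Steps 2-3: Manhattan distance from every point to every medoid,
        # assign each point to its nearest medoid, and sum those distances
        d = np.abs(data[:, None, :] - m[None, :, :]).sum(axis=2)
        return d.min(axis=1).sum()

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        # Step 4: try swapping each medoid with each non-medoid point
        for i in range(k):
            for p in data:
                if any((p == m).all() for m in medoids):
                    continue  # p is already a medoid; skip it
                candidate = medoids.copy()
                candidate[i] = p
                c = cost(candidate)
                # Step 5: keep the swap only if it lowers the cost
                if c < best:
                    medoids, best = candidate, c
                    improved = True
    return medoids, best

Note that, unlike the single trial swap in the walkthrough below, this sketch keeps testing swaps until none lowers the cost, so different seeds can converge to different medoids.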

Let's look at the following example demonstrating the k-medoids algorithm.

Example


Let's take the data given below. We want to divide the data into clusters with k = 2.

Data points

No. | X  | Y
1   | 9  | 3
2   | 8  | 4
3   | 4  | 6
4   | 8  | 5
5   | 2  | 5
6   | 3  | 8
7   | 5  | 8
8   | 4  | 4
9   | 10 | 4
10  | 9  | 6


When we plot the data, it looks like this:

Graph showing data points before clustering

Pick two random medoids. In this case, we pick (4,6) and (9,3).
We calculate the distance between each data point and both medoids using the Manhattan distance formula:

Distance from randomly selected medoids

No. | X  | Y | Distance from medoid m1 (4,6) | Distance from medoid m2 (9,3)
1   | 9  | 3 | -                             | -
2   | 8  | 4 | 6                             | 2
3   | 4  | 6 | -                             | -
4   | 8  | 5 | 5                             | 3
5   | 2  | 5 | 3                             | 9
6   | 3  | 8 | 3                             | 11
7   | 5  | 8 | 3                             | 9
8   | 4  | 4 | 2                             | 6
9   | 10 | 4 | 8                             | 2
10  | 9  | 6 | 5                             | 3

All the points are assigned to the cluster of the medoid with the least dissimilarity, so points 5, 6, 7, and 8 are assigned to m1, while points 2, 4, 9, and 10 are assigned to m2 (points 1 and 3 are the medoids themselves). The cost of this configuration is the sum of each point's distance to its assigned medoid: (3 + 3 + 3 + 2) + (2 + 3 + 2 + 3) = 21.
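This cost of 21 can be double-checked with a few lines of NumPy (a quick verification sketch, ours):

import numpy as np

data = np.array([[9, 3], [8, 4], [4, 6], [8, 5], [2, 5],
                 [3, 8], [5, 8], [4, 4], [10, 4], [9, 6]])
medoids = np.array([[4, 6], [9, 3]])

# Manhattan distance from every point to both medoids; keep the smaller one
d = np.abs(data[:, None, :] - medoids[None, :, :]).sum(axis=2)
print(d.min(axis=1).sum())  # 21, the "previous cost" used in the swap test below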


Now, randomly select another medoid and recalculate the distances. The chosen new medoid is (2,5). Swap it with (4,6), and the distances are calculated as follows:

Distance from new medoid

No. | X  | Y | Distance from medoid m1 (2,5) | Distance from medoid m2 (9,3)
1   | 9  | 3 | -                             | -
2   | 8  | 4 | 7                             | 2
3   | 4  | 6 | 3                             | 8
4   | 8  | 5 | 6                             | 3
5   | 2  | 5 | -                             | -
6   | 3  | 8 | 4                             | 11
7   | 5  | 8 | 6                             | 9
8   | 4  | 4 | 3                             | 6
9   | 10 | 4 | 9                             | 2
10  | 9  | 6 | 8                             | 3

Here, points 3, 6, 7, and 8 are assigned to m1, while points 2, 4, 9, and 10 are assigned to m2. The recalculated cost is (3 + 4 + 6 + 3) + (2 + 3 + 2 + 3) = 26.


Swap cost = recalculated cost - previous cost
Swap cost = 26 - 21 = 5


Since the swap cost of 5 is greater than 0, we undo the swap.
The final medoids are m1 (4,6) and m2 (9,3), and the two clusters are formed using these medoids. The green dots form the cluster with medoid m1, and the blue dots form the cluster with medoid m2.

Graph showing data points after clustering

Python implementation

  • scikit-learn-extra is an extension of scikit-learn that implements advanced algorithms such as k-medoids. Install it and import the necessary libraries:

!pip install scikit-learn-extra
from sklearn_extra.cluster import KMedoids
import numpy as np
  • Initialize the data points:

data = np.asarray([[6, 9], [4, 10], [4, 4], [8, 5], [8, 3], [5, 2], [5, 8], [6, 4], [4, 8], [3, 9]])
  • We can use the KMedoids class from scikit-learn-extra to fit the provided data (by default it uses the Euclidean metric; it also accepts metric='manhattan'):

k=2
kmedoids = KMedoids(n_clusters=k).fit(data)
  • Print the final medoids and the members of each cluster:

medoids_final = data[kmedoids.medoid_indices_]
print(medoids_final)
labels = kmedoids.labels_
for i in range(k):
    print(f"Cluster {i}:")
    print(data[labels == i])
  • The final clusters will be output:

Cluster 0:
[[6, 9], [8, 5], [8, 3], [6, 4], [3, 9]]

Cluster 1:
[[4, 10], [4, 4], [5, 2], [5, 8], [4, 8]]

  • The final medoids will be (6, 9) and (4, 4).
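  • Once fitted, the model can also assign new points to the nearest medoid through the standard scikit-learn predict method. For example (the test point (7, 7) is our own choice):

print(kmedoids.predict(np.asarray([[7, 7]])))  # label of the cluster with the nearest medoid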

Advantages

The k-medoids algorithm is fast and converges in a finite number of steps. It is simple to implement and understand, and it addresses some limitations of the k-means algorithm, such as sensitivity to outliers.

Disadvantages

The k-medoids algorithm is neither optimal nor well suited to clustering non-spherical (arbitrarily shaped) groups of objects, because it depends heavily on minimizing the distances between non-medoid points and their medoids. Another disadvantage is that, since the first k medoids are chosen randomly, the outcome may differ between runs on the same dataset.

Conclusion

K-means is one of the most widely used clustering algorithms. However, it has limitations, including a sensitivity to outliers that can distort the formed clusters. K-medoids, on the other hand, is a robust clustering tool with several benefits over other clustering algorithms. Its ability to group data by similarity makes it a valuable asset for pattern recognition and data analysis.

