Markov Clustering

Understand how random walks can be used to cluster graphs and how this helps entity resolution.

Transitive clustering and MCC are popular in the community because they are simple and easy to interpret. However, they tend to underperform as cluster sizes grow. That setting is exactly where Markov clustering performs well.
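To make the random-walk idea concrete, here is a minimal sketch of the usual Markov clustering loop in plain Python: expansion (squaring the transition matrix, which simulates longer walks) alternates with inflation (raising entries to a power and renormalizing, which strengthens strong connections and weakens weak ones). The function names, the toy graph, and the default parameters are assumptions made for illustration and are not taken from this lesson. A real project would more likely use a matrix library.

```python
def normalize_columns(m):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix, i.e. take two random-walk steps."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def markov_clustering(n_nodes, edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected, unweighted graph given as a list of edges.

    Illustrative sketch only: parameter names and defaults are assumptions.
    """
    # Adjacency matrix with self-loops, turned into transition probabilities.
    m = [[0.0] * n_nodes for _ in range(n_nodes)]
    for a, b in edges:
        m[a][b] = m[b][a] = 1.0
    for i in range(n_nodes):
        m[i][i] = 1.0
    m = normalize_columns(m)

    # Alternate expansion and inflation until the matrix stops changing.
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n_nodes) for j in range(n_nodes))
        m = new
        if delta < tol:
            break

    # Rows with a non-zero diagonal are "attractors"; the non-zero columns
    # of an attractor row are the members of its cluster.
    eps = 1e-6
    clusters = set()
    for i in range(n_nodes):
        if m[i][i] > eps:
            clusters.add(frozenset(j for j in range(n_nodes) if m[i][j] > eps))
    return sorted(sorted(c) for c in clusters)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
clusters = markov_clustering(6, toy_edges)
print(clusters)
```

In this toy graph the single bridge edge is too weak to survive inflation, so the two triangles end up as separate clusters. Transitive clustering, by contrast, would merge all six nodes because they are connected.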

Resolving geographic settlements

We use the open geographic settlements dataset, where almost all clusters have a size of four. Below, we read the original data provided in JSON format and reshape the actual cluster assignments into a cross-reference table:
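The reshaping step could look roughly like the sketch below. It is not the lesson's own code: the inline sample records and the field names `id`, `source`, and `cluster` are hypothetical stand-ins, since the actual schema of the settlements file is not shown here. The idea is to pivot one-record-per-row data into one row per true cluster, with one column per data source.

```python
import json
from collections import defaultdict

# Hypothetical sample in place of the real file; the actual field names
# and values in the settlements dataset may differ.
raw = """
[
  {"id": "a1", "source": "A", "cluster": 0},
  {"id": "b7", "source": "B", "cluster": 0},
  {"id": "a2", "source": "A", "cluster": 1},
  {"id": "b3", "source": "B", "cluster": 1},
  {"id": "c9", "source": "C", "cluster": 1}
]
"""


def build_xref(records):
    """Pivot records into one row per cluster and one column per source."""
    by_cluster = defaultdict(dict)
    for rec in records:
        by_cluster[rec["cluster"]][rec["source"]] = rec["id"]
    sources = sorted({rec["source"] for rec in records})
    # A missing source in a cluster is recorded as None.
    return [
        {"cluster": cid, **{s: members.get(s) for s in sources}}
        for cid, members in sorted(by_cluster.items())
    ]


xref = build_xref(json.loads(raw))
for row in xref:
    print(row)
```

Each row of the resulting table lists the record IDs that belong to the same real-world entity, which is the ground truth that predicted clusters can later be compared against.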
