...

/

Graph Clustering and Community Detection (Unsupervised)

Graph Clustering and Community Detection (Unsupervised)

Learn about different types of graph clustering techniques.

Graph clustering is one of the essential unsupervised tasks. This is a type of graph mining task, and it's helpful in many applications, such as marketing and congestion detection.

K-means clustering

We must first learn the graph embeddings if we use K-means clustering on a graph dataset. Like the previous sections, we can explore DeepWalk to learn the embeddings and then apply K-means clustering.

Let's try to tune the hyperparameters of DeepWalk this time and compute the best performance of graph embeddings on the data of Zachary’s Karate Club.

Press + to interact
from karateclub import DeepWalk
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.metrics.cluster import normalized_mutual_info_score
import numpy as np
import warnings
warnings.filterwarnings("ignore")
G = nx.karate_club_graph()
y = np.asarray([G.nodes[i]['club'] != 'Mr. Hi' for i in G.nodes]).astype(np.int64)
# train model and generate embedding
model = DeepWalk(walk_length=10, dimensions=128, window_size=10,seed=42)
model.fit(G)
X = model.get_embedding()
# tune hyperparameters of deepwalk to increase NMI score
def tune():
best_score = 0
best_params = {}
for walk_length in [10, 20, 30, 40]:
for dimensions in [16, 32, 64, 128]:
for window_size in [5, 10, 15, 20]:
model = DeepWalk(walk_length=walk_length, dimensions=dimensions, window_size=window_size,seed=42)
model.fit(G)
X = model.get_embedding()
kmeans = KMeans(n_clusters=2, random_state=42).fit(X)
y_hat = kmeans.labels_
score = normalized_mutual_info_score(y, y_hat)
if score > best_score:
best_score = score
best_params = {"walk_length": walk_length, "dimensions": dimensions, "window_size": window_size}
return best_score, best_params
best_score, best_params = tune()
print("Best NMI Score: {:.2f}".format(best_score))
print("Best Parameters: {}".format(best_params))

Let’s look at the code explanation below:

  • Line 9: Instantiates karate_club_graph.

  • Line 10: Stores label data.

  • Lines 13–14: Train the DeepWalk model to compute the graph embeddings. ...