Graph Clustering and Community Detection (Unsupervised)
Learn about different types of graph clustering techniques.
We'll cover the following...
Graph clustering is one of the essential unsupervised tasks. This is a type of graph mining task, and it's helpful in many applications, such as marketing and congestion detection.
K-means clustering
We must first learn the graph embeddings if we use K-means clustering on a graph dataset. Like the previous sections, we can explore DeepWalk
to learn the embeddings and then apply K-means clustering.
Let's try to tune the hyperparameters of DeepWalk
this time and compute the best performance of graph embeddings on the data of Zachary’s Karate Club.
from karateclub import DeepWalkimport networkx as nxfrom sklearn.cluster import KMeansfrom sklearn.metrics.cluster import normalized_mutual_info_scoreimport numpy as npimport warningswarnings.filterwarnings("ignore")G = nx.karate_club_graph()y = np.asarray([G.nodes[i]['club'] != 'Mr. Hi' for i in G.nodes]).astype(np.int64)# train model and generate embeddingmodel = DeepWalk(walk_length=10, dimensions=128, window_size=10,seed=42)model.fit(G)X = model.get_embedding()# tune hyperparameters of deepwalk to increase NMI scoredef tune():best_score = 0best_params = {}for walk_length in [10, 20, 30, 40]:for dimensions in [16, 32, 64, 128]:for window_size in [5, 10, 15, 20]:model = DeepWalk(walk_length=walk_length, dimensions=dimensions, window_size=window_size,seed=42)model.fit(G)X = model.get_embedding()kmeans = KMeans(n_clusters=2, random_state=42).fit(X)y_hat = kmeans.labels_score = normalized_mutual_info_score(y, y_hat)if score > best_score:best_score = scorebest_params = {"walk_length": walk_length, "dimensions": dimensions, "window_size": window_size}return best_score, best_paramsbest_score, best_params = tune()print("Best NMI Score: {:.2f}".format(best_score))print("Best Parameters: {}".format(best_params))
Let’s look at the code explanation below:
Line 9: Instantiates
karate_club_graph
.Line 10: Stores label data.
Lines 13–14: Train the
DeepWalk
model to compute the graph embeddings. ...