Coding the Solution
Understand the complete coding process for the knowledge graph completion task.
The solution to the problem
Our aim is to generate friend recommendations for people in the social network graph. We can formulate this as a link prediction, or knowledge graph completion, task. Specifically, it is a head prediction or tail prediction task: given a relation and the person at one end of a triple, we predict the missing person at the other end.
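To make tail prediction concrete, here is a minimal sketch that ranks candidate tails for a query such as (Alice, friend, ?). The names, the 2-D embeddings, and the helper functions `score` and `predict_tails` are all made up for illustration, and the scoring is a simple TransE-style distance rather than the TransR model trained in this lesson.

```python
import math

# Toy 2-D embeddings; every value here is invented for illustration only.
entity_emb = {
    "Alice": (0.0, 0.0),
    "Bob": (1.0, 0.1),
    "Carol": (0.2, 0.9),
    "Dave": (2.0, 2.0),
}
relation_emb = {"friend": (1.0, 0.0)}


def score(head, relation, tail):
    """Translational distance ||h + r - t||; lower means more plausible."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return math.dist((h[0] + r[0], h[1] + r[1]), t)


def predict_tails(head, relation, known_tails=()):
    """Rank every candidate tail for the query (head, relation, ?)."""
    candidates = [e for e in entity_emb if e != head and e not in known_tails]
    return sorted(candidates, key=lambda tail: score(head, relation, tail))


ranking = predict_tails("Alice", "friend")
print(ranking)  # best-scoring candidate first
```

Head prediction works the same way with the roles swapped: fix the relation and the tail, then rank every candidate head. The top-ranked people who are not already connected become the friend recommendations.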
Let's have a look at the following code and run it to generate the results:
import networkx as nx
from faker import Faker
import random
import pandas as pd
from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline

random.seed(2023)
# Create a synthetic graph with 500 nodes, 4 average degree, and 0.5 rewiring probability
G = nx.watts_strogatz_graph(n=500, k=4, p=0.5, seed=42)

# node names
faker = Faker()
Faker.seed(0)
node_names = []
# keep drawing until there is one unique name per node (500 in total)
while len(node_names) < G.number_of_nodes():
    name = faker.name()
    if name not in node_names:
        node_names.append(name)

# relabel node names of the graph G
mapping = {i: node_names[i] for i in range(len(node_names))}
G = nx.relabel_nodes(G, mapping)

# remove any possible self-loops
G.remove_edges_from(nx.selfloop_edges(G))

# edge attribute names
attributes = ['friend', 'family', 'acquaintance',
              'colleague', 'classmate', 'neighbor',
              'schoolmate']
# probability distribution for each attribute (keys in the same order as `attributes`)
prob_dist = {'friend': 0.3, 'family': 0.05, 'acquaintance': 0.2,
             'colleague': 0.05, 'classmate': 0.15, 'neighbor': 0.15,
             'schoolmate': 0.1}

# assign edge attributes
for edge in G.edges:
    G[edge[0]][edge[1]]['relation'] = random.choices(attributes,
                                                     weights=list(prob_dist.values()),
                                                     k=1)[0]

# graph to triple dataframe
triples = []
for edge in G.edges:
    triples.append([edge[0], G[edge[0]][edge[1]]['relation'], edge[1]])

# create dataframe
df = pd.DataFrame(triples)
df.columns = ['h', 'r', 't']

# create triples using pykeen
tf = TriplesFactory.from_labeled_triples(df.values)
training, testing = tf.split([0.8, 0.2], random_state=42)

# train the TransR model using PyKEEN
result = pipeline(
    training=training,
    testing=testing,
    model="TransR",
    model_kwargs=dict(embedding_dim=128),
    optimizer="adamw",
    training_kwargs=dict(num_epochs=50, use_tqdm_batch=False),
    random_seed=42,
    device='cpu',
    negative_sampler='bernoulli',
    negative_sampler_kwargs=dict(num_negs_per_pos=5)
)

# store rank-based metrics in a dataframe
df_metrics = result.metric_results.to_df()
df_metrics.to_csv('output/df_metrics.csv')
Let’s look at the code explanation below:
Line 10: Creates a synthetic graph using the Watts-Strogatz model with 500 nodes.
Lines 13–20: Generate random unique names to relabel the nodes in the social network.
Lines 23–27: Relabel the nodes and remove self-loops. ...
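The naming, relabeling, and self-loop steps described above can be tried in isolation with only the standard library. This is a minimal sketch, not the lesson's code: the name parts stand in for Faker, the 12-node ring with one deliberate self-loop stands in for the 500-node Watts-Strogatz graph, and all variable names are hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical name parts standing in for Faker (5 x 4 = 20 possible names).
first_names = ["Ana", "Ben", "Cleo", "Dev", "Eli"]
last_names = ["Ng", "Roy", "Sato", "Vega"]

num_nodes = 12  # small stand-in for the 500 nodes in the lesson

# Keep drawing until there is one distinct name per node.
node_names = []
while len(node_names) < num_nodes:
    name = f"{rng.choice(first_names)} {rng.choice(last_names)}"
    if name not in node_names:
        node_names.append(name)

# Toy edge list over integer node IDs: a ring plus one deliberate self-loop.
edges = [(i, (i + 1) % num_nodes) for i in range(num_nodes)] + [(3, 3)]

# Relabel integer IDs with names, then drop any self-loops.
mapping = {i: node_names[i] for i in range(num_nodes)}
relabeled = [(mapping[u], mapping[v]) for u, v in edges]
clean_edges = [(u, v) for u, v in relabeled if u != v]

print(len(node_names), len(clean_edges))
```

Drawing until the list is full, rather than drawing a fixed number of times, matters because random generators can repeat a name, which would leave some nodes without a label.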