Approach to the Problem
Learn how to solve a knowledge graph completion problem.
We'll cover the following...
Our approach
After the dataset is built, we can use the pykeen
Python library to train a knowledge graph embedding model and evaluate its performance. Let's take a look at the code below:
from pykeen.triples import TriplesFactoryfrom pykeen.pipeline import pipeline# create triples factory from the datasettf = TriplesFactory.from_labeled_triples(df.values)# split into training and testingtraining, testing = tf.split([0.8, 0.2], random_state=42)# train using pipeline methodresult = pipeline(training=training,testing=testing,model = "TransR",model_kwargs=dict(embedding_dim=128),optimizer = "adam",training_kwargs=dict(num_epochs=20, use_tqdm_batch=False),random_seed=42,device='cpu',negative_sampler = 'bernoulli',negative_sampler_kwargs = dict(num_negs_per_pos = 3))# retrieve resultsresult_df = result.metric_results.to_df()print(result_df)
Note: At the end of the output, that is not the errors; it measures the time the evaluation takes.
Let’s look at the code explanation below:
Lines 5–7: Create
TriplesFactory
from the dataset and split it into training and testing sets in a ratio of 80:20.Lines 10–20: Train knowledge graph embeddings using the
TransR
algorithm and PyKEEN'spipeline
method. This method performs the evaluation of embeddings and outputs the results.Line 23: Saves the results to a DataFrame.
Line 25: Prints the DataFrame.
The resulting DataFrame shows different metrics and their values.
Evaluation
PyKEEN's pipeline method performs evaluations and provides results in an easy-to-read manner. The results in the DataFrame can look as follows (values might be different since the model was trained with different hyperparameters):
Let's find out what these metrics mean.
Process
By default, PyKEEN uses a rank-based evaluation, which is standard for link prediction tasks. After generating the knowledge graph embeddings using the TransR
algorithm, the pipeline method performs rank-based evaluation on the link prediction task.
Let's break this down into different steps.
We split the triples into training and testing sets so that both sets have common ...