...

/

Approach to the Problem

Approach to the Problem

Learn how to solve a knowledge graph completion problem.

Our approach

After the dataset is built, we can use the pykeen Python library to train a knowledge graph embedding model and evaluate its performance. Let's take a look at the code below:

Press + to interact
from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline
# create triples factory from the dataset
tf = TriplesFactory.from_labeled_triples(df.values)
# split into training and testing
training, testing = tf.split([0.8, 0.2], random_state=42)
# train using pipeline method
result = pipeline(
training=training,
testing=testing,
model = "TransR",
model_kwargs=dict(embedding_dim=128),
optimizer = "adam",
training_kwargs=dict(num_epochs=20, use_tqdm_batch=False),
random_seed=42,
device='cpu',
negative_sampler = 'bernoulli',
negative_sampler_kwargs = dict(num_negs_per_pos = 3))
# retrieve results
result_df = result.metric_results.to_df()
print(result_df)

Note: At the end of the output, that is not the errors; it measures the time the evaluation takes.

Let’s look at the code explanation below:

  • Lines 5–7: Create TriplesFactory from the dataset and split it into training and testing sets in a ratio of 80:20.

  • Lines 10–20: Train knowledge graph embeddings using the TransR algorithm and PyKEEN's pipeline method. This method performs the evaluation of embeddings and outputs the results.

  • Line 23: Saves the results to a DataFrame.

  • Line 25: Prints the DataFrame.

The resulting DataFrame shows different metrics and their values.

Evaluation

PyKEEN's pipeline method performs evaluations and provides results in an easy-to-read manner. The results in the DataFrame can look as follows (values might be different since the model was trained with different hyperparameters):

Press + to interact
PyKEEN pipeline result metrics
PyKEEN pipeline result metrics

Let's find out what these metrics mean.

Process

By default, PyKEEN uses a rank-based evaluation, which is standard for link prediction tasks. After generating the knowledge graph embeddings using the TransR algorithm, the pipeline method performs rank-based evaluation on the link prediction task.

Let's break this down into different steps.

  1. We split the triples into training and testing sets so that both sets have common ...