Using Similarity for Link Prediction
Learn how to use similarity for link prediction.
What’s link prediction?
Link prediction is an inference task on complex networks. It assumes that the network will grow over time and that the new edges created in the future follow some logic, meaning they aren't completely random.
The task is to estimate which edges will be created and which will not. One way to make this kind of inference is to use similarity measures, the idea being that pairs of nodes that are more similar are more likely to become connected.
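To make the idea concrete, here is a minimal Python sketch that scores every unconnected pair of nodes with the Jaccard coefficient (shared neighbors divided by all neighbors of either node) and predicts the highest-scoring pairs. Jaccard is only one possible similarity measure, chosen here as an assumption; the toy graph and the helper names `jaccard` and `predict_links` are hypothetical.

```python
from itertools import combinations

def jaccard(adj, u, v):
    """Jaccard similarity: shared neighbors divided by the union of neighbors."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)

def predict_links(adj, k):
    """Score every unconnected pair and return the k most similar ones."""
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only consider pairs with no edge yet
            scores.append(((u, v), jaccard(adj, u, v)))
    # Highest similarity first; ties broken by the pair itself for a stable order
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]

# Hypothetical undirected network stored as adjacency sets
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

for pair, score in predict_links(adj, k=3):
    print(pair, round(score, 2))
# ('a', 'd') 0.67
# ('b', 'e') 0.33
# ('c', 'e') 0.33
```

In this toy graph, a and d share both of a's neighbors, so (a, d) gets the highest score, 2/3, and is predicted first.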
Validating link prediction methods
When doing inference, we need a way to evaluate the method we are applying so that we can compare methods and find out which one works best. For link prediction tasks, two common evaluation metrics are precision and recall.
Our objective is to output a set of edges that do not exist in the current state of the network but we believe will appear in the future.
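The candidate set is every pair of nodes that is not connected yet. A minimal sketch, assuming an undirected network given as a node list and an edge list (the function name `candidate_edges` and the toy data are hypothetical):

```python
from itertools import combinations

def candidate_edges(nodes, edges):
    """Return every unordered node pair that is not yet an edge."""
    existing = {frozenset(e) for e in edges}  # undirected: (u, v) == (v, u)
    return [
        (u, v)
        for u, v in combinations(sorted(nodes), 2)
        if frozenset((u, v)) not in existing
    ]

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(candidate_edges(nodes, edges))
# [('a', 'c'), ('a', 'd'), ('b', 'd')]
```

Because the number of node pairs grows as n(n - 1)/2, in a sparse network the candidate set is usually much larger than the number of edges that will actually appear.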
Each prediction our method makes is either right or wrong. It can also be wrong in the opposite direction: it can say that an edge won't be created when, in fact, it will.
These possibilities form what we call a confusion matrix.
Let's say 1 means that an edge is created and 0 means that it is not. Comparing what our method predicted with what actually happened gives four possibilities:
True positive (TP): Cases in which our method stated that an edge would be created and it was correct.
False positive (FP): Cases in which our method stated that an edge would be created, but it was not created, so the method got it wrong.
False negative (FN): Cases in which our method stated that the edge would not be created, but it was actually created.
True negative (TN): Cases in which our method stated that the edge would not be created and it was correct.
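The four outcomes can be counted directly by comparing the predicted pairs with the pairs that were actually created. This sketch assumes every pair is stored in the same orientation (for example, as a sorted tuple) so that set lookups match; the data and the name `confusion_counts` are hypothetical.

```python
def confusion_counts(candidates, predicted, actual):
    """Count TP, FP, FN, TN over the candidate pairs.

    candidates: every pair that was unconnected when we made the prediction
    predicted:  pairs our method said would be created
    actual:     pairs that were really created later
    Assumes all pairs use the same orientation, e.g. sorted tuples.
    """
    tp = fp = fn = tn = 0
    for pair in candidates:
        said_yes = pair in predicted
        happened = pair in actual
        if said_yes and happened:
            tp += 1
        elif said_yes:
            fp += 1
        elif happened:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

candidates = [("a", "c"), ("a", "d"), ("b", "d"), ("b", "e"), ("c", "e")]
predicted = {("a", "c"), ("a", "d")}
actual = {("a", "c"), ("b", "d")}
print(confusion_counts(candidates, predicted, actual))  # (1, 1, 1, 2)
```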
With that in mind, we can define two prediction quality metrics. The first one is called precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Intuitively, precision answers the question: of all the edges we said would be created, what fraction did we get right?
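Computing precision from the TP and FP counts takes a single division. A minimal sketch; returning 0.0 when the method predicted no edges at all is a convention assumed here, since the ratio is undefined in that case, and the numbers are hypothetical.

```python
def precision(tp, fp):
    """Fraction of predicted edges that were actually created."""
    predicted = tp + fp
    # Assumed convention: 0.0 when nothing was predicted (ratio undefined)
    return tp / predicted if predicted else 0.0

# Hypothetical example: we predicted 10 new edges and 4 of them appeared
print(precision(tp=4, fp=6))  # 0.4
```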
The other metric is called recall and is defined as:
$$\text{Recall} = \frac{TP}{TP + FN}$$
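Recall answers the complementary question: of all the edges that were actually created, how many did our method predict? A matching sketch, with the same assumed convention of returning 0.0 when the denominator is zero and hypothetical numbers:

```python
def recall(tp, fn):
    """Fraction of the actually created edges that our method predicted."""
    created = tp + fn
    # Assumed convention: 0.0 when no edges were created (ratio undefined)
    return tp / created if created else 0.0

# Hypothetical example: 8 edges actually appeared and we predicted 4 of them
print(recall(tp=4, fn=4))  # 0.5
```

The two metrics usually pull in opposite directions: predicting more edges tends to raise recall while lowering precision, which is why both are reported together.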