Metrics
Let's go over the metrics used to evaluate the performance of an entity linking system.
In the previous lesson, we talked about various applications of a named-entity linking system and how it can be used as a component in bigger tasks/systems, such as a virtual assistant. Therefore, we require metrics that:
- Compare different entity linking models based on their performance. This will be catered to by offline metrics.
- Measure the performance of the bigger task when a particular model for entity linking is used. This will be catered to by online metrics.
📝 Offline metrics will be aimed at improving/measuring the performance of the entity linking component. Online metrics will be aimed at improving/measuring the performance of the larger system by using a certain entity linking model as its component.
Offline metrics
The named-entity linking component is made of two layers, as discussed previously:
- Named entity recognition
- Disambiguation
We will first look at offline metrics for each of the layers individually and will then discuss a good offline metric to measure the overall entity linking system.
Named entity recognition
For the first layer/component, i.e., the recognition layer, you want to extract all the entity mentions from a given sentence. We will continue with the previous sentence example, i.e., “Michael Jordan is the best professor at UC Berkeley”.
It has two entity mentions:
- Michael Jordan
- UC Berkeley
NER should be able to detect both entities correctly. However, it may detect:
- Both correctly
- One correctly
- None correctly (wrongly detecting a non-entity as an entity)
- Correct entity but with the wrong type
- No entity, i.e., altogether miss the entities in the sentence
📝 You will call a recognition/detection of a named entity correct only if it is an exact match of the entity in the labeled data. If NER only recognizes “Michael” as an entity and misses the “Jordan” part, it would be considered wrong. Moreover, if NER recognizes “Michael Jordan” as an entity but with the wrong type (say, Organization), it would again be considered wrong.
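The exact-match rule can be made concrete with a small sketch. It assumes each mention is represented by its text, starting character offset, and type; the `Mention` class, the `score_mentions` helper, and the "Person"/"Organization" labels are hypothetical names chosen for illustration, not part of any particular NER library.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    text: str   # surface form, e.g. "Michael Jordan"
    start: int  # character offset where the mention begins
    type: str   # entity type label (assumed label set)


def score_mentions(gold, predicted):
    """Split mentions into correct, spurious, and missed sets.

    A prediction is correct only if its text, position, and type all
    match a labeled mention exactly (tuple equality).
    """
    gold_set, pred_set = set(gold), set(predicted)
    correct = pred_set & gold_set    # exact matches
    spurious = pred_set - gold_set   # detected, but not in the labels
    missed = gold_set - pred_set     # labeled, but not detected
    return correct, spurious, missed


sentence = "Michael Jordan is the best professor at UC Berkeley"
uc_start = sentence.index("UC Berkeley")

# Labeled data for the example sentence (types are assumed labels).
gold = [
    Mention("Michael Jordan", 0, "Person"),
    Mention("UC Berkeley", uc_start, "Organization"),
]

# A hypothetical NER output that truncates the first mention.
predicted = [
    Mention("Michael", 0, "Person"),
    Mention("UC Berkeley", uc_start, "Organization"),
]

correct, spurious, missed = score_mentions(gold, predicted)
```

Here only “UC Berkeley” counts as correct: the partial mention “Michael” is treated as a wrong detection, and the labeled “Michael Jordan” is treated as missed. A prediction of “Michael Jordan” with the type Organization would be scored the same way, since the type is part of the comparison.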
Given the above context on the correctness of the system, both precision and recall are important for measuring the performance of NER. They will be defined as:
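Precision = (number of correctly recognized named entities) / (total number of named entities recognized by the system)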
Recall = ...