Named Entity Recognition with RNNs: Training and Evaluation

Learn about the evaluation metrics and the loss function used to train and evaluate the model.

Evaluation metrics and the loss function

During our previous discussion, we alluded to the fact that NER tasks carry a high class imbalance. It's quite normal for text to contain far more non-entity tokens than entity tokens. This leads to a large number of other (O) type labels and comparatively few of the remaining entity types. We need to take this into consideration when training and evaluating the model. We'll address the class imbalance in two ways:

  • We’ll create a new evaluation metric that is resilient to class imbalance.

  • We’ll use sample weights to penalize more frequent classes and boost the importance of rare classes.

In this lesson, we'll only address the former; the latter will be addressed in the next lesson. We'll define a modified version of accuracy called macro-averaged accuracy. In macro averaging, we compute the accuracy for each class separately and then average the results, as shown in the sketch below. Each class therefore contributes equally to the final score, so the dominant other (O) class can't mask poor performance on the rare entity classes. When computing standard metrics like accuracy, precision, or recall, there are different types of averaging available.
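The sketch below shows one way to compute a macro-averaged accuracy with NumPy. The function name `macro_accuracy` and the toy label arrays are illustrative, not part of the lesson's codebase; scikit-learn's `balanced_accuracy_score` computes the same quantity.

```python
import numpy as np

def macro_accuracy(y_true, y_pred):
    """Per-class accuracy, averaged with equal weight per class."""
    per_class = []
    for c in np.unique(y_true):
        mask = y_true == c                            # tokens whose true label is c
        per_class.append(np.mean(y_pred[mask] == c))  # accuracy within class c
    return np.mean(per_class)                         # unweighted mean over classes

# Toy example: 8 "other" tokens (label 0) vs. 2 entity tokens (label 1)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

print(np.mean(y_true == y_pred))       # standard accuracy: 0.9
print(macro_accuracy(y_true, y_pred))  # macro accuracy: (1.0 + 0.5) / 2 = 0.75
```

Standard accuracy looks high simply because the model gets the abundant "other" class right; the macro average exposes that only half of the entity tokens were recognized.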

Different types of metric averaging

Several types of averaging are available for metrics; scikit-learn, for example, implements micro, macro, and weighted averaging for metrics such as precision and recall. Consider a simple binary classification example with the following confusion matrix results:
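The snippet below is a hedged illustration of these averaging modes in scikit-learn. The label arrays, and the confusion matrix they produce, are hypothetical numbers chosen to mimic NER-style imbalance, not figures from the lesson itself.

```python
from sklearn.metrics import confusion_matrix, precision_score

# Hypothetical binary example: class 1 is the rarer class (like entity tokens)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

print(confusion_matrix(y_true, y_pred))
# [[5 1]
#  [2 2]]

# micro: pool all TP/FP counts globally, then compute a single score
print(precision_score(y_true, y_pred, average="micro"))     # 0.7
# macro: compute precision per class, then take the unweighted mean
print(precision_score(y_true, y_pred, average="macro"))     # ~0.690
# weighted: per-class precision weighted by each class's support
print(precision_score(y_true, y_pred, average="weighted"))  # ~0.695
```

Micro averaging pools every prediction into one global count (for single-label classification it equals accuracy), while macro averaging treats the rare class as equally important, which is exactly the property we want for NER.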
