Modeling

Let's look at the modeling options for entity linking.

The first step of entity linking is to build a representation of the terms that you can use in the ML models. It’s also critical to use contextual information (i.e., other terms in the sentence) as you embed the terms. Let’s see why such a representation is necessary.

Contextualized text representation

The same words can refer to different entities. The context (i.e., other terms in the sentence) in which the words occur helps us figure out which entity is being referred to. Similarly, the NER and NED models require context to correctly recognize the entity type and disambiguate it, respectively. Therefore, the representation of terms must take contextual information into account.

One way to represent text is in the form of embeddings. For instance, let’s say you have the following two sentences:

  1. “Michael Jordan is a professor of machine learning at UC Berkeley.”
  2. “The six-time NBA champion Michael Jordan needs no introduction.”

When you generate an embedding for the words “Michael Jordan” using a traditional method such as Word2vec, the embedding is the same in both sentences. However, in the first sentence, “Michael Jordan” refers to the UC Berkeley professor, whereas in the second sentence, it refers to the basketball player. So, the embedding model needs to consider the whole sentence (the context) while generating an embedding for a word to ensure that its true meaning is captured.
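
To make this concrete, here is a minimal sketch (the gensim library and the toy sentences are assumptions made here for illustration, not part of this lesson) showing that a static model such as Word2vec stores exactly one vector per word type, so the lookup for “jordan” ignores the surrounding words:

```python
# Minimal sketch: a static embedding model assigns one vector per word type.
# (gensim and the toy sentences are assumptions for illustration only.)
from gensim.models import Word2Vec

sentences = [
    "michael jordan is a professor of machine learning at uc berkeley".split(),
    "the six-time nba champion michael jordan needs no introduction".split(),
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)

vec_in_sentence_1 = model.wv["jordan"]   # occurrence in the first sentence
vec_in_sentence_2 = model.wv["jordan"]   # occurrence in the second sentence

# True: the lookup is context-free, so both occurrences share the same vector.
print((vec_in_sentence_1 == vec_in_sentence_2).all())
```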

Notice that, in the first sentence, the context that helps identify the person comes after the mention, whereas in the second sentence, the helpful context comes before the mention. Therefore, the embedding model needs to be bi-directional, i.e., it should look at the context in both the backward and the forward directions.

Two popular model architectures that generate contextual term embeddings are:

  1. ELMo
  2. BERT
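
For contrast with the Word2vec sketch above, here is a quick check using BERT through the Hugging Face transformers library (the library and the bert-base-uncased checkpoint are choices made here for illustration): the vector produced for “jordan” now depends on the sentence it appears in.

```python
# Sketch: a contextual model gives "jordan" a different vector in each sentence.
# (transformers and bert-base-uncased are assumptions for illustration only.)
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def jordan_vector(sentence: str) -> torch.Tensor:
    """Return the contextual vector of the token 'jordan' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("jordan")]

v1 = jordan_vector("Michael Jordan is a professor of machine learning at UC Berkeley.")
v2 = jordan_vector("The six-time NBA champion Michael Jordan needs no introduction.")

# Unlike the static lookup, the two occurrences are no longer identical.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())
```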

ELMo

Let’s see how ELMo (Embeddings from Language Models) generates contextual embeddings. It starts with a character-level convolutional neural network (CNN) or a context-independent word embedding model (e.g., Word2vec) to represent the words of a text string as raw word vectors. These raw vectors are fed to a bi-directional LSTM layer trained on a language-modeling objective (predicting the next word in a sentence conditioned on the previous words). This layer has a forward pass and a backward pass.

The forward pass goes over the input sentence from left to right and looks at the context (words) before the active word to predict it. The backward pass goes over the input sentence from right to left and looks at the context (words) after the active word to predict it. The contextual information from these two passes is concatenated and then combined in another layer to obtain contextual embeddings of the text.
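
The following PyTorch sketch shows only this mechanism in isolation: raw word vectors go through a bi-directional LSTM, and the forward and backward states are concatenated per word. It is not ELMo itself (there is no character CNN, no language-model pretraining, and no mixing of multiple layers), and the dimensions are arbitrary.

```python
# Sketch of the bi-directional concatenation described above (not full ELMo).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 64, 128

word_embedding = nn.Embedding(vocab_size, embed_dim)   # raw, context-free word vectors
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, vocab_size, (1, 7))       # one sentence of 7 tokens
raw_vectors = word_embedding(token_ids)                # shape (1, 7, 64)

# For each position t:
#   outputs[:, t, :hidden_dim] is the forward state  (context to the left of t)
#   outputs[:, t, hidden_dim:] is the backward state (context to the right of t)
outputs, _ = bilstm(raw_vectors)                       # shape (1, 7, 2 * 128)

contextual_embeddings = outputs                        # each word now reflects both directions
print(contextual_embeddings.shape)                     # torch.Size([1, 7, 256])
```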

Figure: ELMo (unravelled)

📝 If the raw word vector is made using a character-level CNN, the inner structure of the word is captured. For instance, the similarity in the structure of the words “learn” and “learning” will be captured, which is helpful information for the bi-directional LSTM layer.

The character-level CNN will also produce good raw vectors for out-of-vocabulary words by looking at their similarity to the vocabulary observed during the training phase. ...
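
As a rough illustration of this idea, here is a sketch of a character-level CNN word encoder (the sizes and the character indexing are illustrative choices, not ELMo’s actual configuration). Because the word vector is built from character features, “learn” and “learning” feed shared character n-grams into the convolution, and an unseen word still receives a raw vector:

```python
# Sketch of a character-level CNN word encoder (illustrative, not ELMo's exact setup).
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, n_filters=64, kernel_size=3):
        super().__init__()
        self.char_embedding = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)

    def forward(self, word: str) -> torch.Tensor:
        # Map each character to an id, embed, convolve, and max-pool over positions.
        char_ids = torch.tensor([[min(ord(c), 127) for c in word.lower()]])  # (1, len)
        chars = self.char_embedding(char_ids).transpose(1, 2)                # (1, char_dim, len)
        features = torch.relu(self.conv(chars))                              # (1, n_filters, len)
        return features.max(dim=2).values.squeeze(0)                         # (n_filters,)

encoder = CharCNNWordEncoder()
v_learn = encoder("learn")
v_learning = encoder("learning")        # shares character n-grams with "learn"
v_unseen = encoder("learnability")      # an out-of-vocabulary word still gets a vector
print(v_learn.shape, torch.cosine_similarity(v_learn, v_learning, dim=0).item())
```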
