Modeling

Let's look at the modeling options for entity linking.

The first step of entity linking is to build a representation of the terms that you can use in the ML models. It’s also critical to use contextual information (i.e., other terms in the sentence) as you embed the terms. Let’s see why such a representation is necessary.

Contextualized text representation

The same words can refer to different entities. The context (i.e., other terms in the sentence) in which the words occur helps us figure out which entity is being referred to. Similarly, the NER and NED models require context to correctly recognize the entity type and disambiguate it, respectively. Therefore, the representation of terms must take contextual information into account.

One way to represent text is in the form of embeddings. For instance, let’s say you have the following two sentences:

  1. “Michael Jordan is a professor of machine learning at UC Berkeley.”
  2. “The six-time NBA champion Michael Jordan needs no introduction.”

When you generate an embedding for the words “Michael Jordan” using a traditional method such as Word2vec, the embedding is the same in both sentences. However, in the first sentence, “Michael Jordan” refers to the UC Berkeley professor, whereas in the second sentence, it refers to the basketball player. So, the embedding model needs to consider the whole sentence (the context) while generating an embedding for a word to ensure that its true meaning is captured.
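
To make this concrete, here is a minimal sketch (the gensim library and the toy sentences are assumptions made here for illustration, not part of this lesson) showing that a static model such as Word2vec stores exactly one vector per word type, so the lookup for “jordan” ignores the surrounding words:

```python
# Minimal sketch: a static embedding model assigns one vector per word type.
# (gensim and the toy sentences are assumptions for illustration only.)
from gensim.models import Word2Vec

sentences = [
    "michael jordan is a professor of machine learning at uc berkeley".split(),
    "the six-time nba champion michael jordan needs no introduction".split(),
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=0)

vec_in_sentence_1 = model.wv["jordan"]   # occurrence in the first sentence
vec_in_sentence_2 = model.wv["jordan"]   # occurrence in the second sentence

# True: the lookup is context-free, so both occurrences share the same vector.
print((vec_in_sentence_1 == vec_in_sentence_2).all())
```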

Notice that, in the first sentence, the context that helps identify the person comes after the mention, whereas in the second sentence, the helpful context comes before the mention. Therefore, the embedding model needs to be bi-directional, i.e., it should look at the context in both the backward and the forward directions.

Two popular model architectures that generate contextual term embeddings are:

  1. ELMo
  2. BERT
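
For contrast with the Word2vec sketch above, here is a quick check using BERT through the Hugging Face transformers library (the library and the bert-base-uncased checkpoint are choices made here for illustration): the vector produced for “jordan” now depends on the sentence it appears in.

```python
# Sketch: a contextual model gives "jordan" a different vector in each sentence.
# (transformers and bert-base-uncased are assumptions for illustration only.)
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def jordan_vector(sentence: str) -> torch.Tensor:
    """Return the contextual vector of the token 'jordan' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("jordan")]

v1 = jordan_vector("Michael Jordan is a professor of machine learning at UC Berkeley.")
v2 = jordan_vector("The six-time NBA champion Michael Jordan needs no introduction.")

# Unlike the static lookup, the two occurrences are no longer identical.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())
```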

ELMo

Let’s see how ELMo (Embeddings from Language Models) generates contextual embeddings. It starts with a character-level convolutional neural network (CNN) or a context-independent word embedding model (e.g., Word2vec) to represent the words of a text string as raw word vectors. These raw vectors are fed to a bi-directional LSTM layer trained on a language-modeling objective (predicting the next word in a sentence conditioned on the previous words). This layer has a forward pass and a backward pass.

The forward pass goes over the input sentence from left to right and looks at the context (words) before the active word to predict it. The backward pass goes over the input sentence from right to left and looks at the context (words) after the active word to predict it. The contextual information from these two passes is concatenated and then combined in another layer to obtain contextual embeddings of the text.
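
The following PyTorch sketch shows only this mechanism in isolation: raw word vectors go through a bi-directional LSTM, and the forward and backward states are concatenated per word. It is not ELMo itself (there is no character CNN, no language-model pretraining, and no mixing of multiple layers), and the dimensions are arbitrary.

```python
# Sketch of the bi-directional concatenation described above (not full ELMo).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 64, 128

word_embedding = nn.Embedding(vocab_size, embed_dim)   # raw, context-free word vectors
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, vocab_size, (1, 7))       # one sentence of 7 tokens
raw_vectors = word_embedding(token_ids)                # shape (1, 7, 64)

# For each position t:
#   outputs[:, t, :hidden_dim] is the forward state  (context to the left of t)
#   outputs[:, t, hidden_dim:] is the backward state (context to the right of t)
outputs, _ = bilstm(raw_vectors)                       # shape (1, 7, 2 * 128)

contextual_embeddings = outputs                        # each word now reflects both directions
print(contextual_embeddings.shape)                     # torch.Size([1, 7, 256])
```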

Figure: ELMo (unravelled)

📝 If the raw word vector is made using a character-level CNN, the inner structure of the word is captured. For instance, the similarity in the structure of the words “learn” and “learning” will be captured, which is helpful information for the bi-directional LSTM layer.

The character-level CNN will also produce good raw vectors for out-of-vocabulary words by looking at their similarity to the vocabulary observed during the training phase. ...
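
As a rough illustration of this idea, here is a sketch of a character-level CNN word encoder (the sizes and the character indexing are illustrative choices, not ELMo’s actual configuration). Because the word vector is built from character features, “learn” and “learning” feed shared character n-grams into the convolution, and an unseen word still receives a raw vector:

```python
# Sketch of a character-level CNN word encoder (illustrative, not ELMo's exact setup).
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, n_filters=64, kernel_size=3):
        super().__init__()
        self.char_embedding = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)

    def forward(self, word: str) -> torch.Tensor:
        # Map each character to an id, embed, convolve, and max-pool over positions.
        char_ids = torch.tensor([[min(ord(c), 127) for c in word.lower()]])  # (1, len)
        chars = self.char_embedding(char_ids).transpose(1, 2)                # (1, char_dim, len)
        features = torch.relu(self.conv(chars))                              # (1, n_filters, len)
        return features.max(dim=2).values.squeeze(0)                         # (n_filters,)

encoder = CharCNNWordEncoder()
v_learn = encoder("learn")
v_learning = encoder("learning")        # shares character n-grams with "learn"
v_unseen = encoder("learnability")      # an out-of-vocabulary word still gets a vector
print(v_learn.shape, torch.cosine_similarity(v_learn, v_learning, dim=0).item())
```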
