SpanBERT

Learn about the SpanBERT variant of BERT and its architecture.

SpanBERT is another interesting variant of BERT. As the name suggests, SpanBERT is mostly used for tasks such as question answering, where we predict a span of text. Let's understand how SpanBERT works by looking at its architecture.

Understanding the architecture of SpanBERT

Let's understand SpanBERT with an example. Consider the following sentence:

You are expected to know the laws of your country

Tokenizing the sentence

After tokenizing the sentence, we will have the tokens as follows:

tokens = [ you, are, expected, to, know, the, laws, of, your, country]
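
The token list above can be approximated with a quick sketch that lowercases the sentence and splits it on whitespace. Note that this is only an illustration: BERT and SpanBERT actually use a WordPiece tokenizer, which may split rarer words into subword pieces.

```python
# Rough approximation of the tokenization step: lowercase the sentence
# and split on whitespace. The actual WordPiece tokenizer used by
# BERT/SpanBERT may split rarer words into subword pieces.
sentence = "You are expected to know the laws of your country"
tokens = sentence.lower().split()
print(tokens)
# ['you', 'are', 'expected', 'to', 'know', 'the', 'laws', 'of', 'your', 'country']
```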

Masking the tokens

Instead of masking tokens at random positions, in SpanBERT, we mask a random contiguous span of tokens, as shown below:

tokens = [ you, are, expected, to, know, [MASK], [MASK], [MASK], [MASK], country]

We can observe that instead of masking tokens at random positions, we have masked a random contiguous span of tokens. Now, we feed the tokens to SpanBERT and get a representation of each token. As shown in the following figure, we mask a random contiguous span of tokens and feed them to the SpanBERT model, which returns the representation $R_i$ of each token $i$:

Figure: SpanBERT
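
To make the masking step concrete, here is a minimal sketch of span masking in Python. The span position here is chosen uniformly at random and the span length is fixed; in the actual SpanBERT pre-training, span lengths are sampled from a geometric distribution and roughly 15% of the tokens are masked overall, so treat this only as an illustration.

```python
import random

# Token list from the example above.
tokens = ["you", "are", "expected", "to", "know",
          "the", "laws", "of", "your", "country"]

def mask_random_span(tokens, span_length):
    """Replace a random contiguous span of tokens with [MASK].

    Illustrative sketch only: the actual SpanBERT pre-training samples
    span lengths from a geometric distribution and masks about 15% of
    the tokens in total.
    """
    start = random.randrange(len(tokens) - span_length + 1)
    masked = list(tokens)
    for i in range(start, start + span_length):
        masked[i] = "[MASK]"
    return masked, start, start + span_length - 1

masked_tokens, span_start, span_end = mask_random_span(tokens, span_length=4)
print(masked_tokens)
# e.g. ['you', 'are', 'expected', 'to', 'know',
#       '[MASK]', '[MASK]', '[MASK]', '[MASK]', 'country']
```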

Training SpanBERT with MLM and SBO

To predict the masked tokens, we train the SpanBERT model with the MLM objective along with a new objective called the span boundary objective (SBO). Let's explore how this works in detail.
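
For reference, the SpanBERT paper combines the two objectives by simply summing their losses for each token $x_i$ in the masked span:

$$\mathcal{L}(x_i) = \mathcal{L}_{\text{MLM}}(x_i) + \mathcal{L}_{\text{SBO}}(x_i)$$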

Masked language modeling (MLM)

We know that in the MLM objective, to predict the masked token, our model uses the corresponding representation of the masked token. Suppose we need to predict the masked token $x_7$; so, with the representation $R_7$, we can predict the masked token. We just feed the representation $R_7$ ...