SpanBERT
Learn about the SpanBERT variant of BERT and its architecture.
SpanBERT is another interesting variant of BERT. As the name suggests, SpanBERT is mostly used for tasks such as question answering, where we predict a span of text. Let's understand how SpanBERT works by looking at its architecture.
Understanding the architecture of SpanBERT
Let's understand SpanBERT with an example. Consider the following sentence:

You are expected to know the laws of your country
Tokenizing the sentence
After tokenizing the sentence, we will have the tokens as follows:
tokens = [ you, are, expected, to, know, the, laws, of, your, country]
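The following is a minimal sketch of this tokenization step. Note that BERT-style models actually use a WordPiece tokenizer; a simple whitespace split is enough for this example:

```python
# A minimal sketch of the tokenization step above.
# SpanBERT, like BERT, actually uses a WordPiece tokenizer;
# for this illustration, a lowercase whitespace split suffices.
sentence = "You are expected to know the laws of your country"
tokens = sentence.lower().split()
print(tokens)
# ['you', 'are', 'expected', 'to', 'know', 'the', 'laws', 'of', 'your', 'country']
```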
Masking the tokens
Instead of masking the tokens randomly, in SpanBERT, we mask a random contiguous span of tokens as shown:
tokens = [ you, are, expected, to, know, [MASK], [MASK], [MASK], [MASK], country]
We can observe that instead of masking tokens at random positions, we have masked a random contiguous span of tokens. Now, we feed the tokens to SpanBERT and get the representation of each token. As shown in the following figure, we mask a random contiguous span of tokens and feed them to the SpanBERT model, which returns the representation of every token.
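To make the masking step concrete, here is a minimal sketch that masks a single contiguous span at a random position. (This is a simplification: the actual SpanBERT implementation samples span lengths from a geometric distribution, clipped at 10 tokens, and keeps masking spans until roughly 15% of the tokens are masked.)

```python
import random

def mask_contiguous_span(tokens, span_length, mask_token="[MASK]"):
    """Replace a randomly positioned contiguous span of tokens with [MASK].

    A simplified sketch of SpanBERT's span masking: we mask exactly one
    span of the given length instead of sampling lengths and a budget.
    """
    start = random.randint(0, len(tokens) - span_length)
    end = start + span_length
    return tokens[:start] + [mask_token] * span_length + tokens[end:]

tokens = ["you", "are", "expected", "to", "know",
          "the", "laws", "of", "your", "country"]
print(mask_contiguous_span(tokens, span_length=4))
# e.g. ['you', 'are', 'expected', 'to', 'know',
#       '[MASK]', '[MASK]', '[MASK]', '[MASK]', 'country']
```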
Training SpanBERT with the MLM and SBO
In order to predict the masked token, we train the SpanBERT model with the MLM objective along with a new objective called the span boundary objective (SBO). Let's explore how this works in detail.
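At a high level, the two objectives are simply summed. Following the SpanBERT paper (Joshi et al., 2020), for each masked token $x_i$ in the span, the loss is:

$$
\mathcal{L}(x_i) = \mathcal{L}_{\text{MLM}}(x_i) + \mathcal{L}_{\text{SBO}}(x_i)
$$

Here, the MLM term predicts $x_i$ from its own representation, while the SBO term predicts $x_i$ using only the representations of the tokens at the span's boundary.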
Masked language modeling (MLM)
We know that in the MLM objective, to predict the masked token, our model uses the corresponding representation of the masked token. Suppose we need to predict the masked token $x_7$ (the word laws); so, with the representation $R_7$, we can predict the masked token. We just feed the representation ...
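The following is a minimal sketch of this classifier step, assuming PyTorch and typical BERT-base dimensions. The tensor `R` is a hypothetical stand-in for the encoder's output representations:

```python
import torch
import torch.nn as nn

hidden_size, vocab_size = 768, 30522  # typical BERT-base values (assumed)

# R: stand-in for the encoder's output representations of the 10 tokens
# in our example, shape (sequence_length, hidden_size). Random here.
R = torch.randn(10, hidden_size)

# The MLM head: a linear layer projecting a token representation onto
# the vocabulary, followed by a softmax over all words.
mlm_classifier = nn.Linear(hidden_size, vocab_size)

# Predict the masked token x_7 ("laws") from its representation R_7
# (index 6, since Python indexing is zero-based).
logits = mlm_classifier(R[6])
probs = torch.softmax(logits, dim=-1)
predicted_id = probs.argmax().item()  # index of the most probable word
```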