In SpanBERT, we mask a contiguous span of tokens in the sentence. Let $x_s$ and $x_e$ be the first and last masked tokens, at positions $s$ and $e$, respectively. We feed the tokens to SpanBERT, and it returns a representation for every token. The representation of token $i$ is denoted as $R_i$. The two span boundary tokens are the tokens immediately before and after the masked span, and their representations are denoted as $R_{s-1}$ and $R_{e+1}$.
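To make the indexing concrete, here is a minimal sketch in plain Python. The helper `mask_span` and the example sentence are hypothetical illustrations, not SpanBERT's actual masking code: the real pretraining procedure samples span lengths at random, whereas here we fix $s$ and $e$ by hand and use 0-indexed positions. For each masked position $i$, the sketch also records the relative index $i-s+1$, which is used by the span boundary objective.

```python
MASK = "[MASK]"

def mask_span(tokens, s, e):
    """Hypothetical helper: mask tokens[s..e] (inclusive, 0-indexed).

    Returns the masked sequence, the indices of the two boundary
    tokens (s-1 and e+1), and a map from each masked position i
    to its relative position i - s + 1.
    """
    # Both boundary tokens must exist, so the span cannot touch either end
    if not (0 < s <= e < len(tokens) - 1):
        raise ValueError("span must leave one token on each side as a boundary")
    masked = tokens[:s] + [MASK] * (e - s + 1) + tokens[e + 1:]
    boundary = (s - 1, e + 1)
    rel_pos = {i: i - s + 1 for i in range(s, e + 1)}
    return masked, boundary, rel_pos

tokens = ["the", "cat", "sat", "on", "the", "mat"]
masked, boundary, rel_pos = mask_span(tokens, 2, 3)
print(masked)    # ['the', 'cat', '[MASK]', '[MASK]', 'the', 'mat']
print(boundary)  # (1, 4)
print(rel_pos)   # {2: 1, 3: 2}
```

Here the boundary tokens are "cat" (position $s-1 = 1$) and "the" (position $e+1 = 4$), and the first masked token gets relative position 1.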

The span boundary objective

Let's first look at the span boundary objective (SBO). To predict a masked token $x_i$, we use three values: the representations of the span boundary tokens ($R_{s-1}$ and $R_{e+1}$) and the position embedding of the masked token ($p_{i-s+1}$). The position index $i-s+1$ is relative to the left boundary token, so the first masked token $x_s$ uses $p_1$. Okay, how exactly do we predict the masked token with these three values? First, we create a new representation, $z_i$, by applying a function $f(\cdot)$ to these three values, as shown:

$$z_i = f(R_{s-1}, R_{e+1}, p_{i-s+1})$$
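To make $z_i = f(R_{s-1}, R_{e+1}, p_{i-s+1})$ concrete, here is a toy sketch in plain Python. The text does not define $f$ at this point, so we assume the form described in the SpanBERT paper: a two-layer feed-forward network with GeLU activations and layer normalization. For brevity, the sketch drops biases and the learned LayerNorm gain and bias. All sizes, the weights `W1` and `W2`, the encoder outputs `R`, and the position-embedding table `P` are hypothetical random placeholders, not trained values.

```python
import math
import random

def gelu(x):
    # Exact GeLU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def layer_norm(v, eps=1e-12):
    # Normalize to zero mean and unit variance (no learned gain/bias here)
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def f(r_left, r_right, p, W1, W2):
    # Assumed form of f: concatenate the three inputs, then apply
    # Linear -> GeLU -> LayerNorm twice (biases omitted)
    h0 = r_left + r_right + p
    h1 = layer_norm([gelu(a) for a in matvec(W1, h0)])
    return layer_norm([gelu(a) for a in matvec(W2, h1)])

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d = 4                       # toy hidden size
R = rand_matrix(6, d)       # stand-in for encoder outputs R_0 .. R_5
P = rand_matrix(8, d)       # hypothetical position-embedding table p_0 .. p_7
W1 = rand_matrix(d, 3 * d)  # maps the concatenated 3d-vector back to size d
W2 = rand_matrix(d, d)

s, e = 2, 3                 # the masked span covers positions 2 and 3
# Every z_i shares the same boundary inputs; only p_{i-s+1} changes
z = {i: f(R[s - 1], R[e + 1], P[i - s + 1], W1, W2) for i in range(s, e + 1)}
for i, z_i in z.items():
    print(i, [round(a, 3) for a in z_i])
```

Because both masked positions see the same $R_{s-1}$ and $R_{e+1}$, the position embedding is the only input that lets $f$ produce a different $z_i$ for each token inside the span.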
