SpanBERT is another interesting variant of BERT. As the name suggests, SpanBERT is mostly used for tasks such as question answering where we predict the span of text. Let's understand how SpanBERT works by looking into its architecture.
Understanding the architecture of SpanBERT
Let's understand SpanBERT with an example. Consider the following sentence:
You are expected to know the laws of your country
After tokenizing the sentence, we will have the tokens as follows:
tokens = [ you, are, expected, to, know, the, laws, of, your, country]
Instead of masking the tokens randomly, in SpanBERT, we mask the random contiguous span of tokens as shown:
tokens = [ you, are, expected, to, know, [MASK], [MASK], [MASK], [MASK], country]
We can observe that instead of masking the tokens at random positions, we have masked the random contiguous span of tokens. Now, we feed the tokens to SpanBERT and get the representation of the tokens. As shown in the following...