In this section, let's understand how to fine-tune the BERT model to perform text summarization. First, we will understand how to fine-tune BERT for extractive summarization, and then we will see how to fine-tune BERT for abstractive summarization.
Extractive summarization using BERT
To fine-tune the pre-trained BERT model for the extractive summarization task, we slightly modify the input data format of the BERT model. Before looking at the modified input data format, let's first recall how we feed input data to the BERT model.
Say we have two sentences: Paris is a beautiful city. I love Paris. First, we tokenize the sentences, add a [CLS] token only at the beginning of the first sentence, and add a [SEP] token at the end of every sentence. Before feeding the tokens to BERT, we convert them into embeddings using three embedding layers, known as token embedding, segment embedding, and position embedding. We sum up all three embeddings and feed them as input to BERT.
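To make this concrete, here is a minimal sketch using the Hugging Face transformers library (the bert-base-uncased checkpoint is an assumption; any pre-trained BERT checkpoint behaves the same way). It shows how the tokenizer inserts the [CLS] and [SEP] tokens and produces the segment IDs for our two sentences:

from transformers import BertTokenizer

# Load the pre-trained BERT tokenizer (bert-base-uncased is just an example checkpoint)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

sentence_a = 'Paris is a beautiful city.'
sentence_b = 'I love Paris.'

# Tokenize the sentence pair; the tokenizer adds [CLS] at the beginning of the
# first sentence and [SEP] at the end of every sentence
inputs = tokenizer(sentence_a, sentence_b)

print(tokenizer.convert_ids_to_tokens(inputs['input_ids']))
# ['[CLS]', 'paris', 'is', 'a', 'beautiful', 'city', '.', '[SEP]', 'i', 'love', 'paris', '.', '[SEP]']

# Segment (token type) IDs: 0 for tokens of the first sentence, 1 for the second
print(inputs['token_type_ids'])

Inside the model, these token IDs and segment IDs (along with the token positions) are looked up in the token, segment, and position embedding layers, and the three embeddings are summed to form the final input representation.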