Summarization – state of the art
Today, the predominant approach to summarization uses the full Transformer architecture. Such models are quite large, often ranging from 223M parameters to 175 billion in the case of GPT-3. Google Research published a paper at ICML 2020 titled PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, which sets the benchmark for state-of-the-art results as of the time of writing. The key innovation proposed by this model is a pre-training objective designed specifically for summarization. Recall that BERT was pre-trained using a masked language model (MLM) objective, where individual tokens were randomly masked and the model had to predict them. PEGASUS instead proposes a Gap Sentence Generation (GSG) pre-training objective, where entire important sentences are replaced with a special masking token and the model has to generate the missing sentences as a single output sequence.
The importance of a sentence is judged using the ROUGE1-F1 score of that sentence against the rest of the document: the highest-scoring sentences are selected as the gap sentences and masked out during pre-training.
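To make the GSG idea concrete, here is a minimal sketch of how a pre-training example could be constructed. It is not the PEGASUS implementation: the sentence splitter, the [MASK1] token string, the fixed gap ratio, and the plain unigram-overlap version of ROUGE1-F1 are all simplifications assumed for illustration.

```python
import re
from collections import Counter

MASK_TOKEN = "[MASK1]"  # sentence-level mask token (string assumed for illustration)


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE1-F1: unigram-overlap F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(document: str, gap_ratio: float = 0.3):
    """Mask the most 'important' sentences and return (model input, target)."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    # Score each sentence by ROUGE1-F1 against the rest of the document.
    scores = []
    for i, sent in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scores.append((rouge1_f1(sent, rest), i))
    n_mask = max(1, int(len(sentences) * gap_ratio))
    masked_ids = {i for _, i in sorted(scores, reverse=True)[:n_mask]}
    # Input: the document with the selected sentences replaced by the mask token.
    model_input = " ".join(
        MASK_TOKEN if i in masked_ids else s for i, s in enumerate(sentences)
    )
    # Target: the masked (gap) sentences, concatenated in document order.
    target = " ".join(s for i, s in enumerate(sentences) if i in masked_ids)
    return model_input, target


doc = ("PEGASUS is pre-trained for summarization. "
       "Important sentences are masked out of the document. "
       "The model then learns to generate the missing sentences.")
model_input, target = make_gsg_example(doc)
print(model_input)
print(target)
```

Because the target is a generated pseudo-summary of the surrounding text, this objective is much closer to the downstream summarization task than token-level MLM, which is the intuition behind PEGASUS's strong fine-tuning results.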