Performing extractive summarization
In the chapter’s introduction, we mentioned that extractive summarization identifies important words or phrases and stitches them together to produce a condensed version of the original text. In this section, we use the previously created books.json file and apply different methods to extract a summary of an input document. Due to space limitations and the need to focus on state-of-the-art techniques, we do not present the theory behind these methods. Plenty of online resources cover it, and a good starting point is the following link: https://miso-belica.github.io/sumy/summarizators.html.
Let’s begin by loading the data from the file and printing a few examples:
import pandas as pd
df = pd.read_json('books.json')
df.head()
>> title product_description...
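To make the idea of extractive summarization concrete, the following is a minimal sketch of a frequency-based sentence scorer written with the standard library only. It is not one of the methods from the link above, nor the approach used in this chapter. The function names (split_sentences, summarize), the tiny stop-word list, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "for",
    "on", "with", "as", "by", "this", "that", "are", "was", "be", "at",
}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=2):
    """Return the n_sentences highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(content_words(text))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score (ties keep document order), keep the top n.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:n_sentences]
    # Restore document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(ranked))
```

Assuming the product_description column holds plain-text descriptions, a call such as summarize(df['product_description'][0]) would return the two sentences whose content words occur most often in that description. Because the summary is built only from sentences already present in the input, it is extractive by construction.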