In this last step before we build our own model, we will use the textrank package to summarize text. The algorithm looks for the sentence that shares the most words with the other sentences in the text. More precisely, textrank scores pairs of sentences by their word overlap and then ranks the sentences with a PageRank-style algorithm, so a sentence that overlaps strongly with many others rises to the top. Such a sentence is a good candidate for a summary because it contains many of the words found elsewhere. To get started, let's select a piece of text from our data:
- Let's view the text in row 400 by running the following code:
twenty_newsgroups$text[400]
When we run this line of code, we will see the following piece of text printed to the console:
In this email, the author objects to someone else's email because it is off-topic.
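Before running textrank itself, it may help to see the word-overlap idea from the start of this step in miniature. The following base R sketch is a simplification, not the textrank implementation: it scores each sentence by summing its Jaccard overlap with every other sentence instead of running PageRank over a sentence graph. The toy sentences and the helper names tokenize and jaccard are made up for illustration and are not part of the package or of our data.

```r
# Toy sentences (made up for illustration, not taken from twenty_newsgroups)
sentences <- c(
  "The cat sat on the mat",
  "The dog chased the cat",
  "Birds sing in the morning",
  "The cat and the dog sat together"
)

# Hypothetical helper: split a sentence into its set of unique lowercase words
tokenize <- function(s) unique(strsplit(tolower(s), "[^a-z]+")[[1]])
word_sets <- lapply(sentences, tokenize)

# Hypothetical helper: Jaccard overlap = shared words / all distinct words
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Score each sentence by its total overlap with every other sentence
n <- length(word_sets)
scores <- sapply(seq_len(n), function(i) {
  sum(sapply(seq_len(n)[-i], function(j) jaccard(word_sets[[i]], word_sets[[j]])))
})

# The highest-scoring sentence is the one most connected to the rest
best <- sentences[which.max(scores)]
print(best)
```

In this toy example, the sentence that mentions the cat, the dog, and sitting scores highest because it shares words with most of the other sentences, while the sentence about birds scores lowest. Note that common words such as "the" inflate every score here; in practice we would usually remove stop words before measuring overlap.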
- Let's see which sentence the textrank algorithm will extract...