TextRank
TextRank is a popular algorithm for extractive text summarization. It is based on one of the most well-known algorithms of all time: PageRank, the first algorithm Google used to order search results. PageRank ranks a page by the number and importance of the other pages that link to it. Similarly, TextRank ranks text units, typically sentences, by how similar the other sentences are to a given sentence. The TextRank algorithm works as follows:
- Reads and extracts text from documents.
- Splits text into sentences.
- Converts sentences into vectors.
  - Converts each word in a sentence into a vector.
  - Finds the vector for the entire sentence; one approach is to average the word vectors.
- Calculates a similarity matrix among sentences, in which each entry measures how similar one sentence is to another.
- Creates a graph from the similarity matrix, with sentences as nodes and similarity scores as edge weights.
- Ranks the sentences...
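
The steps above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration, not a reference implementation, and it makes several assumptions: all function names are hypothetical, sentences are split with a naive regular expression, bag-of-words counts stand in for averaged word vectors, similarity is cosine similarity, and ranking uses a weighted PageRank computed by power iteration with the conventional damping factor of 0.85 and an arbitrary 50 iterations.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_vector(sentence):
    # Assumption: bag-of-words counts replace averaged word embeddings.
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(vectors):
    # Entry [i][j] is the similarity of sentences i and j; the diagonal is 0.
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def rank(sim, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over the similarity graph.
    # Sentences with no similar neighbours pass on no score (a simplification).
    n = len(sim)
    if n == 0:
        return []
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_n=2):
    # Return the top_n highest-scoring sentences in their original order.
    sentences = split_sentences(text)
    vectors = [sentence_vector(s) for s in sentences]
    scores = rank(similarity_matrix(vectors))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling `summarize(document_text, top_n=3)` would return the three sentences that are most similar to the rest of the text. In practice, replacing `sentence_vector` with averaged pretrained word embeddings, as the list describes, should capture similarity between sentences that share meaning but not exact words.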