Once we have begun to represent text documents as vectors, we can start measuring the similarity or distance between documents, and that is exactly what we will learn about in this chapter. By now we are familiar with a variety of vector representations, from standard bag-of-words and TF-IDF to topic model representations of text documents. We will also learn about two very useful features implemented in Gensim, summarization and keyword extraction, and how to use them. Here's a summary of what we'll learn in this chapter:
- Similarity metrics
- Similarity queries
- Text summarization
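
To make the idea of similarity between vector representations concrete before we begin, here is a minimal sketch in plain Python. It is not Gensim's implementation: the helper names `bag_of_words` and `cosine_similarity`, the whitespace tokenization, and the three toy documents are all assumptions made for illustration. It compares bag-of-words vectors using cosine similarity, one common similarity measure.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Hypothetical helper: map a text to a sparse vector {token: count}."""
    # Lowercasing and splitting on whitespace is a deliberately naive tokenizer.
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(count * vec_b.get(token, 0) for token, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents, invented for this example.
docs = [
    "the river bank was flooded",
    "the bank approved the loan",
    "cats chase mice",
]
vectors = [bag_of_words(d) for d in docs]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # share "the" and "bank"
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no tokens
print(sim_01, sim_02)
```

The first two documents share some tokens and so score above zero, while the first and third share none and score exactly zero. The same comparison can be made on TF-IDF or topic model vectors, since only the vector values change and not the measure.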