Now that we can compare two documents, we could build our own routine to retrieve the documents most similar to an input query: index each document, compute the distance between the query and every document in the corpus, and return the documents with the lowest distance values, since these are the most similar. Luckily for us, however, Gensim has built-in structures for exactly this document similarity task!
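To make the manual approach concrete, here is a minimal sketch in plain Python. It is not Gensim's implementation: it assumes documents are already tokenized and uses Jaccard distance as a stand-in metric, and the helper names `jaccard_distance` and `most_similar`, along with the toy corpus, are hypothetical and chosen only for illustration.

```python
def jaccard_distance(tokens_a, tokens_b):
    """Jaccard distance between two token lists: 1 - |A & B| / |A | B|."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty documents count as identical
    return 1.0 - len(set_a & set_b) / len(union)


def most_similar(query, corpus, top_n=2):
    """Return (distance, document index) pairs for the top_n closest documents."""
    # Score the query against every document, then sort by ascending distance.
    scored = [(jaccard_distance(query, doc), i) for i, doc in enumerate(corpus)]
    return sorted(scored)[:top_n]


# Toy corpus of tokenized documents (illustrative data only).
corpus = [
    ["bank", "river", "water"],
    ["bank", "money", "loan"],
    ["tree", "forest", "leaf"],
]
query = ["bank", "loan", "finance"]

results = most_similar(query, corpus)
for distance, doc_index in results:
    print(doc_index, round(distance, 2))
```

Here the second document (index 1) comes back first because it shares the most tokens with the query. Note that this scan compares the query against every document in the corpus, so its cost grows linearly with corpus size; that is the work Gensim's built-in similarity structures take off our hands.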
We will be using the similarities module to construct this structure.
from gensim import similarities
We previously mentioned creating an index; the similarities module lets us do this far faster. As the Gensim documentation for the Similarity class explains, the class splits the index into several smaller subindexes (shards), which are disk...