Using the model as an information retrieval tool
Building a search engine involves many significant engineering tasks, so this section covers only the essential steps for deploying an LSI model as part of one. These steps are as follows:
- Load the saved objects.
- Preprocess the new document.
- Score the new document to get the latent topic scores.
- Calculate the similarity scores between the new document and the existing documents.
- Find documents with high similarity scores.
First, we will load the four saved objects: the dictionary list, the model, the BoW object, and the TF-IDF object.
Loading the dictionary list
Gensim has a utility function called datapath. It returns the full path to the file on disk. Here is the code for it:
from gensim.corpora import Dictionary
from gensim.test.utils import datapath
dict_file = datapath(path + "/gensim_dictionary_AGnews")
gensim_dictionary = Dictionary.load(dict_file)
...