Getting started with semantic search
In this recipe, we will take a first look at expanding search with the help of a word2vec model. When we search for a term, we expect the search engine to also return documents that contain a synonym of that term, not only documents that contain the exact word we typed. Real search engines are far more complicated than what we'll show in this recipe, but it should give you a taste of what it's like to build a customizable search engine.
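To make the idea of synonym-based expansion concrete, here is a minimal sketch in plain Python. It is not the recipe's implementation: the three-dimensional vectors, the `expand_query` and `search` helpers, the similarity threshold, and the sample documents are all invented for illustration, standing in for a real word2vec model and a real index.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only;
# a real word2vec model has hundreds of dimensions learned from text.
TOY_VECTORS = {
    "movie": [0.90, 0.10, 0.00],
    "film": [0.85, 0.15, 0.05],
    "song": [0.10, 0.90, 0.10],
    "robot": [0.00, 0.10, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def expand_query(term, vectors, threshold=0.9):
    """Return the term followed by every word whose vector is close to it."""
    if term not in vectors:
        # Out-of-vocabulary terms cannot be expanded; search for them as-is.
        return [term]
    neighbours = [
        word
        for word, vector in vectors.items()
        if word != term and cosine(vectors[term], vector) >= threshold
    ]
    return [term] + sorted(neighbours)


def search(term, documents, vectors):
    """Return documents containing the term or any of its close neighbours."""
    terms = set(expand_query(term, vectors))
    return [doc for doc in documents if terms & set(doc.lower().split())]


# Made-up documents; none of them contains the word "movie".
documents = [
    "A classic film about a heist",
    "A love song for the ages",
    "A robot learns to paint",
]

print(expand_query("movie", TOY_VECTORS))
print(search("movie", documents, TOY_VECTORS))
```

With this toy data, a search for "movie" matches the document that only mentions "film", which is the behavior we are after. In the recipe itself, the word2vec model plays the role of the toy vectors and Whoosh plays the role of the document matching.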
Getting ready
We will be using an IMDb dataset from Kaggle, which can be downloaded from https://www.kaggle.com/PromptCloudHQ/imdb-data. Download the dataset and unzip the CSV file.
We will also use a small-scale Python search engine called Whoosh. Install it using pip:
pip install whoosh
We will also be using the pretrained word2vec model from the Using word embeddings recipe.
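As a reminder of how such a model can be queried, the following sketch loads pretrained vectors and looks up the nearest neighbors of a term. It assumes the model is stored in the binary word2vec format and is loaded through gensim's `KeyedVectors`; the helper names `load_word2vec` and `similar_terms` and the file path are hypothetical and may differ from what the earlier recipe used.

```python
def load_word2vec(path):
    """Load pretrained vectors stored in the binary word2vec format."""
    # Imported lazily so that similar_terms() can be used without gensim,
    # for example with a stand-in model object in tests.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=True)


def similar_terms(model, term, topn=3):
    """Return up to topn nearest neighbours of term, or [] if it is unknown."""
    try:
        # most_similar() yields (word, similarity) pairs, best match first.
        return [word for word, _score in model.most_similar(term, topn=topn)]
    except KeyError:
        # gensim raises KeyError for words missing from the vocabulary.
        return []


# Usage (the path is a placeholder, not a file shipped with the book):
# model = load_word2vec("path/to/word2vec.bin")
# print(similar_terms(model, "movie"))
```

Returning an empty list for out-of-vocabulary words is a design choice made here so that a search can fall back to the original term instead of failing.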
How to do it…
We will create a class for the Whoosh search engine that will create a document index based...