7. Vector Representation
Activity 12: Finding Similar Movie Lines Using Document Vectors
Solution
Let's build a movie search engine that finds movie lines similar to the one provided by the user. Follow these steps to complete this activity:
- Open a Jupyter notebook.
- Insert a new cell and add the following code to import all necessary libraries:
import warnings
warnings.filterwarnings("ignore")
from gensim.models import Doc2Vec
import pandas as pd
from gensim.parsing.preprocessing import preprocess_string, remove_stopwords
- Now load the movie_lines1 file. After that, iterate over each movie line in the file and split it into its columns, then create a DataFrame containing the movie lines. Insert a new cell and add the following code to implement this:
movie_lines_file = '../data/cornell-movie-dialogs/movie_lines1.txt'
with open(movie_lines_file) as f:
    movie_lines = [line.strip().split('+++$...
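To make the column-splitting step concrete, here is a minimal, self-contained sketch that runs on a small in-memory sample instead of the data file. It assumes the field separator ' +++$+++ ' used by the Cornell Movie-Dialogs corpus and five fields per line. The sample lines, the column names, and the helper parse_movie_line are hypothetical and are not taken from the activity's own code.

```python
# Assumed field separator of the Cornell Movie-Dialogs corpus.
SEPARATOR = " +++$+++ "

# Hypothetical column names for the five fields of each line.
COLUMNS = ["LineNumber", "Person", "Movie", "Name", "Line"]

# Made-up sample lines in the assumed layout (not real corpus data).
sample_lines = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Where are we going?\n",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Somewhere far away.\n",
    "malformed line without separators\n",
]


def parse_movie_line(raw_line):
    """Split one raw line into a dict keyed by COLUMNS, or return None if malformed."""
    # Limit the number of splits so the dialogue text stays in one piece.
    fields = raw_line.strip().split(SEPARATOR, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, fields))


# Parse every line and drop the ones that do not have all five fields.
parsed = (parse_movie_line(line) for line in sample_lines)
records = [record for record in parsed if record is not None]

for record in records:
    print(record["Name"], "->", record["Line"])
```

A list of dictionaries like records can be passed directly to pd.DataFrame to obtain one row per movie line. Skipping malformed lines here is a design choice of this sketch; the activity's own code may handle them differently.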