6. Vector Representation
Activity 6.01: Finding Similar News Articles Using Document Vectors
Solution
Follow these steps to complete this activity:
- Open a Jupyter Notebook. Insert a new cell and add the following code to import all necessary libraries:
import warnings
warnings.filterwarnings("ignore")
from gensim.models import Doc2Vec
import pandas as pd
from gensim.parsing.preprocessing import preprocess_string, \
    remove_stopwords
- Now set the path to the sample news file, whose lines will be loaded into news_lines:
news_file = '../data/sample_news_data.txt'
- After that, you need to iterate over each headline in the file and split the columns, then create a DataFrame containing the headlines. Insert a new cell and add the following code to implement this:
with open(news_file, encoding="utf8", errors='ignore') as f:
    news_lines = [line for line in f.readlines()]
lines_df = pd.DataFrame()
indices = list(range(len(news_lines)))
lines_df['news'] = news_lines...
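One detail of the file-reading step is worth noting: readlines() keeps the trailing newline character on every line, and blank lines come through as well. The following is a minimal sketch, using only the standard library, of one way to strip both while reading. The helper name read_lines and the two sample headlines are hypothetical and are not part of the activity's data; whether the activity itself cleans the lines at a later step is not shown here.

```python
import os
import tempfile


def read_lines(path):
    """Return the non-empty lines of a text file, without surrounding whitespace."""
    # errors="ignore" drops bytes that are not valid UTF-8 instead of raising.
    with open(path, encoding="utf8", errors="ignore") as f:
        # Iterating over the file object yields one line at a time,
        # each still carrying its trailing newline.
        return [line.strip() for line in f if line.strip()]


# Hypothetical sample data, written to a temporary file for illustration only.
sample = "Stocks rally as markets open\n\nLocal team wins final\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf8"
) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

headlines = read_lines(tmp_path)
os.remove(tmp_path)  # clean up the temporary file
print(headlines)
```

The same list could then be assigned to a DataFrame column in the way the step above does with news_lines.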