5. Topic Modeling
Activity 5.01: Topic-Modeling Jeopardy Questions
Solution
Let's perform topic modeling on the dataset of Jeopardy questions:
- Open a Jupyter Notebook.
- Insert a new cell and add the following code to import pandas and other libraries:
import numpy as np
import pandas as pd
import spacy
nlp = spacy.load('en_core_web_sm')
pd.set_option('display.max_colwidth', 800)
- After downloading the data, extract it and place it at the location shown below. Then load the Jeopardy CSV file into a pandas DataFrame and strip the stray whitespace from the column names. Insert a new cell and add the following code:
JEOPARDY_CSV = '../data/jeopardy/Jeopardy.csv'
questions = pd.read_csv(JEOPARDY_CSV)
questions.columns = [x.strip() for x in questions.columns]
- The data in the DataFrame is not clean. In order to clean it, remove records that have missing values in the Question column. Add the following code to do this:
questions = questions.dropna(subset=['Question'])
- Find...
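The column-name cleanup and missing-value removal from the loading and cleaning steps above can be tried on a small in-memory example before running them on the full file. The following is a minimal sketch, assuming a made-up two-row CSV (the rows and the names raw, sample, and cleaned are hypothetical and not taken from the Jeopardy data). It shows why the header names need stripping and how dropna with subset removes only the rows that lack a Question:

```python
import io

import pandas as pd

# Hypothetical miniature CSV standing in for Jeopardy.csv. The header names
# after the first one carry a leading space, and the second row has an
# empty Question field.
raw = io.StringIO(
    "Show Number, Category, Question, Answer\n"
    "1,HISTORY,This city hosted the 1896 Olympics,Athens\n"
    "2,SCIENCE,,Helium\n"
)
sample = pd.read_csv(raw)

# Before stripping, the column is named ' Question' (with a leading space),
# so sample['Question'] would raise a KeyError.
sample.columns = [x.strip() for x in sample.columns]

# Drop only the rows whose Question is missing; gaps in other columns
# would not cause a row to be removed.
cleaned = sample.dropna(subset=['Question'])

print(list(cleaned.columns))
print(len(sample), len(cleaned))
```

Passing subset=['Question'] keeps rows that are incomplete in other columns, which matters here because only the question text is needed for topic modeling.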