5. Topic Modeling
Activity 10: Topic Modeling Jeopardy Questions
Solution
Let's perform topic modeling on the dataset of Jeopardy questions. Follow these steps to implement this activity:
- Open a Jupyter notebook.
- Insert a new cell and add the following code to import the pandas library:
import pandas as pd
pd.set_option('display.max_colwidth', 800)
- To load the Jeopardy CSV file into a pandas DataFrame, insert a new cell and add the following code:
JEOPARDY_CSV = 'data/jeopardy/Jeopardy.csv'
questions = pd.read_csv(JEOPARDY_CSV)
- The DataFrame contains records with missing values in the Question column. To clean the data, remove those records by inserting a new cell and adding the following code:
questions = questions.dropna(subset=['Question'])
- Now import the gensim preprocessing utility and use it to preprocess the questions further. Add the following code to do this:
from gensim.parsing.preprocessing import preprocess_string...
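The cleaning step above, which drops records with a missing Question value, can be checked on a small toy DataFrame before running it on the full file. This is only a sketch: the rows and the `toy` and `cleaned` names are hypothetical stand-ins, not data from the Jeopardy CSV.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Jeopardy CSV (not real data).
toy = pd.DataFrame({
    'Category': ['HISTORY', 'SCIENCE', 'SPORTS'],
    'Question': ['A sample clue', None, 'Another sample clue'],
})

# Keep only the rows whose Question value is present.
cleaned = toy.dropna(subset=['Question'])

# Compare row counts before and after, and inspect the surviving index labels.
print(len(toy), len(cleaned))
print(cleaned.index.tolist())
```

Note that dropna keeps the original index labels, so the cleaned frame can have gaps in its index. If consecutive labels are needed later, calling reset_index(drop=True) on the result is one option.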