Technical requirements
The code for this chapter is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/tree/main/Chapter03. The packages required for this chapter should be installed automatically via the poetry environment.
In addition, we will use models and datasets located at the following URLs. The Google word2vec model represents words as vectors, and the IMDB dataset contains movie titles, genres, and descriptions. Download them into the data folder inside the root directory:
- The Google word2vec model: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?resourcekey=0-wjGZdNAUop6WykTtMip30g
- The IMDB movie dataset: https://github.com/venusanvi/imdb-movies/blob/main/IMDB-Movie-Data.csv (also available in the book’s GitHub repo)
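Before starting the recipes, it can help to confirm that both downloads landed in the data folder. The following is a minimal sketch, not part of the book's code. The helper name find_missing_files is hypothetical, the word2vec file name is an assumption (the archive is commonly distributed as GoogleNews-vectors-negative300.bin.gz), and the CSV name is taken from the dataset URL. Adjust the names to match your downloads.

```python
from pathlib import Path

# Assumed file names; change them if your downloads are named differently.
EXPECTED_FILES = [
    "GoogleNews-vectors-negative300.bin.gz",  # assumption: usual name of the word2vec archive
    "IMDB-Movie-Data.csv",  # name taken from the dataset URL
]


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in expected if not (data_path / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to the data folder.
    missing = find_missing_files("data")
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```

Run from the root directory, the script lists any file that still needs to be downloaded.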
In addition to the preceding files, we will use various functions from a simple classifier that we will create in the first recipe. This...