Download the example code files
The Python notebooks are available for download at https://github.com/PacktPublishing/The-Handbook-of-NLP-with-Gensim. If the code is updated, the changes will appear in the GitHub repository. You are encouraged to run the notebooks in Google Colab, a free Jupyter Notebook environment that runs entirely in the cloud and comes with popular machine-learning libraries such as pandas, NumPy, TensorFlow, Keras, and OpenCV pre-installed.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Data for this book
AG’s corpus of news articles, made public by A. Gulli, is a collection of more than 1 million news articles from more than 2,000 news sources. Zhang, Zhao, and LeCun sampled news articles from the “world”, “sports”, “business”, and “science” categories of this corpus. The resulting dataset, ag_news, is frequently used and is available on Kaggle and through PyTorch, Hugging Face, and TensorFlow. It contains 120,000 news articles in the training sample and 7,600 in the testing sample. This dataset is used throughout the book.
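As a quick orientation, the following is a minimal sketch of one way to fetch the dataset and inspect its categories. It is not the book's own loading code. It assumes the Hugging Face `datasets` package is installed and that the dataset is published there under the identifier "ag_news". The helper names `load_ag_news` and `class_counts`, as well as the label order in `LABEL_NAMES`, are illustrative assumptions.

```python
from collections import Counter

# Assumed label order (0..3) for the Hugging Face copy of ag_news;
# check the dataset card if your copy comes from another source.
LABEL_NAMES = ["World", "Sports", "Business", "Sci/Tech"]


def load_ag_news():
    """Download ag_news with the Hugging Face `datasets` package (needs internet)."""
    from datasets import load_dataset  # pip install datasets

    return load_dataset("ag_news")


def class_counts(labels):
    """Count how many articles fall into each category, given integer labels."""
    counts = Counter(labels)
    return {name: counts.get(i, 0) for i, name in enumerate(LABEL_NAMES)}


# Example usage (downloads the data on first call):
#   ds = load_ag_news()
#   print(len(ds["train"]), len(ds["test"]))
#   print(class_counts(ds["train"]["label"]))
```

If the download succeeds, the two printed split sizes should match the training and testing counts given above.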