We will use the following Python libraries: pandas, Matplotlib, and scikit-learn, which you can get by installing the Python Anaconda distribution, following the steps described in the Technical requirements section in Chapter 1, Foreseeing Variable Problems in Building ML Models.
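As a quick sanity check before starting, the following minimal sketch reports which of these libraries cannot be found in the current environment. The helper name `missing_packages` and the `REQUIRED` list are illustrative assumptions, not part of the book's code; note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names (assumed list for illustration); these can differ from the
# package names, for example scikit-learn is imported as sklearn.
REQUIRED = ["pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, installing or updating the Anaconda distribution as described above should provide the missing libraries.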
We will also use the Natural Language Toolkit (NLTK), a comprehensive Python library for NLP and text analysis. You can find instructions to install NLTK here: http://www.nltk.org/install.html. If you are using the Python Anaconda distribution, follow these instructions to install NLTK instead: https://anaconda.org/anaconda/nltk.
After you install NLTK, open up a Python console and execute the following:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
These commands download the data needed to run the recipes in this chapter...
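To confirm that the downloads succeeded, one option is to ask NLTK to locate each resource. The sketch below assumes the two resources live under the paths `tokenizers/punkt` and `corpora/stopwords` in the NLTK data directory; the helper name `missing_nltk_data` and the `RESOURCES` mapping are hypothetical and serve only as an illustration.

```python
# Resource name -> path inside the NLTK data directory (assumed layout).
RESOURCES = {"punkt": "tokenizers/punkt", "stopwords": "corpora/stopwords"}


def missing_nltk_data(resources):
    """Return the names of the NLTK resources that cannot be located."""
    try:
        import nltk
    except ImportError:
        # Without NLTK itself, none of the resources can be used.
        return sorted(resources)

    missing = []
    for name, path in resources.items():
        try:
            # nltk.data.find raises LookupError when a resource is absent.
            nltk.data.find(path)
        except LookupError:
            missing.append(name)
    return sorted(missing)


not_found = missing_nltk_data(RESOURCES)
print("Missing NLTK data:", not_found if not_found else "none")
```

Any name reported as missing can be fetched again with `nltk.download`, as in the commands above.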