Natural Language Toolkit (NLTK) is a Python framework that implements many common NLP algorithms, and this chapter uses it together with scikit-learn. NLTK also provides built-in corpora that can be used to test algorithms. Before starting to work with NLTK, it's normally necessary to download its additional resources (corpora, dictionaries, and so on), which can be done through a dedicated graphical interface:
>>> import nltk
>>> nltk.download()
This command will launch the user interface, as shown in the following figure:
You can select individual packages or download everything (I suggest the latter if you have enough free disk space) so that all NLTK functionality is available immediately.
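When no graphical interface is available (for example, on a remote server), individual packages can also be fetched by passing an identifier to nltk.download(). The following is a minimal sketch, not a required workflow: the helper name ensure_nltk_package is hypothetical, and the brown corpus is used only as an illustrative package. It checks with nltk.data.find() whether a resource is already installed and downloads it only if it is missing.

```python
import nltk


def ensure_nltk_package(package_id, resource_path, download=nltk.download):
    """Make sure an NLTK resource is available locally.

    `ensure_nltk_package` is a hypothetical helper written for this example.
    `package_id` is the identifier passed to nltk.download() (e.g. 'brown'),
    and `resource_path` is the location checked by nltk.data.find()
    (e.g. 'corpora/brown'). Returns True if the resource is available.
    """
    try:
        # nltk.data.find() raises LookupError when the resource is missing
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # nltk.download() returns True on success and False on failure
        return download(package_id, quiet=True)


if __name__ == "__main__":
    # Requires network access the first time it is run
    if ensure_nltk_package("brown", "corpora/brown"):
        from nltk.corpus import brown
        print(brown.words()[:10])
```

The download function is passed as a parameter only so that the helper can be exercised without network access. In normal use, the default nltk.download is sufficient.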
NLTK can be installed using pip (pip install -U nltk) or with one of the binary distributions...