Technical requirements
In this chapter, we’ll use tools from the libraries that were introduced in Chapter 7 – that is, scikit-learn and Keras. Additionally, we will use NLTK, a Python library for working with human language data. NLTK includes a range of modules and functions that let us perform tasks such as tokenization, stemming, and part-of-speech tagging on our chosen datasets. The library streamlines the preparation of large text datasets so that they’re ready to be fed into machine learning or deep learning models.
If you have not worked with NLTK before, it can be installed with the following code:
pip install nltk
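Once NLTK is installed, a quick way to confirm it works is to try one of the tasks mentioned above. The following sketch uses NLTK's `PorterStemmer`, which ships with the library and requires no extra data downloads (tasks such as `word_tokenize` additionally require downloading resources, e.g. via `nltk.download('punkt')`):

```python
from nltk.stem import PorterStemmer

# Stemming reduces words to their root form; the Porter stemmer
# applies a fixed set of suffix-stripping rules.
stemmer = PorterStemmer()
words = ["running", "flies", "easily"]
stems = [stemmer.stem(w) for w in words]
print(stems)  # ['run', 'fli', 'easili']
```

Note that stems are not always dictionary words – "flies" becomes "fli" – which is expected behavior for rule-based stemmers.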
The documentation for nltk can be found at https://www.nltk.org. Another essential library for text manipulation and cleaning is re, short for regular expression. A regular expression is a sequence of characters that defines a search pattern. Here’s an example...
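As a minimal illustration of a search pattern (the specific example the chapter goes on to use is not shown here), the pattern `\w+` matches runs of word characters, which gives a crude form of tokenization:

```python
import re

# \w+ matches one or more word characters (letters, digits,
# underscores), so punctuation is skipped over.
pattern = r"\w+"
text = "Hello, world! NLP is fun."
tokens = re.findall(pattern, text)
print(tokens)  # ['Hello', 'world', 'NLP', 'is', 'fun']
```

Patterns like this are often combined with `re.sub` to strip unwanted characters from text before it is passed to a tokenizer.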