In this section, we explore data munging techniques for typical text analysis situations. Many text-based analysis tasks require computing word counts, removing stop words, stemming, and so on. We will also explore how to process multiple files, one at a time, from HDFS directories.
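To make these terms concrete before the walkthrough, here is a minimal plain-Python sketch of word counting with stop-word removal. It is an illustration only and does not use any cluster framework or HDFS. The `STOP_WORDS` set and the `tokenize` and `word_counts` helpers are hypothetical names introduced for this example, and stemming is left out because it needs a proper algorithm (such as Porter's) rather than a few lines of code.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real analyses use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text, stop_words=STOP_WORDS):
    """Count the tokens that are not in the stop-word set."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


counts = word_counts("The cat and the hat: a cat in the hat.")
print(counts.most_common())
```

The same two steps (tokenize, then filter and count) carry over to a distributed setting; only the data source and the execution engine change.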
First, we import all the classes that will be used in this section: