In this chapter, we looked at how text mining methods can help us make sense of the massive volume of textual data that exists. We worked through a useful framework for text mining, covering preparation, word frequency counts and visualization, and topic models, using multiple packages in the tidyverse. The framework also included other quantitative techniques from the qdap package, such as polarity and formality, which provide a deeper lexical understanding of a text, or what one could call its style. We applied the framework to the State of the Union addresses. Although it is not practical to cover every possible text mining technique, those discussed in this chapter should be adequate for most problems that one might face.
In the next chapter, we are going to shift gears to reinforcement learning, where we train an algorithm to interact with the environment to maximize rewards...