Why visualize?
Visualizing data means displaying data in a graphical format such as a chart or graph. This is almost always a useful precursor to training a natural language processing (NLP) system to perform a specific task because it is typically very difficult to see patterns in large amounts of text data. It is often much easier to see overall patterns in data visually. These patterns might be very helpful in making decisions about the most applicable text-processing techniques.
Visualization can also be useful in understanding the results of NLP analysis and deciding what the next steps might be. Because looking at the results of NLP analysis is not an initial exploratory step, we will postpone this topic until Chapter 13 and Chapter 14.
In order to explore visualization, in this chapter, we will be working with a dataset of text documents. The text documents will illustrate a binary classification problem, which will be described in the next section.