Summary
In this chapter, we learned about several techniques for the initial exploration of text datasets. We started by examining the frequency distributions of words and bigrams. We then discussed different visualization approaches, including word clouds, bar graphs, line graphs, and clusters. Beyond visualizations based on individual words, we also learned about clustering techniques for visualizing similarities among documents. Finally, we concluded with some general considerations for developing visualizations and summarized what can be learned from visualizing text data in various ways. The next chapter will cover how to select approaches for analyzing NLU data, as well as two kinds of representations for text data: symbolic representations and numerical representations.