Using information from visualization to make decisions about processing
This section offers guidance on how visualization can inform decisions about processing. For example, when deciding whether to remove punctuation and stopwords, word frequency visualizations such as frequency distributions and word clouds can show whether very common words are obscuring patterns in the data.
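As a minimal sketch of that check, the following compares the most frequent tokens before and after stopword removal. The two-document corpus and the tiny `STOPWORDS` set are hypothetical and exist only for illustration; in practice you would likely use a fuller stopword list from an NLP library.

```python
import re
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
docs = [
    "The movie was great, and the acting was great too.",
    "The plot was dull and the ending was dull.",
]
STOPWORDS = {"the", "was", "and", "too", "a", "of"}


def tokenize(text):
    # Lowercase and keep alphabetic runs only, which also drops punctuation.
    return re.findall(r"[a-z]+", text.lower())


tokens = [tok for doc in docs for tok in tokenize(doc)]

# Frequencies with and without stopwords.
raw_counts = Counter(tokens)
content_counts = Counter(tok for tok in tokens if tok not in STOPWORDS)

print("With stopwords:   ", raw_counts.most_common(3))
print("Without stopwords:", content_counts.most_common(3))
```

If the first list is dominated by function words while the second surfaces content words, that is a sign the stopwords were hiding the more informative vocabulary.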
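Comparing word frequency distributions across different categories of data can help rule out simple keyword-based classification techniques: if the most frequent words are largely the same in every category, keywords alone are unlikely to separate the categories.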
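One rough way to make this comparison concrete is to measure how much the top words of each category overlap. The sketch below assumes a hypothetical set of labeled examples, a hypothetical stopword list, and a helper named `top_words`; the overlap score is a simple Jaccard ratio chosen for illustration, not a standard threshold.

```python
import re
from collections import Counter

# Hypothetical labeled examples and stopword list, for illustration only.
labeled_docs = [
    ("positive", "great acting and a great story"),
    ("positive", "the story was moving and the cast was great"),
    ("negative", "dull story and dull acting"),
    ("negative", "the cast was fine but the story was dull"),
]
STOPWORDS = {"the", "was", "and", "a", "but"}


def top_words(texts, k):
    # Return the set of the k most frequent non-stopword tokens.
    counts = Counter(
        tok
        for text in texts
        for tok in re.findall(r"[a-z]+", text.lower())
        if tok not in STOPWORDS
    )
    return {word for word, _ in counts.most_common(k)}


# Group the texts by category label.
by_category = {}
for label, text in labeled_docs:
    by_category.setdefault(label, []).append(text)

top = {label: top_words(texts, k=2) for label, texts in by_category.items()}

# Jaccard overlap between the top-word sets of the two categories.
shared = top["positive"] & top["negative"]
overlap = len(shared) / len(top["positive"] | top["negative"])

print(top)
print("Shared top words:", shared, "overlap:", round(overlap, 2))
```

A high overlap suggests that the categories share their most frequent vocabulary, so a keyword lookup would struggle; a low overlap suggests that distinctive keywords might be enough.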
Frequencies of different kinds of items, such as words and bigrams, can yield different insights. It can also be worth exploring the frequencies of other kinds of items, including parts of speech and syntactic categories such as noun phrases.
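To illustrate how words and bigrams can tell different stories, here is a small sketch that counts both over the same hypothetical sentence. Counting parts of speech or noun phrases would require a tagger or parser from an NLP library and is not shown.

```python
import re
from collections import Counter

# Hypothetical example text, for illustration only.
text = "New York is a big city and New York is busy"
tokens = re.findall(r"[a-z]+", text.lower())

# Pair each token with its successor to form bigrams.
bigrams = list(zip(tokens, tokens[1:]))

word_counts = Counter(tokens)
bigram_counts = Counter(bigrams)

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

The word counts show "new" and "york" as separate frequent items, while the bigram counts reveal that they occur together as a unit, which single-word frequencies cannot show.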
Displaying document similarities with clustering can provide insight into the most meaningful number of classes to use when dividing a dataset.
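The idea can be sketched without any plotting or clustering library. The example below is a simplified stand-in for a real clustering algorithm such as k-means: it computes bag-of-words cosine similarity and links documents whose similarity meets a cutoff, then counts the resulting groups. The documents, the `THRESHOLD` value, and the helper names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical documents, for illustration only.
docs = [
    "cheap flights to paris",
    "book cheap flights to rome",
    "pasta recipe with tomato sauce",
    "easy tomato pasta recipe",
]


def vectorize(text):
    # Bag-of-words vector as a token -> count mapping.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = [vectorize(d) for d in docs]
THRESHOLD = 0.3  # assumed similarity cutoff; tune for real data

# Each document starts in its own group; similar documents are merged.
group = list(range(len(docs)))


def find(i):
    # Follow parent links to the representative of i's group.
    while group[i] != i:
        i = group[i]
    return i


for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if cosine(vectors[i], vectors[j]) >= THRESHOLD:
            group[find(j)] = find(i)

n_groups = len({find(i) for i in range(len(docs))})
print("Number of groups found:", n_groups)
```

Here the travel and recipe documents fall into separate groups, hinting that two classes would be a natural division. With real data, a proper clustering method and a visual display of the clusters would give a more reliable picture.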
The final section summarizes...