Before analyzing word frequency, we want to clean this data by removing the stop words, which would otherwise clutter our interpretation. We'll explore the top overall word frequencies, then take a look at President Lincoln's work.
Word frequency
Word frequency in all addresses
To remove stop words from tidy data, you can use the stop_words data frame provided in the tidytext package. Load that tibble into the environment, then anti-join on word, which keeps only the rows whose word has no match in stop_words:
> library(tidytext)
> data(stop_words)
> # Drop every row whose word appears in stop_words
> sotu_tidy <- sotu_unnest %>%
    dplyr::anti_join(stop_words, by = "word")
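To see what the anti-join does on a small scale, here is a minimal sketch. The names toy_tokens, toy_stop_words, and toy_tidy, and the five words in them, are hypothetical and made up for illustration. They stand in for sotu_unnest and the real stop_words lexicon and are not part of the State of the Union data:

```r
library(dplyr)

# Hypothetical toy data: one token per row, mimicking a tidy text tibble
toy_tokens <- tibble(
  year = c(1861, 1861, 1861, 1861, 1861),
  word = c("the", "union", "of", "these", "states")
)

# Hypothetical miniature stop-word list standing in for tidytext's stop_words
toy_stop_words <- tibble(word = c("the", "of", "these"))

# anti_join() keeps the rows of toy_tokens that have no match in toy_stop_words
toy_tidy <- anti_join(toy_tokens, toy_stop_words, by = "word")

nrow(toy_tokens)  # 5 rows before filtering
nrow(toy_tidy)    # 2 rows after filtering
toy_tidy$word     # "union" "states"
```

Because anti_join() only filters rows of the left-hand table, the remaining rows keep their original order and columns, so any grouping variable such as year is still available for later counts.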
Notice that the anti-join cut the data from 1.97 million observations down to 778,161. Now, you can go ahead and see the top...