Looking at combinations of words, say, bigrams (pairs of adjacent words) or trigrams (triples), can help you understand relationships between words. Using tidy methods again, we'll create bigrams and examine those relationships to extract insights from the text. I will stay with President Lincoln's State of the Union addresses (1861 to 1864), as that will allow you to compare what n-grams reveal with what single words showed. Getting started is easy: set token = "ngrams" and specify the number of words to join with n. Notice in the following code that I keep word capitalization by setting to_lower = FALSE:
> sotu_bigrams <- sotu_meta %>%
    dplyr::filter(year > 1860 & year < 1865) %>%
    tidytext::unnest_tokens(bigram, text, token = "ngrams", n = 2,
                            to_lower = FALSE)
Let's take a look at this:
> sotu_bigrams %>%
    dplyr::count(bigram, sort = TRUE)
# A tibble: 17,687 x 2
bigram n
<chr> <int>
1 of the...
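To see what the tokenizer does without the full sotu_meta corpus, here is a minimal sketch on a made-up two-sentence string (the toy tibble and its text are hypothetical, not taken from the addresses). It shows that changing only n switches between bigrams and trigrams, that n-grams run across sentence boundaries because punctuation is stripped before words are joined, and how to_lower = FALSE keeps "Seven years" and "seven years" as separate bigrams when counted.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy text for illustration only; not taken from sotu_meta
toy <- tibble::tibble(text = "Seven years ago we met. We met again seven years ago.")

# n = 2 yields bigrams; capitalization is kept because to_lower = FALSE
toy_bigrams <- toy %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2, to_lower = FALSE)

# Changing only n to 3 yields trigrams from the same text
toy_trigrams <- toy %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3, to_lower = FALSE)

# Punctuation is dropped before joining, so "met We" spans the sentence break
toy_bigrams$bigram

# Case-sensitive counts: "Seven years" and "seven years" stay separate rows,
# while "years ago" appears twice
toy_counts <- toy_bigrams %>%
  count(bigram, sort = TRUE)
toy_counts

# With the default lowercasing, case variants merge into a single bigram
toy_counts_lower <- toy %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)
toy_counts_lower
```

Comparing the two counts shows the trade-off: keeping case preserves distinctions such as sentence-initial or capitalized words, but it also splits what would otherwise be the same word pair, which lowers each pair's frequency.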