What was the main limitation of our wordclouds? As we said, the absence of context. In other words, we were looking at isolated words, which convey no meaning beyond the limited meaning each word carries on its own.
This is where n-gram analysis techniques come in. These techniques tokenize the text into sequences of consecutive words rather than into single words. These sequences are called n-grams, where n is the number of words in each sequence.
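To make the idea concrete, here is a minimal base R sketch of what splitting a text into 2-word groups (bigrams) looks like. The sentence is made up for illustration and is not taken from our comments dataset, and the variable names sentence, words, and bigrams are placeholders of our own:

```r
# Toy sentence, invented for illustration (not from the comments dataset)
sentence <- "the quick brown fox jumps"

# Split the sentence into single words
words <- strsplit(sentence, " ", fixed = TRUE)[[1]]

# Pair each word with the word that follows it to build bigrams (n = 2)
bigrams <- paste(head(words, -1), tail(words, -1))

print(bigrams)
```

Each word except the first and the last appears in two bigrams, once as the second word and once as the first, which is how the surrounding context gets preserved.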
We can obtain n-grams from our comments dataset by applying the unnest_tokens function again, this time passing "ngrams" as the value of the token argument and 2 as the value of the n argument:
comments %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) -> bigram_comments
Since we specified 2 as the value for the...