Let's use word embeddings to find semantically similar words. To do this, we will use the textmineR package to build a skip-gram model. A skip-gram model looks for terms that frequently occur within a given window of another term. Because these terms appear close to each other so often in the sentences of our text, we can infer that they are related in some way. We will start with the following steps:
- To begin building our skip-gram model, we first create a term co-occurrence matrix by running the following code:
library(textmineR)

# Build a term co-occurrence matrix using a skip-gram window of 10 terms
tcm <- CreateTcm(doc_vec = twenty_newsgroups$text,
                 skipgram_window = 10,
                 verbose = FALSE,
                 cpus = 2)
After running the code, you will have a sparse matrix in your environment window. The matrix has every possible term along both dimensions, as...
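To make the idea of co-occurrence within a window more concrete, here is a minimal sketch in base R. It is not textmineR's implementation: the helper count_cooccurrence and the two toy sentences are hypothetical and exist only for illustration, and CreateTcm may tokenize and weight its counts differently.

```r
# Toy illustration only (hypothetical helper, not part of textmineR):
# count how often two terms appear within `window` positions of each other.
count_cooccurrence <- function(docs, window = 2) {
  # Lowercase, split on anything that is not a letter, drop empty tokens
  tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"),
                   function(x) x[nzchar(x)])
  vocab <- sort(unique(unlist(tokens)))

  # Dense term-by-term matrix; fine for a toy corpus, too large for real text
  tcm <- matrix(0, nrow = length(vocab), ncol = length(vocab),
                dimnames = list(vocab, vocab))

  for (doc in tokens) {
    n <- length(doc)
    for (i in seq_len(n)) {
      j_max <- min(n, i + window)
      if (j_max > i) {
        for (j in (i + 1):j_max) {
          a <- doc[i]
          b <- doc[j]
          # Record the pair in both directions so the matrix is symmetric
          tcm[a, b] <- tcm[a, b] + 1
          if (a != b) tcm[b, a] <- tcm[b, a] + 1
        }
      }
    }
  }
  tcm
}

toy_docs <- c("the cat sat on the mat", "the dog sat on the log")
toy_tcm <- count_cooccurrence(toy_docs, window = 2)

# "sat" and "on" are adjacent once in each sentence
toy_tcm["sat", "on"]
```

Every term labels both a row and a column, and each cell holds the number of times the two terms fell within the window of each other. The sketch uses a dense matrix for readability. On a real corpus most cells are zero, which is why a sparse representation is the practical choice.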