Building and training the model
Once the text data is available as an array of tokens, it can be passed to the model in that form. Before that, we have to define a number of hyperparameters for the model. This section describes how to do the following:
- Declare model hyperparameters
- Build a model using Word2Vec
- Train the model on the prepared dataset
- Save and checkpoint the trained model
Getting ready
The model hyperparameters to be declared include the following:
- Dimensionality of resulting word vectors
- Minimum word count threshold
- Number of parallel threads to run while training the model
- Context window length
- Downsampling (for frequently occurring words)
- Seed for the random number generator
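To make the downsampling hyperparameter more concrete, the following is a minimal sketch of the keep-probability rule used by the reference word2vec implementation, which Gensim follows. The function name `keep_probability` and its argument names are hypothetical and chosen only for illustration; `word_fraction` is assumed to be the share of all corpus tokens taken up by one word.

```python
import math


def keep_probability(word_fraction, sample=1e-3):
    """Probability that one occurrence of a word is kept during training.

    word_fraction: the word's share of all tokens in the corpus (must be > 0).
    sample: the downsampling threshold; 0 or less disables downsampling.
    """
    if sample <= 0:
        return 1.0
    # Rule from the reference word2vec code: (sqrt(f / t) + 1) * (t / f),
    # capped at 1 so that rare words are always kept.
    ratio = word_fraction / sample
    return min(1.0, (math.sqrt(ratio) + 1.0) / ratio)


# A word that makes up 1% of the corpus is kept less than half the time,
# while a word that makes up 0.001% of the corpus is always kept.
frequent = keep_probability(0.01)
rare = keep_probability(0.00001)
```

A smaller threshold discards frequent words more aggressively, which tends to speed up training and reduce the weight of very common words.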
Once the previously mentioned hyperparameters are declared, the model can be built using the Word2Vec class from the Gensim library.
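As a rough sketch of how these pieces fit together, the hyperparameters can be collected and passed to Gensim as keyword arguments. This assumes Gensim 4.x keyword names (`vector_size`, `min_count`, `workers`, `window`, `sample`, `seed`); older releases use `size` instead of `vector_size`. Only the values of `num_features` and `min_word_count` come from this recipe. The remaining values, the variable names `context_size`, `downsampling` and `seed`, and the helper `build_and_train` are assumptions made for illustration.

```python
import os

# Values taken from this recipe.
num_features = 300      # dimensionality of the resulting word vectors
min_word_count = 3      # ignore words that occur fewer times than this

# Assumed values (not from the recipe): machine core count and Gensim defaults.
num_workers = os.cpu_count() or 1   # number of parallel training threads
context_size = 5                    # context window length
downsampling = 1e-3                 # threshold for downsampling frequent words
seed = 1                            # seed for the random number generator

# Keyword names as used by Gensim 4.x.
word2vec_params = {
    "vector_size": num_features,
    "min_count": min_word_count,
    "workers": num_workers,
    "window": context_size,
    "sample": downsampling,
    "seed": seed,
}


def build_and_train(sentences, params, save_path=None):
    """Hypothetical helper: build, train and optionally save a Word2Vec model.

    sentences must be a re-iterable collection of token lists,
    for example [["the", "cat"], ["a", "dog"]].
    """
    from gensim.models import Word2Vec  # third-party; imported only when called

    model = Word2Vec(**params)          # untrained model with these settings
    model.build_vocab(sentences)        # scan the corpus and build the vocabulary
    model.train(
        sentences,
        total_examples=model.corpus_count,
        epochs=model.epochs,
    )
    if save_path is not None:
        model.save(save_path)           # checkpoint the trained model to disk
    return model
```

Calling `build_and_train(tokenized_sentences, word2vec_params, save_path="trained.w2v")` would then run all the steps in one go, where `tokenized_sentences` and the file name are placeholders for the prepared dataset and a path of your choice.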
How to do it...
The steps are as follows:
Declare the hyperparameters for the model using the following commands:
num_features = 300
min_word_count = 3
num_workers =...