Global Vectors (GloVe) uses global word-word co-occurrence statistics from a large text corpus to arrive at dense vector representations of words. It is an unsupervised learning method whose objective is to make the dot product of the learned word vectors equal to the logarithm of their co-occurrence probability. Because the difference of two logarithms equals the logarithm of their ratio, differences between vectors in the embedding space correspond to ratios of co-occurrence probabilities.
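As a sketch of this objective, the standard GloVe cost function from the original paper can be written as follows, where \(X_{ij}\) is the number of times word \(j\) occurs in the context of word \(i\), \(w_i\) and \(\tilde{w}_j\) are the word and context vectors, \(b_i\) and \(\tilde{b}_j\) are bias terms, and \(f\) is a weighting function that down-weights rare and very frequent co-occurrences:

\[
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
\]

Minimizing \(J\) drives \(w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j\) toward \(\log X_{ij}\), which is what makes differences of vectors encode log-ratios of co-occurrence counts.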
For this example, we will use GloVe embeddings pre-trained on Twitter data. This corpus consists of around 2 billion tweets with a vocabulary of 1.2 million words. For the classification task, we use Amazon Instant Video customer reviews and ratings. First, we must load the review data, which is in JSON format, and convert it to a pandas DataFrame, as shown in the following code:
json_data...
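A minimal sketch of this loading step, assuming the reviews are stored in JSON-lines format (one JSON object per line, as the Amazon review dumps typically are); the inline sample records and field names here are hypothetical stand-ins for the real file:

```python
import io
import pandas as pd

# Two hypothetical review records in JSON-lines format, mimicking the
# schema of the Amazon review data (review text plus a rating).
sample = io.StringIO(
    '{"reviewText": "Great series, binged it in a weekend.", "overall": 5.0}\n'
    '{"reviewText": "Stopped watching after one episode.", "overall": 2.0}\n'
)

# lines=True tells pandas to parse one JSON object per line;
# for the real data you would pass the file path instead of a StringIO.
df = pd.read_json(sample, lines=True)
print(df.shape)                   # (2, 2)
print(df["overall"].tolist())     # [5.0, 2.0]
```

With the actual reviews file, the resulting DataFrame would hold one row per review, ready for the classification task.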