Document similarity with word embeddings
The practical use case of word vectors is to compare the semantic similarity between documents. If you are a retail bank, insurance company, or any other company that sells to end users, you will have to deal with support requests. You'll often find that many customers have similar requests, so by finding out how similar texts are semantically, previous answers to similar requests can be reused, and your organization's overall service can be improved.
spaCy has a built-in function to measure the similarity between two sentences. It also comes with pretrained vectors from the Word2Vec model, which is similar to GloVe. This method works by averaging the embedding vectors of all the words in a text and then measuring the cosine of the angle between the average vectors. Two vectors pointing in roughly the same direction will have a high similarity score, whereas vectors pointing in different directions will have a low similarity score. This is visualized...