We have seen two methods of building vectors to represent text documents. The next question is:
How can we measure how similar or dissimilar two text documents are, and how can the vectors built so far help answer that question?
If two documents use similar words, that suggests the documents themselves are similar. In this section, we will look at cosine similarity and use it to measure how similar documents are based on their term vectors.
Cosine similarity
Cosine similarity provides insight into the angle between two vectors. Two vectors are considered similar if they point in nearly the same direction; their magnitudes do not affect the measure. We will use the techniques developed in the previous sections to build these vectors, and then determine how close or far apart they are using cosine similarity.
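To make the idea concrete, here is a minimal sketch in plain Python. The function name cosine_similarity and the three small count vectors are hypothetical, chosen only for illustration; they are not taken from the earlier sections. The sketch computes the dot product of two vectors divided by the product of their lengths, which is the cosine of the angle between them.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical term-count vectors over a shared four-word vocabulary.
doc_a = [1, 2, 0, 1]
doc_b = [2, 4, 0, 2]  # same direction as doc_a, twice the magnitude
doc_c = [0, 0, 3, 0]  # shares no terms with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)  # approximately 1: same direction
sim_ac = cosine_similarity(doc_a, doc_c)  # 0.0: no terms in common
```

Here doc_b is simply doc_a scaled by two, so the score stays at approximately 1 even though the vectors differ in length, while doc_a and doc_c share no terms and score 0.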
Cosine similarity measures the cosine of the angle between two vectors. The value...