Summary
In this chapter, we learned the definition of cosine similarity, which measures how similar two vectors are by the cosine of the angle between them. If the angle is small, the cosine is close to 1.0 and the two vectors are considered “similar.” If the vectors are orthogonal, the cosine is 0 and the vectors are considered unrelated. If they point in opposite directions, the cosine is -1. Cosine similarity therefore always lies in the interval [-1, 1], and it depends only on the direction of the vectors, not on their magnitudes. This fundamental metric is used throughout NLP, in both pre-LLM and LLM techniques. It is also used in other fields, such as image comparison, where it identifies similarities between image feature vectors.
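As a quick recap, the definition can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name cosine_similarity and the sample vectors are chosen here for the example, and real projects would typically rely on a numerical library instead.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    # Euclidean (L2) norms of each vector.
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Illustrative vectors (assumed for this example only).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # 0.0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # close to -1.0
```

The first pair differs only in length, so its similarity is approximately 1.0 (up to floating-point rounding), which shows that the metric compares direction rather than magnitude. The zero-vector check is a design choice made for this sketch, since the angle is undefined when either vector has no length.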
In the next chapter, we will build production models with latent semantic indexing (LSI).