We will discuss similarity scores in detail in Chapter 5, Getting Started with Data Mining Techniques. For now, we will use the cosine similarity metric to build our models. The cosine score is robust and easy to calculate, especially in conjunction with scikit-learn's TfidfVectorizer, which by default scales every document vector to unit length so that the cosine score reduces to a simple dot product.
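The cosine similarity score between two documents, represented as vectors x and y, is as follows:

$$\text{cosine}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$

Here, x · y is the dot product of the two vectors, and ‖x‖ and ‖y‖ are their Euclidean lengths.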
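To make the definition concrete, here is a minimal sketch in plain Python that divides the dot product of two vectors by the product of their Euclidean lengths. The function name `cosine_similarity`, the three toy vectors, and the choice to return 0.0 for an all-zero vector are assumptions made for illustration only; they are not taken from any library.

```python
import math


def cosine_similarity(x, y):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: a document with no terms is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical term-weight vectors for three short documents.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # points in the same direction as doc_a
doc_c = [0.0, 0.0, 3.0]  # shares no terms with doc_a

print(cosine_similarity(doc_a, doc_b))  # close to 1: same direction
print(cosine_similarity(doc_a, doc_c))  # 0.0: no terms in common
```

Note that doc_b is twice as long as doc_a yet scores as (almost exactly) identical to it. The cosine score depends only on the direction of the vectors, not on their magnitude, which is why a long document and a short one on the same topic can still be rated as similar.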
The cosine score can take any value between -1 and 1. The higher the cosine score, the more similar the documents are to each other. Because TF-IDF weights are never negative, the scores we compute for documents will in practice fall between 0 and 1. We now have a good theoretical base from which to build content-based recommenders using Python.