Understanding the bag of words model
In this book, we will represent a document with a popular model from text processing and machine learning called the bag of words. To get a feel for the concept, imagine that we take all the words in a document, throw them in a bag, shake them well, and take them out in no particular order. The resulting random document may mean nothing to a human reader, but under this model a machine treats it exactly like the original, because only the words and their counts matter, not their order. That's why we implemented all of those functions in our service so far.
Basically, discarding the grammatical structure frees us to focus on the individual words, their weights, and how often each one is repeated in the document. We will find out why and how we can benefit from this model in our application.
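To make the idea concrete, here is a minimal sketch in Python, used purely for illustration and not taken from the service built in this book. The function name bag_of_words, the tokenization rule, and the sample sentence are all assumptions. It counts how often each word occurs and shows that shuffling the words leaves the bag unchanged.

```python
import random
import re
from collections import Counter


def bag_of_words(text):
    """Map each lowercase word in `text` to its number of occurrences.

    Hypothetical helper: the tokenization (letters, digits, apostrophes)
    is a simplifying assumption, not a rule prescribed by the model.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


# Sample sentence invented for this illustration.
original = "the cat sat on the mat and the dog sat on the rug"

# Shake the bag: put the same words back in a random order.
words = original.split()
random.shuffle(words)
shuffled = " ".join(words)

original_bag = bag_of_words(original)
shuffled_bag = bag_of_words(shuffled)

# Word order does not affect the bag, so the two representations match.
print(original_bag == shuffled_bag)

# The counts are what the model keeps: each word and how often it appears.
print(original_bag["the"], original_bag["sat"], original_bag["cat"])
```

The shuffled sentence is unreadable to a person, yet both bags compare as equal, which is exactly the property the model relies on.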
Imagine that we have applied this model to our original document, and that there are two other documents out there. Now we want to find out which of those two samples is most similar to our original...