Shining applications of BoW and TF-IDF
Although BoW and TF-IDF may appear simple, they already have real-world applications. Both techniques capture which words appear in a document and how often they appear. Because different types of documents tend to differ in their vocabulary and word frequencies, these representations can be used to classify documents by type. One important application is keeping spam emails out of the inbox. Spam emails are ubiquitous and unavoidable, and they can quickly fill up a spam folder. BoW or TF-IDF features help a classifier separate the characteristics of spam from those of regular emails. You may ask: if BoW and TF-IDF are effective, why do spam emails still reach our inboxes? This is because spammers try to write emails that are as close as possible to regular emails, so that an algorithm cannot easily distinguish them.
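To make the spam-filtering idea concrete, here is a minimal sketch using only the Python standard library. It rests on several assumptions that are not from the original text: a tiny hand-written training set invented purely for illustration, a simple regex tokenizer, one common smoothed IDF variant, and a nearest-centroid classifier that compares cosine similarities. All function names (`tokenize`, `bag_of_words`, `fit_idf`, `tfidf`, `train`, `classify`) are hypothetical helpers, not a library API.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(text):
    """BoW representation: each token mapped to its raw count."""
    return Counter(tokenize(text))

def fit_idf(docs):
    """Smoothed IDF per word (assumed variant): log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each word once per document
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}

def tfidf(text, idf):
    """TF-IDF vector: raw count times IDF; words unseen in training are dropped."""
    return {w: c * idf[w] for w, c in bag_of_words(text).items() if w in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average a list of sparse vectors into one class prototype."""
    total = {}
    for vec in vectors:
        for w, v in vec.items():
            total[w] = total.get(w, 0.0) + v
    return {w: v / len(vectors) for w, v in total.items()}

def train(labelled_docs):
    """Fit IDF on all training texts and build one TF-IDF centroid per label."""
    idf = fit_idf([text for text, _ in labelled_docs])
    by_label = {}
    for text, label in labelled_docs:
        by_label.setdefault(label, []).append(tfidf(text, idf))
    return idf, {label: centroid(vecs) for label, vecs in by_label.items()}

def classify(text, idf, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy training data, invented for illustration only.
training = [
    ("win a free prize now click here", "spam"),
    ("cheap pills free offer click now", "spam"),
    ("meeting moved to tuesday afternoon", "ham"),
    ("please review the attached project report", "ham"),
]
idf, centroids = train(training)
print(classify("click now to claim your free prize", idf, centroids))  # spam
print(classify("can we move the project meeting", idf, centroids))     # ham
```

A BoW-only variant would simply use `bag_of_words(text)` in place of `tfidf(text, idf)`. TF-IDF additionally down-weights words that appear in many training emails, so shared filler words contribute less to the similarity score. A production filter would need far more data and would typically use a trained classifier rather than class centroids.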
Besides text classification, BoW has also been extended to Bag-of-Visual-Words (BoVW) to classify images....