We started the chapter by introducing content-based filtering. We discussed how content-based filtering methods can help with cold-start problems in recommendation systems. We then explained the news aggregator use case. We explored the data provided by the customer: news articles from different publishers, belonging to different categories. Based on the data, we came up with a design for our content-based recommendation system.
We implemented a similarity dictionary; given a news article, this dictionary returns the top N matching articles. The similarity was calculated based on the words present in each article. We leveraged the vector space model for text and ultimately used the cosine distance to measure how similar two articles are.
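As a reminder of the idea, here is a minimal Python sketch of such a similarity dictionary. It is not the chapter's implementation: the function names (`term_vector`, `cosine_similarity`, `build_similarity_dict`), the raw term-count weighting, the whitespace tokenizer, and the three sample articles are all assumptions made for illustration. It ranks by cosine similarity, which gives the same ordering as ranking by smallest cosine distance, since the distance is one minus the similarity.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one article (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty article matches nothing
    return dot / (norm_a * norm_b)


def build_similarity_dict(articles, top_n=2):
    """Map each article id to its top_n most similar other articles."""
    vectors = {aid: term_vector(text) for aid, text in articles.items()}
    similar = {}
    for aid, vec in vectors.items():
        scores = [
            (other, cosine_similarity(vec, other_vec))
            for other, other_vec in vectors.items()
            if other != aid
        ]
        # Highest similarity (smallest cosine distance) first.
        scores.sort(key=lambda pair: pair[1], reverse=True)
        similar[aid] = scores[:top_n]
    return similar


# Hypothetical sample data, not taken from the customer data set.
articles = {
    "a1": "stock markets rally as tech shares climb",
    "a2": "tech shares lead stock markets higher",
    "a3": "local team wins the football championship",
}
similarity = build_similarity_dict(articles, top_n=1)
print(similarity["a1"])
```

Looking up an article id in `similarity` then gives its closest matches. A fuller version would typically replace raw counts with a weighting such as TF-IDF and remove stop words before comparing articles.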
We implemented a simple search based on the similarity dictionary to get a list of matching news articles. We...