We have 1,000 news articles from different publishers. Each article belongs to one of several categories, such as technical, entertainment, and others. Our goal is to alleviate the cold-start problem when serving our customers. Simply put, what do we recommend to a customer when we have no information about their preferences? Either we are seeing the customer for the first time, or we have not yet set up a mechanism to capture customer interactions with our products/items.
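When nothing is known about a customer, one common fallback is content-based: recommend articles whose text resembles the article the customer is currently reading. The sketch below only illustrates that idea under stated assumptions. It is not necessarily the approach used later in this section. It uses three made-up headlines (hypothetical, not taken from the dataset), raw term-count vectors built with `tm`, and cosine similarity computed with `slam`.

```r
# Hypothetical toy example: three made-up headlines, not taken from the dataset.
library(tm)
library(slam)

titles <- c("apple releases new iphone",
            "new iphone sales beat expectations",
            "actor wins award at film festival")

# Build a document-term matrix of raw term counts (tm's default weighting).
# Note: by default tm drops words shorter than 3 characters, so "at" is ignored.
corpus <- VCorpus(VectorSource(titles))
dtm <- DocumentTermMatrix(corpus)

# Dot products between every pair of documents, i.e. dtm %*% t(dtm).
dots <- tcrossprod_simple_triplet_matrix(dtm)

# The diagonal holds each document's squared L2 norm; dividing by the
# product of norms turns dot products into cosine similarities.
norms <- sqrt(diag(dots))
sims <- dots / (norms %o% norms)

# For a new visitor reading article 1, rank the other articles by similarity.
current <- 1
ranked <- sort(sims[current, -current], decreasing = TRUE)
print(round(sims, 3))
print(ranked)
```

Articles 1 and 2 share the terms "new" and "iphone", so their similarity is 1/sqrt(5), about 0.447, while article 3 shares no terms with article 1 and scores 0. A first-time visitor reading article 1 would therefore be shown article 2 first, without any preference data.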
The articles are a subset of the News Aggregator dataset from https://archive.ics.uci.edu/ml/datasets/News+Aggregator, and this subset is stored in a CSV file.
Let's quickly look at the data provided:
> library(tidyverse)
> library(tidytext)
> library(tm)
> library(slam)
>
>
> cnames <- c('ID' , 'TITLE' , 'URL' ,
+...