Creating a model
Imagine we are dealing with the following data in the sales_rating.csv file, which is the result of merging two datasets, one containing sales data and the other containing rating data. The data looks like this:
product_id,avg_rating,sold
1,2.5,100
2,3.7,200
3,4.2,300
4,1.3,50
5,4.9,800
6,3.2,150
7,2.1,80
8,4.8,500
9,3.9,400
10,2.4,200
11,4.1,300
12,3.2,100
13,2.9,150
14,4.5,500
15,3.8,400
16,2.7,200
17,4.3,300
18,3.4,100
19,2.3,150
20,4.7,500
The preceding data shows a dataset with product_id, avg_rating, and sold columns. Our theory is that the average rating of a product is correlated with its number of sales. It seems a fair assumption that a product with a high rating will sell more than a product with a low rating. By creating a model, we can get closer to determining whether that assumption is likely to hold.
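As a first check on that theory, one option is to measure how strongly the two columns move together and fit a straight line through them. The following is a minimal sketch using only the Python standard library, not a definitive model. The choice of Pearson correlation and a least-squares line is an assumption made for illustration, as are the helper names load_rows, pearson, and fit_line. The CSV rows are inlined so the example runs on its own; in practice you would read them from sales_rating.csv.

```python
import csv
import io
import math

# Rows copied from sales_rating.csv, inlined so the sketch is self-contained.
CSV_TEXT = """product_id,avg_rating,sold
1,2.5,100
2,3.7,200
3,4.2,300
4,1.3,50
5,4.9,800
6,3.2,150
7,2.1,80
8,4.8,500
9,3.9,400
10,2.4,200
11,4.1,300
12,3.2,100
13,2.9,150
14,4.5,500
15,3.8,400
16,2.7,200
17,4.3,300
18,3.4,100
19,2.3,150
20,4.7,500
"""


def load_rows(text):
    """Parse the CSV text into (avg_rating, sold) pairs. Hypothetical helper."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["avg_rating"]), int(row["sold"])) for row in reader]


def pearson(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares line ys ~ slope * xs + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


rows = load_rows(CSV_TEXT)
ratings = [rating for rating, _ in rows]
sold = [units for _, units in rows]

r = pearson(ratings, sold)
slope, intercept = fit_line(ratings, sold)

print(f"Pearson r: {r:.3f}")
print(f"sold ~ {slope:.1f} * avg_rating + {intercept:.1f}")
```

A coefficient close to 1 would be consistent with the theory, while a value near 0 would suggest little linear relationship. Either way, a correlation on 20 rows does not show that higher ratings cause higher sales; it only tells us whether the assumption is worth modeling further.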