Determining relationships between number of ratings and ratings
Given the preceding results, it appears that people mostly vote positively. We can check whether there is a relationship between the number of votes a company has received and its rating.
First, we build the dataset using the following script, extracting the number of reviews and the star rating for each firm:
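# determine relationship between number of reviews and star rating
import json
import pandas as pd
from pandas import DataFrame as df
import numpy as np

# start from an empty two-column frame and take its (empty) NumPy array
dfr2 = pd.DataFrame(columns=['reviews', 'rating'])
mynparray = dfr2.values

for line in lines:
    # Python 2: decode the raw line, dropping bytes that cannot be decoded
    line = unicode(line, errors='ignore')
    obj = json.loads(line)
    reviews = int(obj['review_count'])
    rating = float(obj['stars'])
    arow = [reviews, rating]
    # append the new row to the bottom of the array
    mynparray = np.vstack((mynparray, arow))

# convert the accumulated array back into a data frame
dfr2 = df(mynparray)
print(len(dfr2))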
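For comparison, here is a minimal Python 3 sketch of the same accumulation step that collects the rows in a plain list and builds the data frame once at the end. The `sample_lines` records and the `build_reviews_frame` helper are hypothetical stand-ins for illustration; only the `review_count` and `stars` field names come from the script above.

```python
import json
import pandas as pd

# Hypothetical sample records standing in for the lines of the input file
sample_lines = [
    '{"review_count": 12, "stars": 4.5}',
    '{"review_count": 3, "stars": 2.0}',
    '{"review_count": 40, "stars": 4.0}',
]

def build_reviews_frame(lines):
    """Return a data frame with one (reviews, rating) row per JSON line."""
    rows = []
    for line in lines:
        obj = json.loads(line)
        rows.append((int(obj['review_count']), float(obj['stars'])))
    # build the frame in one step, naming the two columns explicitly
    return pd.DataFrame(rows, columns=['reviews', 'rating'])

dfr2 = build_reviews_frame(sample_lines)
print(len(dfr2))
```

The script above arrives at the same two-column result by stacking each row onto a NumPy array.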
The loop with np.vstack just builds the data frame with our two variables. We are using NumPy because stacking a new row onto a NumPy array is easier than adding a row to a data frame. Once we are done with...