Given the preceding results, it appears that people mostly vote only in a positive manner. We can check whether there is a relationship between the number of votes (reviews) a company has received and its rating.
First, we build the dataset using the following script, extracting the number of reviews and the star rating for each firm:
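#determine relationship between number of reviews and star rating
import json
import pandas as pd
from pandas import DataFrame as df
import numpy as np

# empty frame with the two target columns; its .values seeds the array
dfr2 = pd.DataFrame(columns=['reviews', 'rating'])
mynparray = dfr2.values
# `lines` is assumed to hold the raw input lines, one JSON record per firm
for line in lines:
    # Python 2: decode the line, silently dropping undecodable bytes
    line = unicode(line, errors='ignore')
    obj = json.loads(line)
    reviews = int(obj['review_count'])
    rating = float(obj['stars'])
    arow = [reviews, rating]
    # append this firm as a new row
    mynparray = np.vstack((mynparray, arow))
dfr2...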
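The script above is cut off before dfr2 is filled in and analysed. As a hedged sketch of one way to finish the job, the following Python 3 version collects the rows in a list, builds the DataFrame once (instead of calling np.vstack for every firm), and reports how review count and rating move together. The function names load_reviews_vs_rating and summarize are hypothetical, the input is assumed to be newline-delimited JSON with the review_count and stars keys the original script reads, and the three sample records are invented for illustration, not taken from the dataset.

```python
import json

import pandas as pd


def load_reviews_vs_rating(lines):
    """Build a DataFrame of (review count, star rating) pairs.

    Assumption: each non-blank line is one JSON object with
    'review_count' and 'stars' keys.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        rows.append((int(obj['review_count']), float(obj['stars'])))
    # Construct the frame once rather than stacking one row at a time.
    return pd.DataFrame(rows, columns=['reviews', 'rating'])


def summarize(dfr):
    """Return Pearson and Spearman correlations and mean reviews per rating."""
    pearson = dfr['reviews'].corr(dfr['rating'])  # pandas default: Pearson
    # Spearman's rho is the Pearson correlation of the ranks (no SciPy needed).
    spearman = dfr['reviews'].rank().corr(dfr['rating'].rank())
    mean_reviews = dfr.groupby('rating')['reviews'].mean()
    return pearson, spearman, mean_reviews


# Invented sample records, for illustration only.
sample = [
    '{"review_count": 5, "stars": 3.0}',
    '{"review_count": 40, "stars": 4.0}',
    '{"review_count": 120, "stars": 4.5}',
    '',
]
dfr2 = load_reviews_vs_rating(sample)
pearson, spearman, mean_reviews = summarize(dfr2)
print(dfr2)
print("Pearson:", pearson, "Spearman:", spearman)
print(mean_reviews)
```

Review counts are typically heavily skewed, so the rank-based Spearman coefficient is less sensitive to a handful of very popular firms than Pearson's; the per-rating means give a quick view of whether higher-rated firms tend to attract more reviews.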