Exploring the Retailrocket dataset
Let's load the dataset and explore it to learn more about the data.
- Set the path to the folder where we downloaded the data:
import os
dsroot = os.path.join(os.path.expanduser('~'), 'datasets', 'kaggle-retailrocket')
os.listdir(dsroot)
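Before loading anything, it can help to confirm that the download is complete. The sketch below assumes the Kaggle archive unpacks to events.csv, category_tree.csv, item_properties_part1.csv, and item_properties_part2.csv. Adjust the list if your copy differs. missing_files and EXPECTED_FILES are hypothetical names introduced for illustration, not part of any library.

```python
import os
import tempfile

# Assumed file names for the Kaggle Retailrocket download; adjust if yours differ
EXPECTED_FILES = ('events.csv', 'category_tree.csv',
                  'item_properties_part1.csv', 'item_properties_part2.csv')


def missing_files(dsroot, expected=EXPECTED_FILES):
    """Hypothetical helper: return the expected file names not found in dsroot."""
    # A missing folder is treated as "every file is missing"
    present = set(os.listdir(dsroot)) if os.path.isdir(dsroot) else set()
    return [name for name in expected if name not in present]


# Usage with a throwaway directory standing in for the dataset folder
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'events.csv'), 'w').close()
    print(missing_files(tmp))
```

In practice you would call missing_files(dsroot) and expect an empty list.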
- Load events.csv into a pandas DataFrame:
import pandas as pd
events = pd.read_csv(os.path.join(dsroot, 'events.csv'))
print('Event data\n', events.head())
The events data has five columns, timestamp, visitorid, event, itemid, and transactionid, as shown here:
Event data
       timestamp  visitorid event  itemid  transactionid
0  1433221332117     257597  view  355908            NaN
1  1433224214164     992329  view  248676            NaN
2  1433221999827     111016  view  318965            NaN
3  1433221955914     483717  view  253185            NaN
4  1433221337106     951259  view  367447            NaN
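The 13-digit values in the timestamp column look like Unix epoch times in milliseconds. Assuming that, a minimal sketch of turning them into readable datetimes follows. So that it runs without the dataset, it reads two rows copied from the preview above through io.StringIO; sample_csv and sample are illustrative names only. The empty transactionid fields are read as NaN, matching the preview.

```python
import io

import pandas as pd

# Two rows copied from the events.csv preview; empty transactionid -> NaN
sample_csv = io.StringIO(
    "timestamp,visitorid,event,itemid,transactionid\n"
    "1433221332117,257597,view,355908,\n"
    "1433224214164,992329,view,248676,\n"
)
sample = pd.read_csv(sample_csv)

# Assumption: timestamp is milliseconds since the Unix epoch (UTC)
sample['datetime'] = pd.to_datetime(sample['timestamp'], unit='ms')
print(sample[['timestamp', 'datetime']])
```

On the full data, the same call would be pd.to_datetime(events['timestamp'], unit='ms').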
- Print the number of unique values in each column, which includes the counts of unique visitors, items, and transactions:
print('Unique counts:', events.nunique())
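One detail worth knowing when reading these counts: DataFrame.nunique() skips NaN by default, so the transactionid count reflects only rows that actually carry a transaction ID. The toy frame below is made up for illustration and is not Retailrocket data; the event labels addtocart and transaction are assumed here alongside the view events seen in the preview.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column names as events.csv (illustrative only)
toy = pd.DataFrame({
    'visitorid': [1, 1, 2, 3],
    'event': ['view', 'addtocart', 'view', 'transaction'],
    'itemid': [10, 10, 11, 10],
    'transactionid': [np.nan, np.nan, np.nan, 500.0],
})

# Default: NaN is ignored, so transactionid has 1 unique value
print(toy.nunique())
# dropna=False counts NaN as a distinct value, giving 2 for transactionid
print(toy.nunique(dropna=False))
```

Because of this default, the transactionid figure can be read directly as the number of distinct transactions.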
We get the following...