Pre-processing the data
The visitorid and itemid fields are already numeric, but we still need to convert the events into numeric values.
- We convert view events to 1, addtocart events to 2, and transaction events to 3 with the following code:
events['event'] = events['event'].replace({'view': 1, 'addtocart': 2, 'transaction': 3})
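As a quick check of this conversion, here is a minimal sketch on a tiny hand-made frame. The `sample` rows and the `event_codes` name are invented for illustration and are not taken from the dataset. It uses `Series.map`, an alternative that leaves any unexpected event name as NaN so it can be spotted.

```python
import pandas as pd

# Toy stand-in for the events frame; these rows are made up for illustration.
sample = pd.DataFrame({
    "visitorid": [1, 1, 2],
    "itemid": [10, 10, 20],
    "event": ["view", "addtocart", "transaction"],
})

event_codes = {"view": 1, "addtocart": 2, "transaction": 3}

# map() returns a new Series; an event name missing from the dict would become NaN.
sample["event"] = sample["event"].map(event_codes)

# Fail loudly if any event name was not covered by the mapping.
assert sample["event"].notna().all()
```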
- Drop the transactionid and timestamp columns that we don't need:
events.drop(['transactionid'], axis=1, inplace=True)
events.drop(['timestamp'], axis=1, inplace=True)
- Shuffle the dataset to get random data for training and test datasets:
events = events.reindex(np.random.permutation(events.index))
The dataset can also be shuffled with the following command:
events = events.sample(frac=1).reset_index(drop=True)
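If the shuffle needs to be repeatable between runs, one option is to pass a seed. The sketch below assumes a toy frame named `sample` and an arbitrary seed of 42; neither comes from the original text.

```python
import pandas as pd

# Toy frame standing in for events; the values are illustrative only.
sample = pd.DataFrame({"visitorid": range(10), "event": [1] * 10})

# random_state fixes the permutation, so repeated runs shuffle identically.
# The seed value 42 is arbitrary.
shuffled_a = sample.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled_b = sample.sample(frac=1, random_state=42).reset_index(drop=True)
```

Without `random_state`, each call draws a fresh permutation, which is fine for a one-off experiment but makes results harder to reproduce.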
- Split the data into train, valid, and test sets, as follows:
split_1 = int(0.8 * len(events))
split_2 = int(0.9 * len(events))
train = events[:split_1]
valid = events[split_1:split_2...
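An 80/10/10 split of this kind can be sketched end to end on a toy frame. The `sample` frame is invented for illustration, and assigning everything after `split_2` to `test` is an assumption made here.

```python
import pandas as pd

# Toy frame standing in for the shuffled events; the values are illustrative only.
sample = pd.DataFrame({"visitorid": range(100), "event": [1] * 100})

split_1 = int(0.8 * len(sample))  # end of the training slice
split_2 = int(0.9 * len(sample))  # end of the validation slice

train = sample[:split_1]
valid = sample[split_1:split_2]
test = sample[split_2:]  # assumption: test takes whatever remains

# The three slices are disjoint and together cover every row.
assert len(train) + len(valid) + len(test) == len(sample)
```

Because the slices are positional, the frame should be shuffled before splitting; otherwise any ordering in the file carries over into the three sets.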