After several examples, it is now time to predict ad click-through with the decision tree algorithm we have just learned and practiced. We will use the dataset from the Kaggle machine learning competition Click-Through Rate Prediction (https://www.kaggle.com/c/avazu-ctr-prediction).
For now, we take only the first 100,000 samples from the train file (unzipped from the train.gz file at https://www.kaggle.com/c/avazu-ctr-prediction/data) to train the decision tree, and the first 100,000 samples from the test file (unzipped from the test.gz file on the same page) for prediction.
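Taking only the first rows of a large file can be done without loading the whole file into memory. The following is a minimal sketch, assuming the unzipped files are comma-separated with a header row; the helper name `read_first_rows` and the file paths in the usage comments are hypothetical and should be adjusted to wherever the files were unzipped.

```python
import csv
from itertools import islice


def read_first_rows(csv_path, n_rows):
    """Return the first n_rows data rows of a CSV file as a list of dicts.

    The header row supplies the dict keys and is not counted as a data row.
    All values are returned as strings, exactly as they appear in the file.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # islice stops after n_rows rows, so the rest of the file is never read
        return list(islice(reader, n_rows))


# Hypothetical usage (paths are assumptions, not taken from the competition page):
# train_rows = read_first_rows("train", 100_000)
# test_rows = read_first_rows("test", 100_000)
```

If the file holds fewer than `n_rows` rows, the function simply returns all of them.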
The data fields are described as follows:
- id: ad identifier, such as 1000009418151094273, 10000169349117863715
- click: 0 for non-click, 1 for click
- hour: in the format of YYMMDDHH, for example, 14102100
- C1: anonymized categorical variable, such as 1005, 1002
- banner_pos: where a...