Predicting ad click-through with a decision tree
Having worked through several examples, we can now predict ad click-through with the decision tree algorithm you have just learned and practiced. We will use the dataset from a Kaggle machine learning competition, Click-Through Rate Prediction (https://www.kaggle.com/c/avazu-ctr-prediction). The dataset can be downloaded from https://www.kaggle.com/c/avazu-ctr-prediction/data.
Only the train.gz file contains labeled samples, so we only need to download this file and unzip it (this will take a while). In this chapter, we will focus on only the first 300,000 samples from the train.csv file unzipped from train.gz.
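As a minimal sketch of how the first 300,000 samples could be pulled out without loading the whole file, the snippet below uses only the Python standard library. The helper name read_first_rows is hypothetical, and the code assumes the first line of the file is a header row naming the fields. Reading directly from a .gz file is an added convenience, not something the text requires.

```python
import csv
import gzip
from itertools import islice


def read_first_rows(path, n_rows=300_000):
    """Return up to n_rows records from a CSV file as a list of dicts.

    Hypothetical helper: assumes the first line of the file is a header.
    """
    # Open gzip-compressed files transparently (an added convenience);
    # anything else is treated as plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # maps each row to {field name: value}
        # islice stops after n_rows, so the rest of the file is never read.
        return list(islice(reader, n_rows))
```

Calling read_first_rows("train.csv") would then return at most 300,000 records, each keyed by the field names from the header. All values come back as strings, so numeric fields still need converting before training.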
The fields in the raw file are as follows:
Figure 3.12: Description and example values of the dataset
We take a glance at the head of the file by running the following command:
head train | sed 's/,,/, ,/g;s/,,/, ,/g' | column -s, -t
Rather than a simple head train, the output is...
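The sed expression in the command appears twice because a global substitution matches without overlapping: one pass over a,,,b yields a, ,,b, and only a second pass fills the remaining empty field. The sketch below mirrors that logic in Python and pads the columns in a way loosely similar to column -s, -t. The helper names fill_empty_fields and align_columns are hypothetical, and the two-space column gap is an assumption about the layout, not an exact reproduction of column.

```python
from itertools import zip_longest


def fill_empty_fields(line):
    """Put a space into every empty comma-separated field."""
    # str.replace, like sed's s///g, does not rescan text it has just
    # produced, so runs of three or more commas need a second pass.
    once = line.replace(",,", ", ,")
    return once.replace(",,", ", ,")


def align_columns(lines):
    """Pad each comma-separated column to a common width."""
    rows = [fill_empty_fields(line.rstrip("\n")).split(",") for line in lines]
    # Width of each column is the length of its longest cell.
    widths = [max(len(cell) for cell in col)
              for col in zip_longest(*rows, fillvalue="")]
    return ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in rows]
```

Passing the first few lines of the file to align_columns gives one padded string per line, which can be printed for a quick look at the fields.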