In the previous section, we implemented a baseline to orient ourselves; now, let's bring in the heavy machinery. We will follow the approach taken by the winning solution of the KDD Cup 2009, developed by the IBM Research team (Niculescu-Mizil and others).
To address this challenge, they used the ensemble selection algorithm (Caruana and Niculescu-Mizil, 2004). This is an ensemble method: it builds a library of models and combines their outputs to produce the final classification. It has several desirable properties that make it a good fit for this challenge, as follows:
- It has proven to be robust, yielding excellent performance.
- It can be optimized for a specific performance metric, including AUC.
- It allows for different classifiers to be added to the library.
- It is an anytime method, meaning that if we run out of...
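To make the properties above concrete, here is a minimal sketch of greedy forward ensemble selection in Python. It is an illustration under stated assumptions, not the IBM team's implementation: the function names `auc` and `ensemble_selection`, the `max_rounds` parameter, and the dictionary-based model library are all hypothetical, and each model is assumed to have already produced scores on a held-out validation set. The sketch averages the scores of the selected models, allows the same model to be picked more than once, and stops when no candidate strictly improves AUC.

```python
from typing import Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_selection(
    library: Dict[str, Sequence[float]],
    labels: Sequence[int],
    max_rounds: int = 10,
) -> Tuple[List[str], float]:
    """Greedily add the model whose inclusion most improves validation AUC.

    `library` maps a (hypothetical) model name to its validation-set scores.
    Models may be selected repeatedly, which effectively weights them.
    """
    names = sorted(library)          # fixed order makes tie-breaking deterministic
    summed = [0.0] * len(labels)     # running sum of the selected models' scores
    selected: List[str] = []
    best_score = float("-inf")

    for _ in range(max_rounds):
        round_name, round_score = None, best_score
        for name in names:
            size = len(selected) + 1
            candidate = [(s + p) / size for s, p in zip(summed, library[name])]
            score = auc(labels, candidate)
            if score > round_score:  # require a strict improvement
                round_name, round_score = name, score
        if round_name is None:       # no candidate helps, so stop early
            break
        selected.append(round_name)
        summed = [s + p for s, p in zip(summed, library[round_name])]
        best_score = round_score
        # `selected` is a usable ensemble after every round.

    return selected, best_score
```

Because the metric is just a function of labels and averaged scores, swapping `auc` for another measure changes what the ensemble is optimized for, and adding a classifier to the library only means adding one more entry to the dictionary. The loop can also be interrupted after any round and still return a valid ensemble.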