Running a test on a difficult problem
Throughout the chapter, we have provided recipes for handling tabular data successfully. No single recipe is a complete solution in itself; each is a piece of a puzzle. Combined, the pieces can produce excellent results, and in this last recipe we will demonstrate how to assemble them to complete a difficult Kaggle challenge.
The Kaggle competition Amazon.com – Employee Access Challenge (https://www.kaggle.com/c/amazon-employee-access-challenge) is notable for the high-cardinality variables involved and is a solid benchmark for comparing gradient boosting algorithms. The aim of the competition is to develop a model that predicts whether an Amazon employee should be given access to a specific resource, based on their role and activities. The answer should be given as a likelihood rather than a hard yes/no decision. As predictors, you have different ID codes corresponding to the type of resource...