Building LightGBM models
This section provides an end-to-end example of solving a real-world problem using LightGBM. We take a more detailed look at preparing the data for a problem and explain how to find suitable parameters for our algorithms. We then use multiple variants of LightGBM to explore their relative performance and compare them against random forests.
Cross-validation
Before we delve into solving a problem, we need to discuss a better way of validating algorithm performance. Splitting the data into two or three subsets is standard practice when training a model. The training data is used to train the model, the validation data is a hold-out set used to validate the model during training, and the test data is used to validate performance after training.
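The three-way split described above can be sketched as follows. This is a minimal illustration on synthetic data, using scikit-learn's `train_test_split` twice: once to carve off the test set, and once to divide the remainder into training and validation sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 1000 samples, 5 features.
rng = np.random.default_rng(42)
X = rng.random((1000, 5))
y = rng.integers(0, 2, size=1000)

# First carve off 20% as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Then split the remaining 80% into train and validation
# (0.25 of the remainder gives a 60/20/20 overall split).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

The exact ratios are a common convention rather than a requirement; what matters is that the test set is held out until training and tuning are complete.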
In previous examples, we have done this split only once, building a single training and test set to train and validate the model. The issue with this approach is that our model could get “lucky.” If, by chance...
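Cross-validation addresses this by repeating the split. A minimal sketch of 5-fold cross-validation is shown below; it uses a scikit-learn decision tree as a stand-in estimator (in this chapter's context you would substitute a LightGBM model such as `lgb.LGBMClassifier`), and synthetic data generated purely for illustration. Each sample serves as validation data exactly once, so no single fortunate split can dominate the performance estimate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 5 folds: the model is trained 5 times, each time validated on the
# held-out fifth of the data it has not seen.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = DecisionTreeClassifier(random_state=42)  # stand-in estimator

scores = cross_val_score(model, X, y, cv=kf)
print(f"accuracy per fold: {scores}")
print(f"mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

Reporting the mean and standard deviation across folds gives a far more stable picture of generalization than a single train/test score.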