Training an ensemble classifier model using LightGBM
Both random forests and gradient-boosted trees are powerful ML techniques because they combine a simple base learner, the decision tree, into an ensemble of many classifiers. In this example, we will implement both techniques on a test dataset using a popular library from Microsoft: LightGBM, a gradient-boosting framework that supports multiple tree-based learning algorithms.
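To make the distinction concrete, here is a minimal sketch of how LightGBM can train both ensemble flavors through its scikit-learn interface: `boosting_type='gbdt'` for gradient boosting and `boosting_type='rf'` for random forest mode (which requires bagging to be enabled). The synthetic dataset and hyperparameter values are illustrative assumptions, not part of the example we build in this section:

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset; the section's real dataset is registered in Azure later
X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gradient-boosted decision trees: trees are built sequentially,
# each new tree correcting the errors of the ones before it
gbdt = lgb.LGBMClassifier(boosting_type='gbdt', n_estimators=100)
gbdt.fit(X_train, y_train)

# Random forest mode: trees are built independently on bagged samples;
# LightGBM requires bagging_freq > 0 and bagging_fraction < 1 for 'rf'
rf = lgb.LGBMClassifier(boosting_type='rf', n_estimators=100,
                        bagging_freq=1, bagging_fraction=0.8)
rf.fit(X_train, y_train)

print(f"GBDT accuracy: {gbdt.score(X_test, y_test):.3f}")
print(f"RF accuracy:   {rf.score(X_test, y_test):.3f}")
```

Note that the only switch between bagging and boosting is the `boosting_type` parameter, which is exactly why a single framework can serve both techniques.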
For this section, we will follow a typical best-practice approach using Azure Machine Learning and perform the following steps (a minimal sketch of the workflow follows the list):
- Register the dataset in Azure.
- Create a remote compute cluster.
- Implement a configurable training script.
- Run the training script on the compute cluster.
- Log and collect the dataset, parameters, and performance.
- Register the trained model.
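The sketch below shows how these steps might fit together with the Azure Machine Learning SDK (v1). The dataset path, cluster name, script names, and hyperparameters are hypothetical placeholders for this illustration; your actual resources will differ, and in practice you would also attach an `Environment` with LightGBM installed and reuse an existing cluster rather than always creating one:

```python
from azureml.core import Workspace, Dataset, Experiment, ScriptRunConfig
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # reads config.json for your workspace

# 1. Register the dataset in Azure (the path is a placeholder)
datastore = ws.get_default_datastore()
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, 'data/train.csv'))
dataset = dataset.register(workspace=ws, name='lgbm-train-data',
                           create_new_version=True)

# 2. Create a remote compute cluster
# (in practice, check first whether the cluster already exists)
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                       max_nodes=4)
cluster = ComputeTarget.create(ws, 'cpu-cluster', compute_config)
cluster.wait_for_completion(show_output=True)

# 3./4. Run a configurable training script on the compute cluster,
# passing hyperparameters as command-line arguments
src = ScriptRunConfig(source_directory='./src', script='train.py',
                      arguments=['--boosting-type', 'gbdt', '--num-leaves', '31'],
                      compute_target=cluster)
run = Experiment(ws, 'lightgbm-training').submit(src)
run.wait_for_completion(show_output=True)

# 6. Register the trained model produced by the script
run.register_model(model_name='lightgbm-classifier', model_path='outputs/model.txt')
```

Step 5, logging the dataset, parameters, and performance, happens inside the training script itself through the run context. A hypothetical excerpt of such a `train.py` might look like this, with the actual training elided:

```python
# train.py (excerpt) -- hypothetical configurable training script
import argparse
from azureml.core import Run

parser = argparse.ArgumentParser()
parser.add_argument('--boosting-type', default='gbdt')
parser.add_argument('--num-leaves', type=int, default=31)
args = parser.parse_args()

run = Run.get_context()  # handle to the current Azure ML run

# Log the configurable parameters so each run is reproducible
run.log('boosting_type', args.boosting_type)
run.log('num_leaves', args.num_leaves)

# ... load the registered dataset, train the LightGBM model, evaluate ...
# run.log('accuracy', accuracy)

# Save the model to ./outputs so Azure ML uploads it with the run
# model.booster_.save_model('outputs/model.txt')
```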
Before we start with this exciting approach, we'll take a quick look at why we chose LightGBM as a tool for training bagged and boosted tree ensembles.