Building a LightGBM ML pipeline with Amazon SageMaker
The dataset we’ll use for our case study of building a SageMaker pipeline is the Census Income dataset from Chapter 4, Comparing LightGBM, XGBoost, and Deep Learning. This dataset is also available as a SageMaker sample dataset, which makes it convenient to work with if you are just getting started with SageMaker.
The pipeline we’ll build consists of the following steps:
- Data preprocessing.
- Model training and tuning.
- Model evaluation.
- Bias and explainability checks using Clarify.
- Model registration within SageMaker.
- Model deployment using an AWS Lambda function.
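As a rough illustration of how these steps depend on one another, the following minimal sketch models the pipeline as a dependency graph using only the Python standard library. The step names and the strictly linear ordering are assumptions made for illustration; the actual pipeline is defined with the SageMaker Python SDK, and its steps may be wired differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names. Each key maps to the set of steps it depends on.
# The linear chain of dependencies is an assumption for illustration only.
PIPELINE_STEPS = {
    "preprocess": set(),
    "train_and_tune": {"preprocess"},
    "evaluate": {"train_and_tune"},
    "clarify_check": {"evaluate"},
    "register_model": {"clarify_check"},
    "deploy_with_lambda": {"register_model"},
}


def execution_order(steps):
    """Return the step names in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(PIPELINE_STEPS)
for position, name in enumerate(order, start=1):
    print(f"{position}. {name}")
```

In SageMaker Pipelines, this ordering is typically inferred from the data dependencies between steps, so an explicit sort like this is not needed; the sketch only makes the intended sequence concrete.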
Here’s a graph showing the complete pipeline:
Figure 9.2 – SageMaker ML pipeline for Census Income classification
Our approach is to create the entire pipeline using a Jupyter Notebook running in SageMaker Studio. The sections that follow walk through the code for each pipeline step, starting...