Authoring a pipeline
Let's assume that you need to create a repeatable workflow that has two steps:
- It loads the data from a registered dataset and splits it into training and test datasets. These datasets are converted into a special construct needed by the LightGBM tree-based algorithm, and the converted constructs are stored to be used by the next step. In our case, you will use the loans dataset that you registered in Chapter 10, Understanding Model Results. You will be writing the code for this step within a folder named step01 (see the sketch after this list).
- It loads the pre-processed data and trains a LightGBM model that is then stored in the /models/loans/ folder of the default datastore attached to the AzureML workspace. You will be writing the code for this step within a folder named step02.
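To make the first step concrete, here is a minimal sketch of what its script could look like. The file name prepare_data.py, the argument names, and the approved_loan target column are assumptions for illustration, not the book's exact code; adapt them to your dataset.

```python
# step01/prepare_data.py -- a minimal sketch, not the book's exact code.
import argparse
import os

import lightgbm as lgb
from azureml.core import Run
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default="loans")
parser.add_argument("--output-path", type=str, required=True)
args = parser.parse_args()

# Get the workspace from the run context so the script works inside a pipeline run.
run = Run.get_context()
ws = run.experiment.workspace

# Load the registered dataset and split it into training and test partitions.
df = ws.datasets[args.dataset].to_pandas_dataframe()
x = df.drop(columns=["approved_loan"])  # hypothetical target column name
y = df["approved_loan"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Convert the partitions into LightGBM's Dataset construct and store the
# binary files so the next step can pick them up.
os.makedirs(args.output_path, exist_ok=True)
train_data = lgb.Dataset(x_train, label=y_train)
test_data = lgb.Dataset(x_test, label=y_test, reference=train_data)
train_data.save_binary(os.path.join(args.output_path, "train.bin"))
test_data.save_binary(os.path.join(args.output_path, "test.bin"))
```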
Each step will be a separate Python file, taking some arguments that specify where to read the data from and where to write the data to. These scripts will utilize the same mechanics as the scripts you authored...
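Under the same assumptions, the second step's script could look like the following sketch. The file name train_model.py, the argument names, the LightGBM parameters, and the use of the datastore's upload method to place the model under /models/loans/ are illustrative choices, not a definitive implementation.

```python
# step02/train_model.py -- a minimal sketch, not the book's exact code.
import argparse
import os

import lightgbm as lgb
from azureml.core import Run

parser = argparse.ArgumentParser()
parser.add_argument("--input-path", type=str, required=True)
args = parser.parse_args()

# Load the binary LightGBM constructs produced by step01.
train_data = lgb.Dataset(os.path.join(args.input_path, "train.bin"))
test_data = lgb.Dataset(os.path.join(args.input_path, "test.bin"),
                        reference=train_data)

# Train a LightGBM model with illustrative parameters.
params = {"objective": "binary", "metric": "auc"}
model = lgb.train(params, train_data, valid_sets=[test_data])

# Persist the model locally, then upload it to the /models/loans/ folder
# of the workspace's default datastore.
os.makedirs("model", exist_ok=True)
model.save_model(os.path.join("model", "loans_model.txt"))

run = Run.get_context()
datastore = run.experiment.workspace.get_default_datastore()
datastore.upload(src_dir="model", target_path="/models/loans/", overwrite=True)
```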