Kubeflow pipelines
For notebook servers, we gave an example of a single-container application (the notebook instance). Kubeflow also gives us the ability to run multi-container application workflows (such as input data, training, and deployment) using its pipelines functionality. Pipelines are Python functions that follow a Domain Specific Language (DSL) to specify components that will be compiled into containers.
If we click Pipelines in the UI, we are brought to a dashboard (Figure 2.12):
Figure 2.12: Kubeflow pipelines dashboard
Selecting one of these pipelines, we can see a visual overview of the component containers (Figure 2.13).
Figure 2.13: Kubeflow pipelines visualization
After creating a new run, we can specify parameters for a particular instance of this pipeline (Figure 2.14).
Figure 2.14: Kubeflow pipelines parameters
Once the pipeline is created, we can use the user interface to visualize the results (Figure 2.15):
Figure 2.15: Kubeflow pipeline results visualization
Under the hood, the Python code that generates this pipeline is compiled using the pipelines SDK. We can specify that components come either from a container running Python code:
@kfp.dsl.component
def my_component(my_param):
    ...
    return kfp.dsl.ContainerOp(
        name='My component name',
        image='gcr.io/path/to/container/image'
    )
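The elided body would typically forward the component's parameter to the container, for instance via the ContainerOp arguments keyword; a minimal sketch (the --my-param flag here is purely illustrative):
return kfp.dsl.ContainerOp(
    name='My component name',
    image='gcr.io/path/to/container/image',
    # Hypothetical command-line flag showing how a parameter reaches the container
    arguments=['--my-param', my_param]
)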
or from a function written in Python itself:
@kfp.dsl.python_component(
    name='My awesome component',
    description='Come and play',
)
def my_python_func(a: str, b: str) -> str:
    # For illustration, simply concatenate the two string inputs
    return a + b
For a pure Python function, we could turn this into an operation with the compiler:
my_op = compiler.build_python_component(
    component_func=my_python_func,
    staging_gcs_path=OUTPUT_DIR,
    target_image=TARGET_IMAGE)
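As the argument names suggest, this step packages the Python function into a container image, staging the build artifacts at OUTPUT_DIR and pushing the finished image to TARGET_IMAGE, so that the function can run as its own step in a pipeline.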
We then use the dsl.pipeline decorator to add this operation to a pipeline:
@kfp.dsl.pipeline(
    name='My pipeline',
    description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
    my_step = my_op(a='a', b='b')
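Here the component is called with hardcoded values for brevity; in a real pipeline we would typically forward the declared pipeline parameters to the component instead, for example:
    my_step = my_op(a=param_1, b=param_2)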
We compile it using the following code:
kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')
and run it with this code:
client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')
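If we want our script to block until the run has finished, the client also provides a wait_for_run_completion method; a minimal sketch (the one-hour timeout is an arbitrary choice):
# Wait for the run to finish (or for the timeout, in seconds, to elapse)
result = client.wait_for_run_completion(my_run.id, timeout=3600)
print(result.run.status)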
We can also upload this ZIP file to the pipelines UI, where Kubeflow can use the YAML generated during compilation to instantiate the job.
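Note that the compiler infers the output format from the file extension, so we could also write the workflow definition out as YAML directly; a minimal sketch:
# The file extension determines the package format (.zip, .tar.gz, or .yaml)
kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.yaml')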
Now that you have seen the process for generating results for a single pipeline, our next problem is how to find the optimal parameters for such a pipeline. As you will see in Chapter 3, Building Blocks of Deep Neural Networks, neural network models typically have a number of configurations, known as hyperparameters, which govern their architecture (such as the number of layers, layer size, and connectivity) and training paradigm (such as learning rate and optimizer algorithm). Kubeflow has a built-in utility, called Katib, for optimizing models over such parameter grids.