Using Kubeflow Katib to optimize model hyperparameters
Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (finding the right learning rate, for example, for an algorithm). Like the other Kustomize templates we have seen, the TensorFlow job specifies a generic TensorFlow job, with placeholders for the parameters:
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
namespace: kubeflow
name: tfjob-example
spec:
parallelTrialCount: 3
maxTrialCount: 12
maxFailedTrialCount: 3
objective:
type: maximize
goal: 0.99
objectiveMetricName: accuracy_1
algorithm:
algorithmName: random
metricsCollectorSpec:
source:
fileSystemPath:
path: /train
kind: Directory
collector:
kind: TensorFlowEvent
parameters:
- name: --learning_rate
parameterType: double
feasibleSpace:
min: "0.01"
max: "0.05"
- name: --batch_size
parameterType: int
feasibleSpace:
min: "100"
max: "200"
trialTemplate:
goTemplate:
rawTemplate: |-
apiVersion: "kubeflow.org/v1"
kind: TFJob
metadata:
name: {{.Trial}}
namespace: {{.NameSpace}}
spec:
tfReplicaSpecs:
Worker:
replicas: 1
restartPolicy: OnFailure
template:
spec:
containers:
- name: tensorflow
image: gcr.io/kubeflow-ci/tf-mnist-with-
summaries:1.0
imagePullPolicy: Always
command:
- "python"
- "/var/tf_mnist/mnist_with_summaries.py"
- "--log_dir=/train/metrics"
{{- with .HyperParameters}}
{{- range .}}
- "{{.Name}}={{.Value}}"
{{- end}}
{{- end}}
which we can run using the familiar kubectl
syntax:
kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml
or through the UI (Figure 2.16):
Figure 2.16: Katib UI on Kubeflow
where you can see a visual of the outcome of these multi-parameter experiments, or a table (Figures 2.17 and 2.18).
Figure 2.17: Kubeflow visualization for multi-dimensional parameter optimization
Figure 2.18: Kubeflow UI for multi-outcome experiments