Generative AI with Python and TensorFlow 2: Create images, text, and music with VAEs, GANs, LSTMs, Transformer models

Joseph Babcock, Raghav Bali
Paperback Apr 2021 488 pages 1st Edition

Setting Up a TensorFlow Lab

Now that you have seen all the amazing applications of generative models in Chapter 1, An Introduction to Generative AI: "Drawing" Data from Models, you might be wondering how to get started with implementing projects that use these kinds of algorithms. In this chapter, we will walk through a number of tools that we will use throughout the rest of the book to implement the deep neural networks used in various generative AI models. Our primary tool is the TensorFlow 2.0 framework, developed by Google1 2; however, we will also use a number of additional resources to make the implementation process easier (summarized in Table 2.1).

We can broadly categorize these tools:

  • Resources for replicable dependency management (Docker, Anaconda)
  • Exploratory tools for data munging and algorithm hacking (Jupyter)
  • Utilities to deploy these resources to the cloud and manage their lifecycle (Kubernetes, Kubeflow, Terraform)

| Tool | Project site | Use |
|------|--------------|-----|
| Docker | https://www.docker.com/ | Application runtime dependency encapsulation |
| Anaconda | https://www.anaconda.com/ | Python language package management |
| Jupyter | https://jupyter.org/ | Interactive Python runtime and plotting / data exploration tool |
| Kubernetes | https://kubernetes.io/ | Docker container orchestration and resource management |
| Kubeflow | https://www.kubeflow.org/ | Machine learning workflow engine developed on Kubernetes |
| Terraform | https://www.terraform.io/ | Infrastructure scripting language for configurable and consistent deployments of Kubeflow and Kubernetes |
| VSCode | https://code.visualstudio.com/ | Integrated development environment (IDE) |

Table 2.1: Tech stack for generative adversarial model development

On our journey to bring our code from our laptops to the cloud in this chapter, we will first describe some background on how TensorFlow works when running locally. We will then describe a wide array of software tools that will make it easier to run an end-to-end TensorFlow lab locally or in the cloud, such as notebooks, containers, and cluster managers. Finally, we will walk through a simple practical example of setting up a reproducible research environment, running local and distributed training, and recording our results. We will also examine how we might parallelize TensorFlow across multiple CPU/GPU units within a machine (vertical scaling) and multiple machines in the cloud (horizontal scaling) to accelerate training. By the end of this chapter, we will be all ready to extend this laboratory framework to tackle implementing projects using various generative AI models.

First, let's start by diving more into the details of TensorFlow, the library we will use to develop models throughout the rest of this book. What problem does TensorFlow solve for neural network model development? What approaches does it use? How has it evolved over the years? To answer these questions, let us review some of the history behind deep neural network libraries that led to the development of TensorFlow.

Deep neural network development and TensorFlow

As we will see in Chapter 3, Building Blocks of Deep Neural Networks, a deep neural network in essence consists of matrix operations (addition, subtraction, multiplication), nonlinear transformations, and gradient-based updates computed by using the derivatives of these components.
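The three ingredients named above (a matrix operation, a nonlinear transformation, and a gradient-based update) can be seen together in a very small sketch. The code below is only an illustration in plain Python rather than TensorFlow; the weights, input, target, and learning rate are arbitrary values chosen for the example.

```python
import math

def sigmoid(z):
    """Nonlinear transformation applied to the weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, x):
    """Dot product plus bias (a one-row matrix multiplication), then the nonlinearity."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def loss(weights, bias, x, target):
    """Squared error between the prediction and the target."""
    return 0.5 * (forward(weights, bias, x) - target) ** 2

def train_step(weights, bias, x, target, learning_rate=0.5):
    """One gradient-based update derived with the chain rule."""
    y = forward(weights, bias, x)
    # dL/dz = (y - target) * sigmoid'(z), where sigmoid'(z) = y * (1 - y)
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias

# Arbitrary illustrative values
weights, bias = [0.1, -0.2, 0.05], 0.0
x, target = [1.0, 2.0, 3.0], 1.0

initial_loss = loss(weights, bias, x, target)
for _ in range(100):
    weights, bias = train_step(weights, bias, x, target)
final_loss = loss(weights, bias, x, target)
print(initial_loss, final_loss)
```

Frameworks such as TensorFlow exist largely so that the derivative in `train_step` does not have to be written by hand for every new model.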

In the world of academia, researchers have historically often used efficient prototyping tools such as MATLAB3 to run models and prepare analyses. While this approach allows for rapid experimentation, it lacks elements of industrial software development, such as object-oriented (OO) development, that support reproducibility and the clean software abstractions that allow tools to be adopted by large organizations. These tools also had difficulty scaling to large datasets and could carry heavy licensing fees for such industrial use cases. Prior to 2006, this type of computational tooling was nevertheless largely sufficient for most use cases. However, as the datasets being tackled with deep neural network algorithms grew, groundbreaking results were achieved, such as:

  • Image classification on the ImageNet dataset4
  • Large-scale unsupervised discovery of image patterns in YouTube videos5
  • The creation of artificial agents capable of playing Atari video games and the Asian board game Go with human-like skill6 7
  • State-of-the-art language understanding via the BERT model developed by Google8

The models developed in these studies exploded in complexity along with the size of the datasets they were applied to (see Table 2.2 to get a sense of the immense scale of some of these models). As industrial use cases required robust and scalable frameworks to develop and deploy new neural networks, several academic groups and large technology companies invested in the development of generic toolkits for the implementation of deep learning models. These software libraries codified common patterns into reusable abstractions, often allowing even complex models to be expressed in relatively simple experimental scripts.

| Model Name | Year | # Parameters |
|------------|------|--------------|
| AlexNet | 2012 | 61M |
| YouTube CNN | 2012 | 1B |
| Inception | 2014 | 5M |
| VGG-16 | 2014 | 138M |
| BERT | 2018 | 340M |
| GPT-3 | 2020 | 175B |

Table 2.2: Number of parameters by model by year
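To make the scale in Table 2.2 more concrete, the following sketch converts each parameter count into the memory needed just to store the weights. It assumes one 32-bit float (4 bytes) per parameter, which is an assumption made for this illustration and not a statement about how any of these models were actually stored or trained.

```python
# Parameter counts copied from Table 2.2
PARAMETER_COUNTS = {
    "AlexNet": 61e6,
    "YouTube CNN": 1e9,
    "Inception": 5e6,
    "VGG-16": 138e6,
    "BERT": 340e6,
    "GPT-3": 175e9,
}

BYTES_PER_FLOAT32 = 4  # assumption: each parameter is stored as a 32-bit float

def weight_memory_gb(num_parameters, bytes_per_parameter=BYTES_PER_FLOAT32):
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

for name, count in PARAMETER_COUNTS.items():
    print(f"{name:12s} {weight_memory_gb(count):10.3f} GB")
```

Under that assumption the weights of the largest model in the table no longer fit in the memory of a single typical accelerator, which is one motivation for the distributed training discussed later in this chapter.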

Early examples of these frameworks include Theano,9 a Python package developed at the University of Montreal; Torch,10 a library written in the Lua language that was later ported to Python by researchers at Facebook; and TensorFlow, a C++ runtime with Python bindings developed by Google11.

In this book, we will primarily use TensorFlow 2.0, due to its widespread adoption and its convenient high-level interface, Keras, which abstracts much of the repetitive plumbing of defining routine layers and model architectures.

TensorFlow is an open-source version of an internal tool developed at Google called DistBelief.12 The DistBelief framework consisted of distributed workers (independent computational processes running on a cluster of machines) that would compute forward and backward gradient descent passes on a network (a common way to train neural networks we will discuss in Chapter 3, Building Blocks of Deep Neural Networks), and send the results to a Parameter Server that aggregated the updates. The neural networks in the DistBelief framework were represented as a Directed Acyclic Graph (DAG), terminating in a loss function that yielded a scalar (numerical value) comparing the network predictions with the observed target (such as image class or the probability distribution over a vocabulary representing the most probable next word in a sentence in a translation model).

A DAG is a software data structure consisting of nodes (operations) and edges (data), where information only flows in a single direction along the edges (thus directed) and where there are no loops (hence acyclic).
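As a rough illustration of this definition, the sketch below stores a tiny graph as a Python dictionary and evaluates it with the standard library's `graphlib` module (Python 3.9 or later). The node names and the scalar stand-ins for matrix operations are hypothetical; this is not how DistBelief or TensorFlow represent graphs internally.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Each key is a node; its list names the nodes whose outputs flow into it.
graph = {
    "x": [],
    "w": [],
    "b": [],
    "matmul": ["x", "w"],
    "add": ["matmul", "b"],
    "loss": ["add"],
}

operations = {
    "matmul": lambda x, w: x * w,      # scalar stand-in for a matrix product
    "add": lambda a, b: a + b,
    "loss": lambda y: (y - 1.0) ** 2,  # the graph terminates in a scalar
}

def evaluate(graph, operations, inputs):
    """Run every operation once, in an order that respects the edges."""
    values = dict(inputs)
    for node in TopologicalSorter(graph).static_order():
        if node not in values:
            args = [values[parent] for parent in graph[node]]
            values[node] = operations[node](*args)
    return values

def has_cycle(graph):
    """A graph with a loop has no valid execution order, so it is not a DAG."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False

result = evaluate(graph, operations, {"x": 2.0, "w": 3.0, "b": 0.5})
print(result["loss"])
print(has_cycle(graph), has_cycle({"a": ["b"], "b": ["a"]}))
```

Because the graph is acyclic, a topological order always exists, and every operation can be computed after all of its inputs are available.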

While DistBelief allowed Google to productionize several large models, it had limitations:

  • First, the Python scripting interface was developed with a set of pre-defined layers corresponding to underlying implementations in C++; adding novel layer types required coding in C++, which represented a barrier to productivity.
  • Secondly, while the system was well adapted for training feed-forward networks using basic Stochastic Gradient Descent (SGD) (an algorithm we will describe in more detail in Chapter 3, Building Blocks of Deep Neural Networks) on large-scale data, it lacked flexibility for accommodating recurrent, reinforcement learning, or adversarial learning paradigms – the latter of which is crucial to many of the algorithms we will implement in this book.
  • Finally, this system was difficult to scale down – to run the same job, for example, on a desktop with GPUs as well as a distributed environment with multiple cores per machine, and deployment also required a different technical stack.

Jointly, these considerations prompted the development of TensorFlow as a generic deep learning computational framework: one that could allow scientists to flexibly experiment with new layer architectures or cutting-edge training paradigms, and to run this experimentation with the same tools on both a laptop (for early-stage work) and a computing cluster (to scale up more mature models). It also eases the transition between research and development code by providing a common runtime for both.

Both libraries share the concept of the computation graph (networks represented as a graph of operations (nodes) and data (edges)) and a dataflow programming model (where data passes along the directed edges of a graph and has operations applied to it). Unlike DistBelief, however, TensorFlow was designed with the edges of the graph being tensors (n-dimensional arrays) and the nodes of the graph being atomic operations (addition, subtraction, nonlinear convolution, or queues and other advanced operations) rather than fixed layer operations. This allows for much greater flexibility in defining new computations, and even allows for mutation and stateful updates (these being simply additional nodes in the graph).

The dataflow graph in essence serves as a "placeholder" where data is slotted into defined variables and can be executed on single or multiple machines. TensorFlow optimizes the constructed dataflow graph in the C++ runtime upon execution, allowing optimization, for example, in issuing commands to the GPU. The different computations of the graph can also be executed across multiple machines and hardware, including CPUs, GPUs, and TPUs (custom tensor processing chips developed by Google and available in the Google Cloud computing environment)11, as the same computations described at a high level in TensorFlow are implemented to execute on multiple backend systems.
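The "placeholder" idea can be sketched in a few lines of plain Python: the graph is built first, without touching any data, and values are only slotted in when it is run. The `Placeholder` and `Op` classes here are hypothetical teaching aids and are far simpler than TensorFlow's actual graph and runtime.

```python
class Placeholder:
    """A named slot in the graph; it holds no data until the graph is run."""

    def __init__(self, name):
        self.name = name

    def run(self, feed):
        return feed[self.name]


class Op:
    """A node that applies a function to the outputs of its input nodes."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self, feed):
        return self.fn(*(node.run(feed) for node in self.inputs))


# Building the graph performs no arithmetic.
x = Placeholder("x")
w = Placeholder("w")
product = Op(lambda a, b: a * b, x, w)
output = Op(lambda a: max(a, 0.0), product)  # ReLU-style nonlinearity

# The same graph is executed twice with different data slotted in.
first = output.run({"x": 2.0, "w": 3.0})
second = output.run({"x": 2.0, "w": -3.0})
print(first, second)
```

Separating construction from execution is what gives a runtime the opportunity to optimize the graph, or to place parts of it on different devices, before any data flows through it.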

Because the dataflow graph allows mutable state, in essence, there is also no longer a centralized parameter server as was the case for DistBelief (though TensorFlow can also be run in a distributed manner with a parameter server configuration), since different nodes that hold state can execute the same operations as any other worker nodes. Further, control flow operations such as loops allow for the training of variable-length inputs such as in recurrent networks (see Chapter 3, Building Blocks of Deep Neural Networks). In the context of training neural networks, the gradients of each layer are simply represented as additional operations in the graph, allowing optimizations such as velocity (as in the RMSProp or ADAM optimizers, described in Chapter 3, Building Blocks of Deep Neural Networks) to be included using the same framework rather than modifying the parameter server logic. In the context of distributed training, TensorFlow also has several checkpointing and redundancy mechanisms ("backup" workers in case of a single task failure) that make it suited to robust training in distributed environments.
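To give a feel for how gradients can be derived from the graph itself, here is a minimal reverse-mode differentiation sketch in plain Python. It is a simplified stand-in: rather than adding gradient operations to the graph as TensorFlow does, it walks the existing graph backwards and accumulates derivatives with the chain rule. The `Node` class, the helper functions, and the numeric values are all hypothetical.

```python
class Node:
    """A value in the graph plus the local derivatives with respect to its inputs."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # input nodes
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0                 # d(output)/d(self), filled in by backward()


def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def backward(output):
    """Propagate d(output)/d(node) to every node that output depends on."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Reverse topological order: a node is processed before any of its inputs.
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local


# Squared error of a one-parameter linear model: (w * x + b - 1.5) ** 2
w, x, b, offset = Node(0.5), Node(2.0), Node(0.1), Node(-1.5)
error = add(add(mul(w, x), b), offset)
loss = mul(error, error)
backward(loss)

learning_rate = 0.1
w_updated = w.value - learning_rate * w.grad  # a plain gradient descent step
print(loss.value, w.grad, b.grad, w_updated)
```

Once gradients are available this way, swapping the last line for a different update rule only changes how `w.grad` is used, not how it is computed.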

TensorFlow 2.0

While representing operations in the dataflow graph as primitives allows flexibility in defining new layers within the Python client API, it also can result in a lot of "boilerplate" code and repetitive syntax. For this reason, the high-level API Keras14 was developed to provide a high-level abstraction; layers are represented using Python classes, while a particular runtime environment (such as TensorFlow or Theano) is a "backend" that executes the layer, just as the atomic TensorFlow operators can have different underlying implementations on CPUs, GPUs, or TPUs. While developed as a framework-agnostic library, Keras has been included as part of TensorFlow's main release in version 2.0. For the purposes of readability, we will implement most of our models in this book in Keras, while reverting to the underlying TensorFlow 2.0 code where it is necessary to implement particular operations or highlight the underlying logic. Please see Table 2.3 for a comparison between how various neural network algorithm concepts are implemented at a low (TensorFlow) or high (Keras) level in these libraries.

| Object | TensorFlow implementation | Keras implementation |
|--------|---------------------------|----------------------|
| Neural network layer | Tensor computation | Python layer classes |
| Gradient calculation | Graph runtime operator | Python optimizer class |
| Loss function | Tensor computation | Python loss function |
| Neural network model | Graph runtime session | Python model class instance |

Table 2.3: TensorFlow and Keras comparison

To show you the difference between the abstraction that Keras provides and TensorFlow 1.0 in implementing basic neural network models, let's look at an example of writing a multilayer perceptron (see Chapter 3, Building Blocks of Deep Neural Networks) using both of these frameworks. In the first case, in TensorFlow 1.0, you can see that a lot of the code involves explicitly specifying variables, functions, and matrix operations, along with the gradient function and runtime session to compute the updates to the networks.

This is a multilayer perceptron in TensorFlow 1.0 (reference 15):

# TensorFlow 1.x API (placeholders and sessions); it will not run unmodified on TensorFlow 2
import numpy as np
import pandas as pd
import tensorflow as tf

# Assumed data source: the MNIST digits bundled with tf.keras
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Placeholders are filled with data only when the session runs
X = tf.placeholder(dtype=tf.float64)
Y = tf.placeholder(dtype=tf.float64)
num_hidden = 128
# Build a hidden layer
W_hidden = tf.Variable(np.random.randn(784, num_hidden))
b_hidden = tf.Variable(np.random.randn(num_hidden))
p_hidden = tf.nn.sigmoid(tf.add(tf.matmul(X, W_hidden), b_hidden))
# Build another hidden layer
W_hidden2 = tf.Variable(np.random.randn(num_hidden, num_hidden))
b_hidden2 = tf.Variable(np.random.randn(num_hidden))
p_hidden2 = tf.nn.sigmoid(tf.add(tf.matmul(p_hidden, W_hidden2), b_hidden2))
# Build the output layer
W_output = tf.Variable(np.random.randn(num_hidden, 10))
b_output = tf.Variable(np.random.randn(10))
p_output = tf.nn.softmax(tf.add(tf.matmul(p_hidden2, W_output), b_output))
# Loss, a rough accuracy proxy, and the operation that applies gradient updates
loss = tf.reduce_mean(tf.losses.mean_squared_error(
        labels=Y, predictions=p_output))
accuracy = 1 - tf.sqrt(loss)
minimization_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
feed_dict = {
    X: x_train.reshape(-1, 784),
    Y: pd.get_dummies(y_train)  # one-hot encode the integer labels
}
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for step in range(10000):
        J_value = session.run(loss, feed_dict)
        acc = session.run(accuracy, feed_dict)
        if step % 100 == 0:
            print("Step:", step, " Loss:", J_value, " Accuracy:", acc)
        # Apply the update on every step, not only when logging
        session.run(minimization_op, feed_dict)
    pred00 = session.run([p_output], feed_dict={X: x_test.reshape(-1, 784)})

In contrast, the implementation of a comparable multilayer perceptron in Keras is vastly simplified through the use of abstract concepts embodied in Python classes, such as layers, models, and optimizers. Underlying details of the computation are encapsulated in these classes, making the logic of the code more readable.

Note also that in TensorFlow 2.0 the notion of running sessions (lazy execution, in which the network is only computed if explicitly compiled and called) has been dropped in favor of eager execution, in which the session and graph are called dynamically when network functions such as call and compile are executed, with the network behaving like any other Python class without explicitly creating a session scope. The notion of a global namespace in which variables are declared with tf.Variable() has also been replaced with a default garbage collection mechanism.

This is a multilayer perceptron in Keras15:

import pandas as pd
import tensorflow as tf

# Assumed data source: the MNIST digits bundled with tf.keras
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

l = tf.keras.layers
# Each layer is a Python object; Sequential chains them into a model
model = tf.keras.Sequential([
    l.Flatten(input_shape=(784,)),
    l.Dense(128, activation='relu'),
    l.Dense(128, activation='relu'),
    l.Dense(10, activation='softmax')
])
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
# One-hot encode the integer labels to match categorical_crossentropy
model.fit(x_train.reshape(-1, 784),
          pd.get_dummies(y_train).values.astype('float32'),
          epochs=15, batch_size=128, verbose=1)

Now that we have covered some of the details of what the TensorFlow library is and why it is well-suited to the development of deep neural network models (including the generative models we will implement in this book), let's get started building up our research environment. While we could simply use a Python package manager such as pip to install TensorFlow on our laptop, we want to make sure our process is as robust and reproducible as possible – this will make it easier to package our code to run on different machines, or keep our computations consistent by specifying the exact versions of each Python library we use in an experiment. We will start by installing an Integrated Development Environment (IDE) that will make our research easier – VSCode.

VSCode

Visual Studio Code (VSCode) is an open-source code editor developed by Microsoft Corporation which can be used with many programming languages, including Python. It allows debugging and is integrated with version control tools such as Git; we can even run Jupyter notebooks (which we will describe later in this chapter) within VSCode. Instructions for installation vary by whether you are using a Linux, macOS, or Windows operating system: please see individual instructions at https://code.visualstudio.com for your system. Once installed, we need to clone a copy of the source code for the projects in this book using Git, with the command:

git clone git@github.com:PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2.git

This command will copy the source code for the projects in this book to our laptop, allowing us to locally run and modify the code. Once you have the code copied, open the cloned repository folder in VSCode (Figure 2.1). We are now ready to start installing some of the tools we will need; open the file install.sh.

Figure 2.1: VSCode IDE

One feature that will be of particular use to us is VSCode's integrated terminal (Figure 2.2), where we can run commands: you can access this by selecting View, then Terminal from the drop-down list, which will open a command-line prompt:

Figure 2.2: VSCode terminal

Select the TERMINAL tab, and bash for the interpreter; you should now be able to enter normal commands. Change the directory to Chapter_2, where we will run our installation script, which you can open in VSCode.

The installation script we will run will download and install the various components we will need in our end-to-end TensorFlow lab; the overarching framework we will use for these experiments will be the Kubeflow library, which handles the various data and training pipelines that we will utilize for our projects in the later chapters of this volume. In the rest of this chapter, we will describe how Kubeflow is built on Docker and Kubernetes, and how to set up Kubeflow on several popular cloud providers.

Kubernetes, the technology which Kubeflow is based on, is fundamentally a way to manage containerized applications created using Docker, which allows for reproducible, lightweight execution environments to be created and persisted for a variety of applications. While we will make use of Docker for creating reproducible experimental runtimes, to understand its place in the overall landscape of virtualization solutions (and why it has become so important to modern application development), let us take a detour to describe the background of Docker in more detail.

Docker: A lightweight virtualization solution

A consistent challenge in developing robust software applications is to make them run the same on a machine different than the one on which they are developed. These differences in environments could encompass a number of variables: operating systems, programming language library versions, and hardware such as CPU models.

Traditionally, one approach to dealing with this heterogeneity has been to use a Virtual Machine (VM). While VMs are useful to run applications on diverse hardware and operating systems, they are also limited by being resource-intensive (Figure 2.3): each VM running on a host requires the overhead resources to run a completely separate operating system, along with all the applications or dependencies within the guest system.

Figure 2.3: Virtual machines versus containers16

However, in some cases this is an unnecessary level of overhead; we do not necessarily need to run an entirely separate operating system, but only a consistent environment, including libraries and dependencies, within a single operating system. This need for a lightweight framework to specify runtime environments prompted the creation of the Docker project for containerization in 2013. In essence, a container is an environment for running an application, including all dependencies and libraries, allowing reproducible deployment of web applications and other programs, such as a database or the computations in a machine learning pipeline. For our use case, we will use it to provide a reproducible Python execution environment (Python language version and libraries) to run the steps in our generative machine learning pipelines.

We will need to have Docker installed for many of the examples that will appear in the rest of this chapter and the projects in this book. For instructions on how to install Docker for your particular operating system, please refer to the directions at (https://docs.docker.com/install/). To verify that you have installed the application successfully, you should be able to run the following command on your terminal, which will download a small test image and print a confirmation message:

docker run hello-world
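If you prefer to script this check, for example at the top of a setup notebook, a minimal Python sketch might look like the following. The function names are hypothetical; the only Docker behavior relied upon is the `docker run hello-world` command shown above, plus the standard `--rm` flag so the test container is removed afterwards.

```python
import shutil
import subprocess


def docker_cli_path():
    """Return the path of the docker executable on PATH, or None if it is absent."""
    return shutil.which("docker")


def hello_world_succeeds(timeout_seconds=120):
    """Run `docker run --rm hello-world` and report whether it exited cleanly."""
    if docker_cli_path() is None:
        return False
    try:
        completed = subprocess.run(
            ["docker", "run", "--rm", "hello-world"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    print("docker on PATH:", docker_cli_path())
    print("hello-world ran:", hello_world_succeeds())
```

A `False` result can mean either that Docker is not installed or that the Docker daemon is not running, so the terminal output of the original command remains the better place to look for details.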

Important Docker commands and syntax

To understand how Docker works, it is useful to walk through the template used for all Docker containers, a Dockerfile. As an example, we will use the TensorFlow container notebook example from the Kubeflow project (https://github.com/kubeflow/kubeflow/blob/master/components/example-notebook-servers/jupyter-tensorflow-full/cpu.Dockerfile).

This file is a set of instructions for how Docker should take a base operating environment, add dependencies, and execute a piece of software once it is packaged:

FROM public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow:master-abf9ec48
# install - requirements.txt
COPY --chown=jovyan:users requirements.txt /tmp/requirements.txt
RUN python3 -m pip install -r /tmp/requirements.txt --quiet --no-cache-dir \
 && rm -f /tmp/requirements.txt

While the exact commands will differ between containers, this will give you a flavor for the way we can use containers to manage an application – in this case running a Jupyter notebook for interactive machine learning experimentation using a consistent set of libraries. Once we have installed the Docker runtime for our particular operating system, we would execute such a file by running:

docker build -f <Dockerfilename> -t <image name:tag> .

When we do this, a number of things happen. First, we retrieve the base filesystem, or image, from a remote repository, which is not unlike the way we collect JAR files from Artifactory when using Java build tools such as Gradle or Maven, or Python's pip installer. With this filesystem or image, we then set required variables for the Docker build command such as the username and TensorFlow version, and runtime environment variables for the container. We determine what shell program will be used to run the command, then we install dependencies we will need to run TensorFlow and the notebook application, and we specify the command that is run when the Docker container is started. Then we save this snapshot with an identifier composed of a base image name and one or more tags (such as version numbers, or, in many cases, simply a timestamp to uniquely identify this image). Finally, to actually start the notebook server running this container, we would issue the command:

docker run <image name:tag>

By default, Docker will run the executable command specified in the Dockerfile; in our present example, that is the command to start the notebook server. However, this does not have to be the case; we could have a Dockerfile that simply builds an execution environment for an application, and issue a command to run within that environment. In that case, the command would look like:

docker run <image name:tag> <command>

The Docker run commands allow us to test that our application can successfully run within the environment specified by the Dockerfile; however, we usually want to run this application in the cloud where we can take advantage of distributed computing resources or the ability to host web applications exposed to the world at large, not locally. To do so, we need to move our image we have built to a remote repository, which may or may not be the same one we pulled the initial image from, using the push command:

docker push <image name:tag>
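The build, run, and push steps above follow a fixed pattern, so they are easy to assemble programmatically when you want to drive them from a Python script. The sketch below only constructs and prints the argument lists; the helper names, the image name, and the example command are hypothetical, and the trailing `.` is the build context directory that `docker build` expects.

```python
import shlex


def build_command(dockerfile, image, context="."):
    """Arguments for `docker build`; context is the directory sent to the Docker daemon."""
    return ["docker", "build", "-f", dockerfile, "-t", image, context]


def run_command(image, *command):
    """Arguments for `docker run`, optionally overriding the image's default command."""
    return ["docker", "run", image, *command]


def push_command(image):
    """Arguments for `docker push`, which uploads the tagged image to its registry."""
    return ["docker", "push", image]


image = "example/tf-notebook:0.1"  # hypothetical image name and tag
for args in (
    build_command("cpu.Dockerfile", image),
    run_command(image),
    run_command(image, "python3", "--version"),
    push_command(image),
):
    print(shlex.join(args))
```

Passing each list to `subprocess.run(args, check=True)` would execute the step; keeping the commands as lists rather than strings avoids shell quoting problems.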

Note that the image name can contain a reference to a particular registry, such as a local registry or one hosted on one of the major cloud providers such as Elastic Container Registry (ECR) on AWS, Azure Container Registry (ACR), or Google Container Registry. Publishing to a remote registry allows developers to share images, and makes our containers accessible for deployment in the cloud.
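To see how a registry, a repository name, and a tag fit together in one image reference, consider this simplified parser. It is a sketch under stated assumptions rather than Docker's own parsing logic: it ignores digests (the `@sha256:...` form) and treats the first path component as a registry only when it contains a dot or a colon, or is `localhost`.

```python
def split_image_reference(reference):
    """Split 'registry/repository:tag' into (registry, repository, tag).

    Missing parts are returned as None. Digest references are not handled.
    """
    registry = None
    remainder = reference
    first, slash, rest = reference.partition("/")
    # A first component that looks like a host name is taken to be the registry.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    # The tag, if any, follows a colon in the last path component.
    head, _, last = remainder.rpartition("/")
    name, colon, tag = last.partition(":")
    repository = f"{head}/{name}" if head else name
    return registry, repository, (tag if colon else None)


for ref in (
    "redis",
    "nginx:1.7.9",
    "localhost:5000/web:latest",
    "public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow:master-abf9ec48",
):
    print(ref, "->", split_image_reference(ref))
```

The last reference is the base image from the Dockerfile shown earlier; its registry component is what tells Docker where to pull the image from.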

Connecting Docker containers with docker-compose

So far we have only discussed a few basic Docker commands, which would allow us to run a single service in a single container. However, you can probably appreciate that in the "real world" we usually need to have more than one application running concurrently – for example, a website will have both a web application that fetches and processes data in response to activity from an end user and a database instance to log that information. In complex applications, the website might even be composed of multiple small web applications or microservices that are specialized to particular use cases such as the front end, user data, or an order management system. For these kinds of applications, we will need to have more than one container communicating with each other. The docker-compose tool (https://docs.docker.com/compose/) is written with such applications in mind: it allows us to specify several Docker containers in an application file using the YAML format. For example, a configuration for a website with an instance of the Redis database might look like:

version: '3'
services:
  web:
    build: .
    ports:
    - "5000:5000"
    volumes:
    - .:/code
    - logvolume01:/var/log
    links:
    - redis
  redis:
    image: redis
volumes:
  logvolume01: {}

Code 2.1: A YAML input file for Docker Compose

The two application containers here are web and the redis database. The file also specifies the volumes (disks) mounted into the web container. Using this configuration, we can run the command:

docker-compose up

This starts all the containers specified in the YAML file and allows them to communicate with each other. However, even though Docker containers and docker-compose allow us to construct complex applications using consistent execution environments, we may potentially run into issues with robustness when we deploy these services to the cloud. For example, in a web application, we cannot be assured that the virtual machines that the application is running on will persist over long periods of time, so we need processes to manage self-healing and redundancy. This is also relevant to distributed machine learning pipelines, in which we do not want to have to kill an entire job because one node in a cluster goes down, which requires us to have backup logic to restart a sub-segment of work. Also, while Docker has the docker-compose functionality to link together several containers in an application, it does not have robust rules for how communication should happen among those containers, or how to manage them as a unit. For these purposes, we turn to the Kubernetes library.

Kubernetes: Robust management of multi-container applications

The Kubernetes project – sometimes abbreviated as k8s – was born out of an internal container management project at Google known as Borg. Kubernetes comes from the Greek word for navigator, as denoted by the seven-spoke wheel of the project's logo.18 Kubernetes is written in the Go programming language and provides a robust framework to deploy and manage Docker container applications on the underlying resources managed by cloud providers (such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)).

Kubernetes is fundamentally a tool to control applications composed of one or more Docker containers deployed in the cloud; this collection of containers is known as a pod. Each pod can have one or more copies (to allow redundancy), which is known as a replicaset. The two main components of a Kubernetes deployment are a control plane and nodes. The control plane hosts the centralized logic for deploying and managing pods, and consists of (Figure 2.4):

Figure 2.4: Kubernetes components18

  • Kube-api-server: This is the main application that listens to commands from the user to deploy or update a pod, or manages external access to pods via ingress.
  • Kube-controller-manager: An application to manage functions such as controlling the number of replicas per pod.
  • Cloud-controller-manager: Manages functions particular to a cloud provider.
  • Etcd: A key-value store that maintains the environment and state variables of different pods.
  • Kube-scheduler: An application that is responsible for finding workers to run a pod.

While we could set up our own control plane, in practice we will usually have this function managed by our cloud provider, such as Google's Google Kubernetes Engine (GKE) or Amazon's Elastic Kubernetes Service (EKS). The Kubernetes nodes – the individual machines in the cluster – each run an application known as a kubelet, which monitors the pod(s) running on that node.
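The replica management performed by the control plane can be thought of as a reconciliation loop: compare the desired number of copies with what is actually healthy, then start or stop pods to close the gap. The following plain Python sketch illustrates that idea only; the function, the pod names, and the health check are hypothetical and do not reflect the actual Kubernetes implementation.

```python
def reconcile(desired_replicas, running_pods, healthy):
    """Return (keep, start, stop) so that the running set matches the desired count."""
    keep = [pod for pod in running_pods if healthy(pod)]
    stop = [pod for pod in running_pods if not healthy(pod)]
    # Too many healthy pods: stop the surplus.
    stop += keep[desired_replicas:]
    keep = keep[:desired_replicas]
    # Too few healthy pods: start replacements under fresh (hypothetical) names.
    missing = desired_replicas - len(keep)
    start = [f"replacement-{i}" for i in range(missing)]
    return keep, start, stop


running = ["web-a", "web-b", "web-c"]
failed = {"web-b"}
keep, start, stop = reconcile(3, running, lambda pod: pod not in failed)
print("keep:", keep, "start:", start, "stop:", stop)
```

Running this comparison repeatedly is what gives the system its self-healing behavior: a failed pod simply shows up as a gap that the next pass fills.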

Now that we have a high-level view of the Kubernetes system, let's look at the important commands you will need to interact with a Kubernetes cluster, update its components, and start and stop applications.

Important Kubernetes commands

In order to interact with a Kubernetes cluster running in the cloud, we typically utilize the Kubernetes command-line tool (kubectl). Instructions for installing kubectl for your operating system can be found at (https://kubernetes.io/docs/tasks/tools/install-kubectl/). To verify that you have successfully installed kubectl, you can again run the help command in the terminal:

kubectl --help

Like Docker, kubectl has many commands; the important one that we will use is the apply command, which, like docker-compose, takes in a YAML file as input and communicates with the Kubernetes control plane to start, update, or stop pods:

kubectl apply -f <file.yaml>

As an example of how the apply command works, let us look at a YAML file for deploying a web server (nginx) application:

apiVersion: v1
kind: Service
metadata:
  name: my-nginx-svc
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

The resources specified in this file are created on the Kubernetes cluster in the order in which they are listed. First, we create a Service of type LoadBalancer, which distributes external traffic across the copies of the nginx web server. The labels in the metadata tag these resources so that we can query them later using kubectl, and the selector tells the load balancer which pods to route traffic to. Second, we create a Deployment with 3 replicas of the nginx pod, each running the same container image (nginx:1.7.9) and exposing port 80.
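Because kubectl also accepts JSON, the same two resources can be sketched as plain Python dictionaries, which makes the relationship between the Service selector and the pod labels easy to check in code. This is a minimal illustration using only the standard library; the helper `selector_matches` is a hypothetical name, not part of any Kubernetes client.

```python
import json

LABELS = {"app": "nginx"}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "my-nginx-svc", "labels": LABELS},
    "spec": {
        "type": "LoadBalancer",
        "ports": [{"port": 80}],
        "selector": LABELS,
    },
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-nginx", "labels": LABELS},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx:1.7.9",
                    "ports": [{"containerPort": 80}],
                }]
            },
        },
    },
}

# A "List" object lets a single JSON document carry several resources.
manifest = {"apiVersion": "v1", "kind": "List", "items": [service, deployment]}


def selector_matches(svc, dep):
    """True if every Service selector label appears on the pod template."""
    pod_labels = dep["spec"]["template"]["metadata"]["labels"]
    return all(pod_labels.get(k) == v
               for k, v in svc["spec"]["selector"].items())


manifest_json = json.dumps(manifest, indent=2)
print(selector_matches(service, deployment))
```

Writing `manifest_json` to a file such as nginx.json (a hypothetical file name) and passing it to kubectl apply -f should have the same effect as the YAML version.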

The same set of physical resources of a Kubernetes cluster can be shared among several virtual clusters using namespaces – this allows us to segregate resources among multiple users or groups. This can allow, for example, each team to run their own set of applications and logically behave as if they are the only users. Later, in our discussion of Kubeflow, we will see how this feature can be used to logically partition projects on the same Kubeflow instance.

Kustomize for configuration management

Like most code, we most likely want to ultimately store the YAML files we use to issue commands to Kubernetes in a version control system such as Git. This leads to some cases where this format might not be ideal: for example, in a machine learning pipeline, we might perform hyperparameter searches where the same application is being run with slightly different parameters, leading to a glut of duplicate command files.

Or, we might have arguments, such as AWS account keys, that for security reasons we do not want to store in a text file. We might also want to increase reuse by splitting our command into a base and additions; for example, we might want to run the nginx application from the YAML file shown in Code 2.1 alongside different databases, or specify file storage in the different cloud object stores provided by Amazon, Google, and Microsoft Azure.
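The idea of splitting a configuration into a base and additions can be sketched in a few lines of Python. The example below keeps one base dictionary and derives variants for a small learning-rate sweep by merging in tiny overlays. The `merge` helper and the `args` schema are hypothetical and much simpler than real patching tools; they are only meant to show why overlays avoid a glut of near-duplicate files.

```python
import copy


def merge(base, overlay):
    """Recursively merge overlay into a copy of base; overlay values win."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base configuration for a training job.
base = {
    "metadata": {"name": "trainer"},
    "spec": {"replicas": 1,
             "args": {"learning_rate": 0.01, "batch_size": 100}},
}

# One small overlay per setting instead of a full copy of the file.
overlays = [
    {"metadata": {"name": f"trainer-lr-{i}"},
     "spec": {"args": {"learning_rate": lr}}}
    for i, lr in enumerate([0.01, 0.02, 0.05])
]

variants = [merge(base, overlay) for overlay in overlays]
for variant in variants:
    print(variant["metadata"]["name"], variant["spec"]["args"])
```

Only the values that differ live in the overlays, so a change to the base (for example, the batch size) propagates to every variant.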

For these use cases, we will make use of the Kustomize tool (https://kustomize.io), which is also available through kubectl by passing the directory that contains a kustomization.yaml file:

kubectl apply -k <kustomization_directory>

Alternatively, we could use the Kustomize command-line tool. A kustomization.yaml is a template for a Kubernetes application; for example, consider the following template for the training job in the Kubeflow example repository (https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/sample/kustomization.yaml):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  # Or
  # github.com/kubeflow/pipelines/manifests/kustomize/env/gcp?ref=1.0.0
  - ../env/gcp
  # Kubeflow Pipelines servers are capable of 
  # collecting Prometheus metrics.
  # If you want to monitor your Kubeflow Pipelines servers 
  # with those metrics, you'll need a Prometheus server 
  # in your Kubeflow Pipelines cluster.
  # If you don't already have a Prometheus server up, you 
  # can uncomment the following configuration files for Prometheus.
  # If you have your own Prometheus server up already 
  # or you don't want a Prometheus server for monitoring, 
  # you can comment the following line out.
  # - ../third_party/prometheus
  # - ../third_party/grafana
# Identifier for application manager to apply ownerReference.
# The ownerReference ensures the resources get garbage collected
# when application is deleted.
commonLabels:
  application-crd-id: kubeflow-pipelines
# Used by Kustomize
configMapGenerator:
  - name: pipeline-install-config
    env: params.env
    behavior: merge
secretGenerator:
  - name: mysql-secret
    env: params-db-secret.env
    behavior: merge
# !!! If you want to customize the namespace,
# please also update 
# sample/cluster-scoped-resources/kustomization.yaml's 
# namespace field to the same value
namespace: kubeflow
#### Customization ###
# 1. Change values in params.env file
# 2. Change values in params-db-secret.env 
# file for CloudSQL username and password
# 3. kubectl apply -k ./
####

We can see that this file refers to a base set of configurations in a separate kustomization.yaml file located at the relative path ../env/gcp. To edit variables in this file, for instance, to change the namespace for the application, we would run:

kustomize edit set namespace mykube

We could also add configuration maps to pass to the training job, using a key-value format, for example:

kustomize edit add configmap configMapGenerator --from-literal=myVar=myVal

Finally, when we are ready to apply this configuration to Kubernetes, we can have Kustomize render the final YAML and pipe it to kubectl, assuming kustomization.yaml is in the current directory:

kustomize build . | kubectl apply -f -
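If we want to drive this build-and-apply step from a script, the pipe can be reproduced with Python's subprocess module. The sketch below is an illustration under the assumption that both kustomize and kubectl are on the PATH; the function names are hypothetical, and by default nothing is executed so that the command can be inspected first.

```python
import shutil
import subprocess


def build_commands(kustomize_dir="."):
    """Argument lists for `kustomize build DIR | kubectl apply -f -`."""
    return (["kustomize", "build", kustomize_dir],
            ["kubectl", "apply", "-f", "-"])


def build_and_apply(kustomize_dir=".", dry_run=True):
    build_cmd, apply_cmd = build_commands(kustomize_dir)
    tools_found = shutil.which("kustomize") and shutil.which("kubectl")
    if dry_run or not tools_found:
        # Nothing is executed; just report what would run.
        return " | ".join(" ".join(cmd) for cmd in (build_cmd, apply_cmd))
    rendered = subprocess.run(build_cmd, check=True,
                              capture_output=True).stdout
    # Feed the rendered YAML to kubectl on standard input.
    subprocess.run(apply_cmd, input=rendered, check=True)
    return "applied"


print(build_and_apply("."))
```

Calling `build_and_apply(".", dry_run=False)` would perform the actual deployment, so it is worth reviewing the rendered YAML before switching the flag.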

Hopefully, these examples demonstrate how Kustomize provides a flexible way to generate the YAML we need for kubectl using a template; we will make use of it often in the process of parameterizing our workflows later in this book.

Now that we have covered how Kubernetes manages Docker applications in the cloud, and how Kustomize can allow us to flexibly reuse kubectl yaml commands, let's look at how these components are tied together in Kubeflow to run the kinds of experiments we will be undertaking later to create generative AI models in TensorFlow.

Kubeflow: an end-to-end machine learning lab

As was described at the beginning of this chapter, there are many components of an end-to-end lab for machine learning research and development (Table 2.1), such as:

  • A way to manage and version library dependencies, such as TensorFlow, and package them for a reproducible computing environment
  • Interactive research environments where we can visualize data and experiment with different settings
  • A systematic way to specify the steps of a pipeline – data processing, model tuning, evaluation, and deployment
  • Provisioning of resources to run the modeling process in a distributed manner
  • Robust mechanisms for snapshotting historical versions of the research process

As we described earlier in this chapter, TensorFlow was designed to utilize distributed resources for training. To leverage this capability, we will use the Kubeflow project. Built on top of Kubernetes, Kubeflow has several components that are useful in the end-to-end process of managing machine learning applications. To install Kubeflow, we need to have an existing Kubernetes control plane instance and use kubectl to launch Kubeflow's various components. The steps for setup differ slightly depending upon whether we are using a local instance or one of the major cloud providers.

Running Kubeflow locally with MiniKF

If we want to get started quickly or prototype our application locally, we can avoid setting up a cloud account and instead use virtual machines to simulate the kind of resources we would provision in the cloud. To set up Kubeflow locally, we first need to install VirtualBox (https://www.virtualbox.org/wiki/Downloads) to run virtual machines, and Vagrant to run configurations for setting up a Kubernetes control plane and Kubeflow on VirtualBox VMs (https://www.vagrantup.com/downloads.html).

Once you have these two dependencies installed, create a new directory, change into it, and run:

vagrant init arrikto/minikf
vagrant up

This initializes the VirtualBox configuration and brings up the application. You can now navigate to http://10.10.10.10/ and follow the instructions to launch Kubeflow and Rok (a storage volume for data used in experiments on Kubeflow created by Arrikto). Once these have been provisioned, you should see a screen like this (Figure 2.5):

Figure 2.5: MiniKF install screen in virtualbox19

Log in to Kubeflow to see the dashboard with the various components (Figure 2.6):

Figure 2.6: Kubeflow dashboard in MiniKF

We will return to these components later and go through the various functionalities available on Kubeflow, but first, let's walk through how to install Kubeflow in the cloud.

Installing Kubeflow in AWS

In order to run Kubeflow in AWS, we need a Kubernetes control plane available in the cloud. Fortunately, Amazon provides a managed service called EKS, which provides an easy way to provision a control plane to deploy Kubeflow. Use the following steps to deploy Kubeflow on AWS:

  1. Register for an AWS account and install the AWS Command Line Interface

    The CLI is needed to interact with the various AWS services. Install it by following the instructions for your platform located at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html. Once it is installed, enter:

    aws configure
    

    to set up your account and key information to provision resources.

  2. Install eksctl

    This command-line utility allows us to provision a Kubernetes control plane in Amazon from the command line. Follow the instructions at https://eksctl.io/ to install.

  3. Install iam-authenticator

    To allow kubectl to interact with EKS, we need to provide the correct permissions using the IAM authenticator to modify our kubeconfig. Please see the installation instructions at https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html.

  4. Download the Kubeflow command-line tool

    Links are located at the Kubeflow releases page (https://github.com/kubeflow/kubeflow/releases/tag/v0.7.1). Download the archive for your platform and unpack the tarball using:

    tar -xvf kfctl_v0.7.1_<platform>.tar.gz
    
  5. Build the configuration file

    After entering environment variables for the Kubeflow application directory (${KF_DIR}), the name of the deployment (${KF_NAME}), and the path to the base configuration file for the deployment (${CONFIG_URI}), which is located at https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_aws.0.7.1.yaml for AWS deployments, run the following to generate the configuration file:

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    kfctl build -V -f ${CONFIG_URI}
    

    This will generate a local configuration file named kfctl_aws.0.7.1.yaml. If this looks like Kustomize, that's because kfctl is using Kustomize under the hood to build the configuration. We also need to add an environment variable for the location of the local config file, ${CONFIG_FILE}, which in this case is:

    export CONFIG_FILE=${KF_DIR}/kfctl_aws.0.7.1.yaml
    
  6. Launch Kubeflow on EKS

    Use the following commands to launch Kubeflow:

    cd ${KF_DIR}
    rm -rf kustomize/ 
    kfctl apply -V -f ${CONFIG_FILE}
    

    It will take a while for all the Kubeflow components to become available; you can check the progress by using the following command:

    kubectl -n kubeflow get all
    

    Once they are all available, we can get the URL address for the Kubeflow dashboard using:

    kubectl get ingress -n istio-system
    

This will take us to the dashboard view shown in the MiniKF examples above. Note that in the default configuration, this address is open to the public; for secure applications, we need to add authentication using the instructions at https://www.kubeflow.org/docs/aws/authentication/.
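The environment variables used in steps 5 and 6 are related to one another, which the following sketch makes explicit. It assumes that ${KF_DIR} is a base directory joined with ${KF_NAME} and that the local config file takes its name from the last path segment of ${CONFIG_URI}; the function name and the example values `my-kubeflow` and `/opt/kubeflow` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

CONFIG_URI = ("https://raw.githubusercontent.com/kubeflow/manifests/"
              "v0.7-branch/kfdef/kfctl_aws.0.7.1.yaml")


def kubeflow_env(kf_name, base_dir, config_uri=CONFIG_URI):
    """Derive the environment variables used by the kfctl commands."""
    kf_dir = posixpath.join(base_dir, kf_name)
    # The local config file is named after the remote configuration file.
    config_name = posixpath.basename(urlparse(config_uri).path)
    return {
        "KF_NAME": kf_name,
        "KF_DIR": kf_dir,
        "CONFIG_URI": config_uri,
        "CONFIG_FILE": posixpath.join(kf_dir, config_name),
    }


env = kubeflow_env("my-kubeflow", "/opt/kubeflow")
for key, value in env.items():
    print(f"export {key}={value}")
```

The printed export lines can be pasted into a shell before running the kfctl build and kfctl apply commands shown in the steps above.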

Installing Kubeflow in GCP

Like AWS, Google Cloud Platform (GCP) provides a managed Kubernetes control plane, GKE. We can install Kubeflow in GCP using the following steps:

  1. Register for a GCP account and create a project on the console

    This project will be where the various resources associated with Kubeflow will reside.

  2. Enable required services

    The services required to run Kubeflow on GCP are:

    • Compute Engine API
    • Kubernetes Engine API
    • Identity and Access Management (IAM) API
    • Deployment Manager API
    • Cloud Resource Manager API
    • Cloud Filestore API
    • AI Platform Training & Prediction API
  3. Set up OAuth (optional)

    If you wish to make a secure deployment, then, as with AWS, you must follow instructions to add authentication to your installation, located at (https://www.kubeflow.org/docs/gke/deploy/oauth-setup/). Alternatively, you can just use the name and password for your GCP account.

  4. Set up the GCloud CLI

    This is parallel to the AWS CLI covered in the previous section. Installation instructions are available at https://cloud.google.com/sdk/. You can verify your installation by running:

    gcloud --help
    
  5. Download the Kubeflow command-line tool

    Links are located on the Kubeflow releases page (https://github.com/kubeflow/kubeflow/releases/tag/v0.7.1). Download the archive for your platform and unpack the tarball using:

    tar -xvf kfctl_v0.7.1_<platform>.tar.gz
    
  6. Log in to GCloud and create user credentials

    We next need to create a login account and credential token we will use to interact with resources in our account.

    gcloud auth login
    gcloud auth application-default login
    
  7. Set up environment variables and deploy Kubeflow

    As with AWS, we need to enter values for a few key environment variables: the application directory containing the Kubeflow configuration files (${KF_DIR}), the name of the Kubeflow deployment (${KF_NAME}), the path to the base configuration URI (${CONFIG_URI} – for GCP this is https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_gcp_iap.0.7.1.yaml), the name of the Google project (${PROJECT}), and the zone it runs in (${ZONE}).

  8. Launch Kubeflow

    As with AWS, we use kfctl, which relies on Kustomize, to build the template file and launch Kubeflow:

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    kfctl apply -V -f ${CONFIG_URI}
    

    Once Kubeflow is launched, you can get the URL to the dashboard using:

    kubectl -n istio-system get ingress
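The services listed in step 2 can also be enabled from the command line rather than through the console. The sketch below builds a single gcloud services enable command for them. The service identifiers in the mapping are assumptions that should be checked against your project (for example, with gcloud services list --available), and `my-gcp-project` is a placeholder project name.

```python
# API names from step 2 mapped to service identifiers.
# The identifiers are assumptions to verify for your own project.
REQUIRED_SERVICES = {
    "Compute Engine API": "compute.googleapis.com",
    "Kubernetes Engine API": "container.googleapis.com",
    "Identity and Access Management (IAM) API": "iam.googleapis.com",
    "Deployment Manager API": "deploymentmanager.googleapis.com",
    "Cloud Resource Manager API": "cloudresourcemanager.googleapis.com",
    "Cloud Filestore API": "file.googleapis.com",
    "AI Platform Training & Prediction API": "ml.googleapis.com",
}


def enable_services_command(project):
    """Build one command that enables every required service."""
    return ["gcloud", "services", "enable",
            *REQUIRED_SERVICES.values(),
            "--project", project]


command = enable_services_command("my-gcp-project")
print(" ".join(command))
```

Keeping the list in code makes it easy to review which APIs a deployment depends on before any resources are created.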
    

Installing Kubeflow on Azure

Azure is Microsoft Corporation's cloud offering, and like AWS and GCP, we can use it to install Kubeflow leveraging a Kubernetes control plane and computing resources residing in the Azure cloud.

  1. Register an account on Azure

    Sign up at https://azure.microsoft.com – a free tier is available for experimentation.

  2. Install the Azure command-line utilities

    See instructions for installation on your platform at https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest. You can verify your installation by running the following on the command line on your machine:

    az 
    

    This should print a list of commands that you can use on the console. To start, log in to your account with:

    az login
    

    Then enter the account credentials you registered in Step 1. You will be redirected to a browser to verify your account, after which you should see a response like the following:

    "You have logged in. Now let us find all the subscriptions to which you have access": …
    [
    { 
        "cloudName": …
        "id" ….
    …
        "user": {
    …
    }
    }
    ]
    
  3. Create the resource group for a new cluster

    We first need to create the resource group where our new application will live, using the following command:

    az group create -n ${RESOURCE_GROUP_NAME} -l ${LOCATION}
    
  4. Create a Kubernetes resource on AKS

    Now deploy the Kubernetes control plane on your resource group:

    az aks create -g ${RESOURCE_GROUP_NAME} -n ${NAME} -s ${AGENT_SIZE} -c ${AGENT_COUNT} -l ${LOCATION} --generate-ssh-keys
    
  5. Install Kubeflow

    First, we need to obtain credentials to install Kubeflow on our AKS resource:

    az aks get-credentials -n ${NAME}  -g ${RESOURCE_GROUP_NAME}
    
  6. Install kfctl

    Download and unpack the kfctl tarball:

    tar -xvf kfctl_v0.7.1_<platform>.tar.gz
    
  7. Set environment variables

    As with AWS, we need to enter values for a few key environment variables: the application directory containing the Kubeflow configuration files (${KF_DIR}), the name of the Kubeflow deployment (${KF_NAME}), and the path to the base configuration URI (${CONFIG_URI} – for Azure, this is https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_k8s_istio.0.7.1.yaml).

  8. Launch Kubeflow

    As with AWS, we use kfctl, which relies on Kustomize, to build the template file and launch Kubeflow:

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    kfctl apply -V -f ${CONFIG_URI}
    

    Once Kubeflow is launched, you can use port forwarding to redirect traffic from local port 8080 to port 80 in the cluster to access the Kubeflow dashboard at localhost:8080 using the following command:

    kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
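The az commands from steps 3 to 5 share the same handful of parameters, so it can help to generate them from one place. The following sketch fills in the placeholders and returns shell-quoted command strings; the function name and the example values (resource group, cluster name, location, VM size, and node count) are all hypothetical.

```python
import shlex


def azure_commands(resource_group, name, location, agent_size, agent_count):
    """Return the az commands for steps 3-5 as shell-quoted strings."""
    steps = [
        ["az", "group", "create", "-n", resource_group, "-l", location],
        ["az", "aks", "create", "-g", resource_group, "-n", name,
         "-s", agent_size, "-c", str(agent_count), "-l", location,
         "--generate-ssh-keys"],
        ["az", "aks", "get-credentials", "-n", name, "-g", resource_group],
    ]
    # shlex.join quotes any argument that contains spaces or special characters.
    return [shlex.join(step) for step in steps]


commands = azure_commands("kubeflow-rg", "kubeflow-aks", "westus",
                          "Standard_D4s_v3", 3)
for command in commands:
    print(command)
```

Generating the commands this way keeps the resource group and cluster name consistent across the three steps.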
    

Installing Kubeflow using Terraform

For each of these cloud providers, you'll probably notice that we have a common set of steps: creating a Kubernetes cluster, installing Kubeflow, and starting the application. While we can use scripts to automate this process, it would be desirable to version control and persist our infrastructure configurations just as we do our code, giving us a reproducible recipe for creating the set of resources we need to run Kubeflow. It would also help us potentially move between cloud providers without completely rewriting our installation logic.

Terraform (https://www.terraform.io/) was created by HashiCorp as a tool for Infrastructure as Code (IaC). In the same way that Kubernetes has an API to update resources on a cluster, Terraform allows us to abstract interactions with different underlying cloud providers using an API and a template language, through a command-line utility and core components written in Go (Figure 2.7). Terraform can be extended using user-written plugins.

Figure 2.7: Terraform architecture20

Let's look at one example of installing Kubeflow using Terraform instructions on AWS, located at https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow. Once you have established the required AWS resources and installed Terraform on an EC2 instance, the aws-eks-cluster-and-nodegroup.tf Terraform file is used to create the Kubeflow cluster using the command:

terraform apply

In this file are a few key components. One is variables that specify aspects of the deployment:

variable "efs_throughput_mode" {
   description = "EFS performance mode"
   default = "bursting"
   type = string
}

Another is a specification for which cloud provider we are using:

provider "aws" {
  region                  = var.region
  shared_credentials_file = var.credentials
  profile                 = var.profile
}

And another is resources such as the EKS cluster:

resource "aws_eks_cluster" "eks_cluster" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster_role.arn
  version  = var.k8s_version
 
  vpc_config {
    security_group_ids = [aws_security_group.cluster_sg.id]
    subnet_ids         = flatten([aws_subnet.subnet.*.id])
  }
 
  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
  ]
 
  provisioner "local-exec" {
    command = "aws --region ${var.region} eks update-kubeconfig --name ${aws_eks_cluster.eks_cluster.name}"
  }
 
  provisioner "local-exec" {
    when    = destroy
    command = "kubectl config unset current-context"
  }
 
}

Every time we run the terraform apply command, it walks through this file to determine what resources to create, which underlying AWS services to call to create them, and with which set of configuration they should be provisioned. This provides a clean way to orchestrate complex installations such as Kubeflow in a versioned, extensible template language.
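One way to picture how the order of creation is determined is as a dependency graph: a resource can only be created once everything it refers to exists. The sketch below applies a topological sort to the references visible in the EKS cluster resource. It is an illustration of the idea rather than Terraform's actual implementation, and the edges from the policy attachments to the IAM role are an assumption, since those resources are not shown in the excerpt.

```python
from graphlib import TopologicalSorter  # available from Python 3.9

# Each resource maps to the set of resources it depends on.
dependencies = {
    # Assumed: each policy attachment refers to the cluster role.
    "aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy": {
        "aws_iam_role.cluster_role"},
    "aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy": {
        "aws_iam_role.cluster_role"},
    # Taken from role_arn, vpc_config, and depends_on in the resource block.
    "aws_eks_cluster.eks_cluster": {
        "aws_iam_role.cluster_role",
        "aws_security_group.cluster_sg",
        "aws_subnet.subnet",
        "aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy",
        "aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy",
    },
}

# static_order() yields every resource after all of its dependencies.
creation_order = list(TopologicalSorter(dependencies).static_order())
for resource in creation_order:
    print(resource)
```

The EKS cluster comes last because it depends, directly or indirectly, on every other resource in the graph.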

Now that we have successfully installed Kubeflow either locally or on a managed Kubernetes control plane in the cloud, let us take a look at what tools are available on the platform.

A brief tour of Kubeflow's components

With Kubeflow installed, let us take another look at the Kubeflow dashboard (Figure 2.8):

Figure 2.8: The Kubeflow dashboard

Let's walk through what is available in this toolkit. First, notice in the upper panel we have a dropdown with the name anonymous specified – this is the namespace for Kubernetes referred to earlier. While our default is anonymous, we could create several namespaces on our Kubeflow instance to accommodate different users or projects. This can be done at login, where we set up a profile (Figure 2.9):

Figure 2.9: Kubeflow login page

Alternatively, as with other operations in Kubernetes, we can apply a namespace using a YAML file:

apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: profileName  
spec:
  owner:
    kind: User
    name: userid@email.com

Using the kubectl command:

kubectl create -f profile.yaml
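When several users or teams each need their own namespace, the profile definitions can be generated rather than written by hand. The sketch below produces one Profile per team as a single JSON document, mirroring the fields of the YAML above; the team names and e-mail addresses are made-up examples, and the function name is hypothetical.

```python
import json


def profile_manifest(profile_name, owner_email):
    """Build a Profile resource with the same fields as the YAML example."""
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Profile",
        "metadata": {"name": profile_name},
        "spec": {"owner": {"kind": "User", "name": owner_email}},
    }


# Hypothetical teams and owners.
team = {
    "team-vision": "alice@example.com",
    "team-nlp": "bob@example.com",
}

profiles = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [profile_manifest(name, email) for name, email in team.items()],
}

print(json.dumps(profiles, indent=2))
```

Saving the output to a file and passing it to kubectl create -f would then create both profiles in one step.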

What can we do once we have a namespace? Let us look through the available tools.

Kubeflow notebook servers

We can use Kubeflow to start a Jupyter notebook server in a namespace, where we can run experimental code; we can start the notebook by clicking the Notebook Servers tab in the user interface and selecting NEW SERVER (Figure 2.10):

Figure 2.10: Kubeflow notebook creation

We can then specify parameters, such as which container to run (which could include the TensorFlow container we examined earlier in our discussion of Docker), and how many resources to allocate (Figure 2.11).

Figure 2.11: Kubeflow Docker resources panel

You can also specify a Persistent Volume (PV) to store data that remains even if the notebook server is turned off, and special resources such as GPUs.

Once started, if you have specified a container with TensorFlow resources, you can begin running models in the notebook server.

Kubeflow pipelines

For notebook servers, we gave an example of a single container (the notebook instance) application. Kubeflow also gives us the ability to run multi-container application workflows (such as input data, training, and deployment) using the pipelines functionality. Pipelines are Python functions that follow a Domain Specific Language (DSL) to specify components that will be compiled into containers.

If we click pipelines on the UI, we are brought to a dashboard (Figure 2.12):

Figure 2.12: Kubeflow pipelines dashboard

Selecting one of these pipelines, we can see a visual overview of the component containers (Figure 2.13).

Figure 2.13: Kubeflow pipelines visualization

After creating a new run, we can specify parameters for a particular instance of this pipeline (Figure 2.14).

Figure 2.14: Kubeflow pipelines parameters

Once the pipeline is created, we can use the user interface to visualize the results (Figure 2.15):

Figure 2.15: Kubeflow pipeline results visualization

Under the hood, the Python code to generate this pipeline is compiled using the pipelines SDK. We could specify the components to come either from a container with Python code:

import kfp

@kfp.dsl.component
def my_component(my_param):
  ...
  return kfp.dsl.ContainerOp(
    name='My component name',
    image='gcr.io/path/to/container/image'
  )

or a function written in Python itself:

@kfp.dsl.python_component(
  name='My awesome component',
  description='Come and play',
)
def my_python_func(a: str, b: str) -> str:
  ...

For a pure Python function, we could turn this into an operation with the compiler:

my_op = compiler.build_python_component(
  component_func=my_python_func,
  staging_gcs_path=OUTPUT_DIR,
  target_image=TARGET_IMAGE)

We then use the dsl.pipeline decorator to add this operation to a pipeline:

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
  my_step = my_op(a='a', b='b')

We compile it using the following code:

kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

and run it with this code:

client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 
  'my-pipeline.zip')

We can also upload this ZIP file to the pipelines UI, where Kubeflow can use the generated YAML from compilation to instantiate the job.

Now that you have seen the process for generating results for a single pipeline, our next problem is how to generate the optimal parameters for such a pipeline. As you will see in Chapter 3, Building Blocks of Deep Neural Networks, neural network models typically have a number of configurations, known as hyperparameters, which govern their architecture (such as number of layers, layer size, and connectivity) and training paradigm (such as learning rate and optimizer algorithm). Kubeflow has a built-in utility for optimizing models for such parameter grids, called Katib.

Using Kubeflow Katib to optimize model hyperparameters

Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (finding the right learning rate, for example, for an algorithm). Like the Kustomize templates we have seen, a Katib experiment specifies a generic job, here a TensorFlow job, with placeholders for the parameters:

apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: kubeflow
  name: tfjob-example
spec:
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy_1
  algorithm:
    algorithmName: random
  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: /train
        kind: Directory
    collector:
      kind: TensorFlowEvent
  parameters:
    - name: --learning_rate
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.05"
    - name: --batch_size
      parameterType: int
      feasibleSpace:
        min: "100"
        max: "200"
  trialTemplate:
    goTemplate:
        rawTemplate: |-
          apiVersion: "kubeflow.org/v1"
          kind: TFJob
          metadata:
            name: {{.Trial}}
            namespace: {{.NameSpace}}
          spec:
           tfReplicaSpecs:
            Worker:
              replicas: 1 
              restartPolicy: OnFailure
              template:
                spec:
                  containers:
                    - name: tensorflow 
                      image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                      imagePullPolicy: Always
                      command:
                        - "python"
                        - "/var/tf_mnist/mnist_with_summaries.py"
                        - "--log_dir=/train/metrics"
                        {{- with .HyperParameters}}
                        {{- range .}}
                        - "{{.Name}}={{.Value}}"
                        {{- end}}
                        {{- end}}

which we can run using the familiar kubectl syntax:

kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml

or through the UI (Figure 2.16):

Figure 2.16: Katib UI on Kubeflow

where you can see a visual of the outcome of these multi-parameter experiments, or a table (Figures 2.17 and 2.18).

Figure 2.17: Kubeflow visualization for multi-dimensional parameter optimization

Figure 2.18: Kubeflow UI for multi-outcome experiments
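To get a feel for what the random algorithm in this experiment does, the following sketch samples trials from the same feasible spaces, renders each one into command-line arguments in the style of the `{{.Name}}={{.Value}}` template line, and stops once the goal is reached or the trial budget is spent. The scoring function `toy_accuracy` is a made-up stand-in for a real training run, and the whole block is an illustration rather than Katib's implementation.

```python
import random

PARAMETERS = [
    {"name": "--learning_rate", "type": "double", "min": 0.01, "max": 0.05},
    {"name": "--batch_size", "type": "int", "min": 100, "max": 200},
]


def sample_trial(rng):
    """Draw one value per parameter from its feasible space."""
    trial = {}
    for p in PARAMETERS:
        if p["type"] == "int":
            trial[p["name"]] = rng.randint(p["min"], p["max"])
        else:
            trial[p["name"]] = rng.uniform(p["min"], p["max"])
    return trial


def render_args(trial):
    """Mimic the name=value arguments appended to the training command."""
    return [f"{name}={value}" for name, value in trial.items()]


def toy_accuracy(trial):
    # Hypothetical stand-in for training; peaks at a learning rate of 0.03.
    return 1.0 - abs(trial["--learning_rate"] - 0.03) * 10


def random_search(max_trials=12, goal=0.99, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(max_trials):
        trial = sample_trial(rng)
        score = toy_accuracy(trial)
        history.append((score, trial))
        if score >= goal:  # stop early once the objective goal is reached
            break
    best = max(history, key=lambda pair: pair[0])
    return best, history


(best_score, best_trial), history = random_search()
print(render_args(best_trial), round(best_score, 3))
```

In a real experiment the trials would run in parallel (three at a time in the configuration above) and the score would come from the metrics collector rather than from a formula.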

Summary

In this chapter, we have covered an overview of what TensorFlow is and how it serves as an improvement over earlier frameworks for deep learning research. We also explored setting up an IDE, VSCode, and the foundation of reproducible applications, Docker containers. To orchestrate and deploy Docker containers, we discussed the Kubernetes framework, and how we can scale groups of containers using its API. Finally, we described Kubeflow, a machine learning framework built on Kubernetes that allows us to run end-to-end pipelines, distributed training, and parameter search, and to serve trained models, and we saw how to set up a Kubeflow deployment using Terraform, an Infrastructure as Code (IaC) technology.

Before jumping into specific projects, we will next cover the basics of neural network theory and the TensorFlow and Keras commands that you will need to write basic training jobs on Kubeflow.

References

  1. Abadi, Martín, et al. (2016) TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467. https://arxiv.org/abs/1603.04467.
  2. Google. TensorFlow. Retrieved April 26, 2021, from https://www.tensorflow.org/
  3. MATLAB, Natick, Massachusetts: The MathWorks Inc. https://www.mathworks.com/products/matlab.html
  4. Krizhevsky A., Sutskever I., & Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks. https://papers.nips.cc/paper/4824-imagenet-classification-with-deepconvolutional-neural-networks.pdf
  5. Dean J., Ng A., (2012, Jun 26). Using large-scale brain simulations for machine learning and A.I.. Google | The Keyword. https://blog.google/technology/ai/using-large-scale-brain-simulations-for/
  6. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602. https://arxiv.org/abs/1312.5602
  7. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, van den Driessche G, Graepel T, Hassabis D. (2017) Mastering the game of Go without human knowledge. Nature. 550(7676):354-359. https://pubmed.ncbi.nlm.nih.gov/29052630/
  8. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. https://arxiv.org/abs/1810.04805
  9. Al-Rfou, R., et al. (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv. https://arxiv.org/pdf/1605.02688.pdf
  10. Collobert R., Kavukcuoglu K., & Farabet C. (2011). Torch7: A Matlab-like Environment for Machine Learning. http://ronan.collobert.com/pub/matos/2011_torch7_nipsw.pdf
  11. Abadi M., et al. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. download.tensorflow.org/paper/whitepaper2015.pdf
  12. Abadi, Martín, et al. (2016) TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467. https://arxiv.org/abs/1603.04467
  13. Jouppi, N P, et al. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. arXiv:1704.04760. https://arxiv.org/abs/1704.04760
  14. van Merriënboer, B., Bahdanau, D., Dumoulin, V., Serdyuk, D., Warde-Farley, D., Chorowski, J., Bengio, Y. (2015). Blocks and Fuel: Frameworks for deep learning. arXiv:1506.00619. https://arxiv.org/pdf/1506.00619.pdf
  15. https://stackoverflow.com/questions/57273888/keras-vs-TensorFlow-code-comparison-sources
  16. Harris M. (2016). Docker vs. Virtual Machine. Nvidia developer blog. https://developer.nvidia.com/blog/nvidia-docker-gpu-server-application-deployment-made-easy/vm_vs_docker/
  17. A visual play on words — the project's original code name was Seven of Nine, a Borg character from the series Star Trek: Voyager
  18. Kubernetes Components. (2021, March 18) Kubernetes. https://kubernetes.io/docs/concepts/overview/components/
  19. Pavlou C. (2019). An end-to-end ML pipeline on-prem: Notebooks & Kubeflow Pipelines on the new MiniKF. Medium | Kubeflow. https://medium.com/kubeflow/an-end-to-end-ml-pipeline-on-prem-notebooks-kubeflow-pipelines-on-the-new-minikf-33b7d8e9a836
  20. Vargo S. (2017). Managing Google Calendar with Terraform. HashiCorp. https://www.hashicorp.com/blog/managing-google-calendar-with-terraform

Key benefits

  • Code examples are in TensorFlow 2, which makes it easy for PyTorch users to follow along
  • Look inside the most famous deep generative models, from GPT to MuseGAN
  • Learn to build and adapt your own models in TensorFlow 2.x
  • Explore exciting, cutting-edge use cases for deep generative AI

Description

Machines are excelling at creative human skills such as painting, writing, and composing music. Could you be more creative than generative AI? In this book, you’ll explore the evolution of generative models, from restricted Boltzmann machines and deep belief networks to VAEs and GANs. You’ll learn how to implement models yourself in TensorFlow and get to grips with the latest research on deep neural networks. There’s been an explosion in potential use cases for generative models. You’ll look at OpenAI’s news generator, deepfakes, and training deep learning agents to navigate a simulated environment. Recreate the code that’s under the hood and uncover surprising links between text, image, and music generation.

Who is this book for?

This is a book for Python programmers who are keen to create and have some fun using generative models. To make the most out of this book, you should have a basic familiarity with math and statistics for machine learning.

What you will learn

  • Export the code from GitHub into Google Colab to see how everything works for yourself
  • Compose music using LSTM models, simple GANs, and MuseGAN
  • Create deepfakes using facial landmarks, autoencoders, and pix2pix GAN
  • Learn how attention and transformers have changed NLP
  • Build several text generation pipelines based on LSTMs, BERT, and GPT-2
  • Implement paired and unpaired style transfer with networks like StyleGAN
  • Discover emerging applications of generative AI like folding proteins and creating videos from images

Product Details

Publication date: Apr 30, 2021
Length: 488 pages
Edition: 1st
Language: English
ISBN-13: 9781800200883

Table of Contents

15 Chapters
An Introduction to Generative AI: "Drawing" Data from Models
Setting Up a TensorFlow Lab
Building Blocks of Deep Neural Networks
Teaching Networks to Generate Digits
Painting Pictures with Neural Networks Using VAEs
Image Generation with GANs
Style Transfer with GANs
Deepfakes with GANs
The Rise of Methods for Text Generation
NLP 2.0: Using Transformers to Generate Text
Composing Music with Generative Models
Play Video Games with Generative AI: GAIL
Emerging Applications in Generative AI
Other Books You May Enjoy
Index

Customer reviews

Rating distribution
4.4 out of 5 (28 Ratings)
5 star 75%
4 star 3.6%
3 star 10.7%
2 star 7.1%
1 star 3.6%

Top Reviews

Khawaja Muddassar Oct 01, 2024
5 out of 5 stars
I would rate my Packt product a solid 5 out of 5! The content is well-structured, informative, and packed with practical examples that enhance understanding and implementation. Highly recommended!
Feefo verified review
Junling Hu May 25, 2021
5 out of 5 stars
This is an excellent book that provides a practical introduction to GAN, VAE and other generative models. It did an excellent job of introducing to Restricted Boltzmann machine (RBM), Variational AutoEncoder (VAE), GAN, LSTM and music generation. It also introduced transformer and GPT, but not in depth. There are coding examples for every chapter, thus making this book very practical to use. It is an ambitious book. It tries to cover many different domains of AI, starting with introduction to neural networks, ending with deep reinforcement learning. The weaker part is in reinforcement learning, which is not necessarily needed for this book. In general, I recommend this book to anyone who is interested in learning GAN and VAE, the two most important generative models in AI today.
Amazon verified review
dan resnic Jun 14, 2021
5 out of 5 stars
My AI experience has always focused on analytics but this provided a really interesting perspective into a subject I knew nothing about. I got the book more out of curiosity and I'll probably never use the examples and teachings, but boy was it fun to read. Do you know how the electronics involved in Pink Floyd's rigs work? Probably not, but if you were to read about it you would probably enjoy every moment of it. The same logic applies here. Highly recommended!
Amazon verified review
Colbert Philippe Aug 08, 2022
5 out of 5 stars
I love this book along with several books from PACKT Publishing. This book gives the theory and references for further reading on the top. Excellent work from the Author!
Amazon verified review
dr t Jul 13, 2021
5 out of 5 stars
The book is targeted at programmers and machine learning practitioners who want to learn about generative models. It is not for complete beginners - as the authors themselves state, one should have a basic understanding of probability, linear algebra, and deep learning.
The book starts by introducing core concepts such as Docker, Kubernetes and Tensorflow, and then moves onto more advanced (and interesting) topics such as generating digits, images, music and deep fakes, and even copying artist’s styles to generate new art. Consequently, as the reader progresses through the book there are ample opportunities to use his/her coding skills to experiment.
For this reader there were two chapters of special interest.
- Chapter 8 on using transformers to generate text, in which BERT and GPT are introduced and explained, and then a program created to generate fake headlines. Having recently read Rothman’s book on transformers this chapter was very useful.
- Chapter 10 in which various aspects of reinforcement learning are discussed.
By the end of the book, this reader definitely felt that he had acquired a much firmer understanding of GANs and what they are capable of. Recommended and a good book to gain knowledge to have in one’s arsenal.
Amazon verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle, which is a month starting from the day of subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.