Vertex AI Workbench: Your Complete Guide to Scaling Machine Learning with Google Cloud

This article is an excerpt from the book, "The Definitive Guide to Google Vertex AI", by Jasmeet Bhatia, Kartik Chaudhary. The Definitive Guide to Google Vertex AI is for ML practitioners who want to learn Google best practices, MLOps tooling, and turnkey AI solutions for solving large-scale real-world AI/ML problems. This book takes a hands-on approach to help you become an ML rockstar on Google Cloud Platform in no time.

Introduction

While working on an ML project, if we are running a Jupyter Notebook in a local environment, or using a web-based Colab- or Kaggle-like kernel, we can perform some quick experiments and get some initial accuracy or results from ML algorithms very fast. But we hit a wall when it comes to performing large-scale experiments, launching long-running jobs, hosting a model, and also in the case of model monitoring. Additionally, if the data related to a project requires some more granular permissions on security and privacy (fine-grained control over who can view/access the data), it’s not feasible in local or Colab-like environments. All these challenges can be solved just by moving to the cloud. Vertex AI Workbench within Google Cloud is a JupyterLab-based environment that can be leveraged for all kinds of development needs of a typical data science project. The JupyterLab environment is very similar to the Jupyter Notebook environment, and thus we will be using these terms interchangeably throughout the book.

Vertex AI Workbench has options for creating managed notebook instances as well as user-managed notebook instances. User-managed notebook instances give more control to the user, while managed notebooks come with some key extra features. We will discuss more about these later in this section. Some key features of the Vertex AI Workbench notebook suite include the following:

Fully managed–Vertex AI Workbench provides a Jupyter Notebook-based fully managed environment that provides enterprise-level scale without managing infrastructure, security, and user-management capabilities.
Interactive experience–Data exploration and model experiments are easier as managed notebooks can easily interact with other Google Cloud services such as storage systems, big data solutions, and so on.
Prototype to production AI–Vertex AI notebooks can easily interact with other Vertex AI tools and Google Cloud services and thus provide an environment to run end-to-end ML projects from development to deployment with minimal transition.
Multi-kernel support–Workbench provides multi-kernel support in a single managed notebook instance including kernels for tools such as TensorFlow, PyTorch, Spark, and R. Each of these kernels comes with pre-installed useful ML libraries and lets us install additional libraries as required.
Scheduling notebooks–Vertex AI Workbench lets us schedule notebook runs on an ad hoc and recurring basis. This functionality is quite useful in setting up and running large-scale experiments quickly. This feature is available through managed notebook instances. More information will be provided on this in the coming sections.

With this background, we can now start working with Jupyter Notebooks on Vertex AI Workbench. The next section provides basic guidelines for getting started with notebooks on Vertex AI.

Getting started with Vertex AI Workbench

Go to the Google Cloud console and open Vertex AI from the products menu on the left pane or by using the search bar on the top. Inside Vertex AI, click on Workbench, and it will open a page very similar to the one shown in Figure 4.3. More information on this is available in the official

documentation (https://cloud.google.com/vertex-ai/docs/workbench/ introduction).

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-1

Figure 4.3 – Vertex AI Workbench UI within the Google Cloud console

As we can see, Vertex AI Workbench is basically Jupyter Notebook as a service with the flexibility of working with managed as well as user-managed notebooks. User-managed notebooks are suitable for use cases where we need a more customized environment with relatively higher control. Another good thing about user-managed notebooks is that we can choose a suitable Docker container based on our development needs; these notebooks also let us change the type/size of the instance later on with a restart.

To choose the best Jupyter Notebook option for a particular project, it’s important to know about the common differences between the two solutions. Table 4.1 describes some common differences between fully managed and user-managed notebooks:

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-2

Table 4.1 – Differences between managed and user-managed notebook instances Let’s create one user-managed notebook to check the available options:

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-3

Figure 4.4 – Jupyter Notebook kernel configurations

As we can see in the preceding screenshot, user-managed notebook instances come with several customized image options to choose from. Along with the support of tools such as TensorFlow Enterprise, PyTorch, JAX, and so on, it also lets us decide whether we want to work with GPUs (which can be changed later, of course, as per needs). These customized images come with all useful libraries pre-installed for the desired framework, plus provide the flexibility to install any third-party packages within the instance.

After choosing the appropriate image, we get more options to customize things such as notebook name, notebook region, operating system, environment, machine types, accelerators, and so on (see the following screenshot):

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-4

Figure 4.5 – Configuring a new user-managed Jupyter Notebook

Once we click on the CREATE button, it can take a couple of minutes to create a notebook instance. Once it is ready, we can launch the Jupyter instance in a browser tab using the link provided inside Workbench (see Figure 4.6). We also get the option to stop the notebook for some time when we are not using it (to reduce cost):

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-5

Figure 4.6 – A running Jupyter Notebook instance

This Jupyter instance can be accessed by all team members having access to Workbench, which helps in collaborating and sharing progress with other teammates. Once we click on OPEN JUPYTERLAB, it opens a familiar Jupyter environment in a new tab (see Figure 4.7):

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-6

Figure 4.7 – A user-managed JupyterLab instance in Vertex AI Workbench A Google-managed JupyterLab instance also looks very similar (see Figure 4.8):

vertex-ai-workbench-your-complete-guide-to-scaling-machine-learning-with-google-cloud-img-7

Figure 4.8 – A Google-managed JupyterLab instance in Vertex AI Workbench

Now that we can access the notebook instance in the browser, we can launch a new Jupyter Notebook or terminal and get started on the project. After providing sufficient permissions to the service account, many useful Google Cloud services such as BigQuery, GCS, Dataflow, and so on can be accessed from the Jupyter Notebook itself using SDKs. This makes Vertex AI Workbench a one-stop tool for every ML development need.

Note: We should stop Vertex AI Workbench instances when we are not using them or don’t plan to use them for a long period of time. This will help prevent us from incurring costs from running them unnecessarily for a long period of time.

In the next sections, we will learn how to create notebooks using custom containers and how to schedule notebooks with Vertex AI Workbench.

Custom containers for Vertex AI Workbench

Vertex AI Workbench gives us the flexibility of creating notebook instances based on a custom container as well. The main advantage of a custom container-based notebook is that it lets us customize the notebook environment based on our specific needs. Suppose we want to work with a new TensorFlow version (or any other library) that is currently not available as a predefined kernel. We can create a custom Docker container with the required version and launch a Workbench instance using this container. Custom containers are supported by both managed and user-managed notebooks.

Here is how to launch a user-managed notebook instance using a custom container:

1. The first step is to create a custom container based on the requirements. Most of the time, a derivative container (a container based on an existing DL container image) would be easy to set up. See the following example Dockerfile; here, we are first pulling an existing TensorFlow GPU image and then installing a new TensorFlow version from the source:

FROM gcr.io/deeplearning-platform-release/tf-gpu:latest



RUN pip install -y tensorflow

2. Next, build and push the container image to Container Registry, such that it should be accessible to the Google Compute Engine (GCE) service account. See the following source to build and push the container image:

export PROJECT=$(gcloud config list project --format     "value(core.project)")



docker build . -f Dockerfile.example -t "gcr.io/${PROJECT}/
tf-custom:latest"



docker push "gcr.io/${PROJECT}/tf-custom:latest"

Note that the service account should be provided with sufficient permissions to build and push the image to the container registry, and the respective APIs should be enabled.

3. Go to the User-managed notebooks page, click on the New Notebook button, and then select Customize. Provide a notebook name and select an appropriate Region and Zone value.

4. In the Environment field, select Custom Container.

5. In the Docker Container Image field, enter the address of the custom image; in our case, it would look like this:

gcr.io/${PROJECT}/tf-custom:latest

6. Make the remaining appropriate selections and click the Create button.

We are all set now. While launching the notebook, we can select the custom container as a kernel and start working on the custom environment.

Conclusion

Vertex AI Workbench stands out as a powerful, cloud-based environment that streamlines machine learning development and deployment. By leveraging its managed and user-managed notebook options, teams can overcome local development limitations, ensuring better scalability, enhanced security, and integrated access to Google Cloud services. This guide has explored the foundational aspects of working with Vertex AI Workbench, including its customizable environments, scheduling features, and the use of custom containers. With Vertex AI Workbench, data scientists and ML practitioners can focus on innovation and productivity, confidently handling projects from inception to production.

Author Bio

Jasmeet Bhatia is a machine learning solution architect with over 18 years of industry experience, with the last 10 years focused on global-scale data analytics and machine learning solutions. In his current role at Google, he works closely with key GCP enterprise customers to provide them guidance on how to best use Google's cutting-edge machine learning products. At Google, he has also worked as part of the Area 120 incubator on building innovative data products such as Demand Signals, and he has been involved in the launch of Google products such as Time Series Insights. Before Google, he worked in similar roles at Microsoft and Deloitte.
When not immersed in technology, he loves spending time with his wife and two daughters, reading books, watching movies, and exploring the scenic trails of southern California.
He holds a bachelor's degree in electronics engineering from Jamia Millia Islamia University in India and an MBA from the University of California Los Angeles (UCLA) Anderson School of Management.

Kartik Chaudhary is an AI enthusiast, educator, and ML professional with 6+ years of industry experience. He currently works as a senior AI engineer with Google to design and architect ML solutions for Google's strategic customers, leveraging core Google products, frameworks, and AI tools. He previously worked with UHG, as a data scientist, and helped in making the healthcare system work better for everyone. Kartik has filed nine patents at the intersection of AI and healthcare.
Kartik loves sharing knowledge and runs his own blog on AI, titled Drops of AI.
Away from work, he loves watching anime and movies and capturing the beauty of sunsets.