Kubernetes overview and core concepts
While it is feasible to deploy and manage the life cycle of a small number of containers and containerized applications directly in a compute environment, it becomes very challenging to manage and orchestrate a large number of containers across a large number of servers. This is where Kubernetes comes in. Initially released in 2014, Kubernetes (K8s) is an open source system for managing containers at scale on clusters of servers (the abbreviation K8s is derived by replacing the eight letters 'ubernete' with the digit 8).
Architecturally, Kubernetes operates a master node and one or more worker nodes in a cluster of servers. The master node, also known as the control plane, is responsible for the overall management of the cluster, and it has four key components:
- API server
- Scheduler
- Controller manager
- etcd
The master node exposes an API server layer that allows programmatic control of the cluster. An example of an API call could be the deployment of a web application on the cluster. The control plane tracks and manages all cluster configuration data in a key-value store called etcd, which stores data such as the desired number of containers to run, the compute resource specifications, and the size of the storage volume for a web application running on the cluster.

Kubernetes uses an object type called a controller to monitor the current state of Kubernetes resources and take the necessary actions (for example, requesting a change via the API server) to move the current state toward the desired state whenever the two differ (for example, in the number of running containers). The controller manager in the master node is responsible for managing all the Kubernetes controllers. Kubernetes comes with a set of built-in controllers, such as the scheduler, which is responsible for scheduling Pods (units of deployment that we will discuss in more detail later) to run on worker nodes when there is a change request. Other examples include the Job controller, which is responsible for running and stopping one or more Pods for a task, and the Deployment controller, which is responsible for deploying Pods based on a deployment manifest, such as one for a web application. The following figure (Figure 6.2) shows the core architecture components of a Kubernetes cluster:
To interact with a Kubernetes cluster control plane, you can use the `kubectl` command-line utility or the Kubernetes Python client (https://github.com/kubernetes-client/python), or access the RESTful API directly. You can find a list of supported `kubectl` commands at https://kubernetes.io/docs/reference/kubectl/cheatsheet/.
A number of technical concepts are core to the Kubernetes architecture. The following are some of the main ones:
- Namespaces: Namespaces organize clusters of worker machines into virtual sub-clusters. They provide logical separation between resources owned by different teams and projects while still allowing different namespaces to communicate. A namespace can span multiple worker nodes, and it can be used to group a list of permissions under a single name so that authorized users can access the resources in it. Resource usage controls, such as quotas for CPU and memory, can be enforced on a namespace (a minimal manifest sketch follows Figure 6.3 below). Namespaces also make it possible for resources to share identical names, as long as they reside in different namespaces, thereby avoiding naming conflicts. By default, Kubernetes has a namespace named `default`, which is used when no namespace is specified; you can create additional namespaces as needed.
- Pods: Kubernetes deploys computing in a logical unit called a Pod. All Pods must belong to a Kubernetes namespace (either the `default` namespace or a specified one). One or more containers can be grouped into a Pod, and all containers in the Pod are deployed and scaled together as a single unit and share the same context, such as Linux namespaces and filesystems. Each Pod has a unique IP address that is shared by all the containers in the Pod. A Pod is normally created through a workload resource, such as a Kubernetes Deployment or a Kubernetes Job.
The preceding figure (Figure 6.3) shows the relationship between namespaces, Pods, and containers in a Kubernetes cluster. Each namespace contains its own set of Pods, and each Pod can run one or more containers.
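To make the namespace controls concrete, the following is a minimal sketch of a namespace manifest paired with a CPU and memory quota. The names (`team-ml`, `team-ml-quota`) and the quota values are illustrative assumptions, not values from any real cluster:

```yaml
# A namespace with a resource quota; apply with: kubectl apply -f <filename>
apiVersion: v1
kind: Namespace # creates a virtual sub-cluster for a team or project
metadata:
  name: team-ml # illustrative name
---
apiVersion: v1
kind: ResourceQuota # caps the total resources Pods in the namespace can request
metadata:
  name: team-ml-quota
  namespace: team-ml # the quota applies only to this namespace
spec:
  hard:
    requests.cpu: "4" # total CPU all Pods in the namespace may request
    requests.memory: 8Gi # total memory all Pods in the namespace may request
```

With this in place, Pods created in `team-ml` count against the quota, and requests that would push usage past the limits are rejected by the API server.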
- Deployment: A Deployment is used by Kubernetes to create or modify Pods that run containerized applications. For example, to deploy a containerized application, you create a configuration manifest file (usually in YAML format) that specifies details such as the deployment name, namespace, container image URI, number of Pod replicas, and the communication port for the application. After the deployment is applied using a Kubernetes client utility (`kubectl`), the corresponding Pods running the specified container images are created on the worker nodes. The following example creates a deployment of Pods for an Nginx server with the desired specification:

```yaml
apiVersion: apps/v1 # k8s API version used for creating this deployment
kind: Deployment # the type of object. In this case, it is Deployment
metadata:
  name: nginx-deployment # name of the deployment
spec:
  selector:
    matchLabels:
      app: nginx # an app label for the deployment. This can be used to look up/select Pods
  replicas: 2 # tells the deployment to run 2 Pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2 # Docker container image used for the deployment
        ports:
        - containerPort: 80 # the networking port to communicate with the containers
```
The following figure shows the flow of applying the preceding deployment manifest file to a Kubernetes cluster, which creates two Pods to host two copies of the Nginx container:
After the deployment, the Deployment controller monitors the deployed container instances. If an instance goes down, the controller replaces it with another instance on a worker node.
- Kubernetes Job: A Kubernetes Job is a controller that creates one or more Pods to run a task and ensures that the job completes successfully. If Pods fail due to node failure or other system issues, the Job recreates them to complete the task. A Kubernetes Job can be used to run batch-oriented tasks, such as batch data processing scripts, ML model training scripts, or ML batch inference scripts over a large number of inference requests. After a job completes, the Pods are not terminated, so you can still access the job logs and inspect the job's detailed status. The following is an example template for running a training job:
```yaml
apiVersion: batch/v1
kind: Job # indicates that this is a Kubernetes Job resource
metadata:
  name: train-job
spec:
  template:
    spec:
      containers:
      - name: train-container
        imagePullPolicy: Always # tells the job to always pull a new container image when it is started
        image: <uri to Docker image containing training script>
        command: ["python3", "train.py"] # tells the container to run this command after it is started
      restartPolicy: Never
  backoffLimit: 0
```
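As a usage sketch, you could submit this job by saving the manifest to a file and running `kubectl apply -f` on it; because the completed Pods are retained, you could then retrieve the training output with `kubectl logs job/train-job` once the job finishes.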
- Kubernetes custom resources (CRs) and operators: Kubernetes provides a list of built-in resources, such as Pods or Deployments, for different needs. It also allows you to create CRs and manage them just like the built-in resources, using the same tools (such as `kubectl`). When you create a CR in Kubernetes, Kubernetes creates a new API (for example, `<custom resource name>/<version>`) for each version of the resource. This is also known as extending the Kubernetes APIs. To create a CR, you first write a custom resource definition (CRD) YAML file. To register the CRD in Kubernetes, you simply run `kubectl apply -f <name of the CRD yaml file>` to apply the file, and after that, you can use the CR just like any other Kubernetes resource. For example, to manage a custom model training job on Kubernetes, you can define a CRD with specifications such as algorithm name, data encryption setting, training image, input data sources, number of job failure retries, number of replicas, and job liveness probe frequency, as sketched below.
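The following is a minimal sketch of what such a CRD might look like. The API group (`mlplatform.example.com`), the `TrainingJob` kind, and the field names are illustrative assumptions modeled on the specification list above, not a real published CRD (a production definition would also cover the remaining fields, such as data sources and encryption settings):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: trainingjobs.mlplatform.example.com # must be <plural>.<group>
spec:
  group: mlplatform.example.com # illustrative API group
  scope: Namespaced # CR instances live inside a namespace
  names:
    plural: trainingjobs
    singular: trainingjob
    kind: TrainingJob # the kind users put in their manifests
  versions:
  - name: v1
    served: true # this version is exposed by the API server
    storage: true # this version is persisted in etcd
    schema:
      openAPIV3Schema: # schema validating the custom fields
        type: object
        properties:
          spec:
            type: object
            properties:
              algorithmName: { type: string }
              trainingImage: { type: string }
              maxRetries: { type: integer }
              replicas: { type: integer }
```

Registering this CRD with `kubectl apply -f` makes the new `mlplatform.example.com/v1` API available, and `kubectl get trainingjobs` would then work like any built-in resource query.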
A Kubernetes operator is a controller that operates on a CR. The operator watches the CR types and takes specific actions to make the current state match the desired state, just like a built-in controller does. For example, to run training jobs for the training job CRD mentioned previously, you create an operator that monitors training job requests and performs the application-specific actions to start the Pods and run the training job throughout its life cycle. The following figure (Figure 6.5) shows the components involved in an operator deployment:
The most common way to deploy an operator is to deploy a CR definition and the associated controller. The controller runs outside of the Kubernetes control plane, similar to running a containerized application in a Pod.
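Continuing the illustrative `TrainingJob` example (the kind, API group, and field names remain assumptions, not a published operator's schema), a user would submit a CR instance like the following, and the operator's controller would react by creating the Pods that run the training job:

```yaml
apiVersion: mlplatform.example.com/v1 # the API created when the CRD was registered
kind: TrainingJob
metadata:
  name: fraud-model-training # illustrative job name
  namespace: team-ml # illustrative namespace from the earlier sketch
spec:
  algorithmName: xgboost # illustrative values matching the CRD schema above
  trainingImage: <uri to Docker image containing training script>
  maxRetries: 3
  replicas: 1
```

The operator watches for `TrainingJob` objects via the API server and reconciles them, in the same way the built-in Deployment controller reconciles Deployment objects.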