Inferencing using Elastic Kubernetes Service
Amazon Elastic Kubernetes Service (EKS) provides managed Kubernetes clusters for application deployment, taking over much of the complex cluster management process (https://aws.amazon.com/eks). The detailed steps for creating an EKS cluster can be found at https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html. In general, an EKS cluster can host any web service application and scale it as necessary. An inference endpoint on EKS is simply a web service application that handles model inference requests. In this section, you will learn how to host a DL model inference endpoint on EKS.
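To make the idea of an inference endpoint as "just a web service" concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the deployment built in this section: the `predict` function is a placeholder standing in for a real DL model, and the `/predict` and `/healthz` paths, the JSON payload shape, and port 8080 are hypothetical choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder for a real DL model (assumption): returns the mean of the inputs."""
    return sum(features) / len(features)


class InferenceHandler(BaseHTTPRequestHandler):
    """Handles inference requests the same way any web service handles HTTP requests."""

    def do_GET(self):
        # Hypothetical health-check path that a cluster could poll.
        if self.path == "/healthz":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            score = predict(payload["features"])
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            # Malformed JSON, a missing key, non-numeric or empty features.
            self._send_json(400, {"error": "expected JSON with a non-empty 'features' list"})
            return
        self._send_json(200, {"prediction": score})

    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        # Silence the default per-request logging to keep the sketch quiet.
        pass


def make_server(host="0.0.0.0", port=8080):
    """Build the HTTP server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), InferenceHandler)
```

Calling `make_server().serve_forever()` starts the service, after which a POST to `/predict` with a body such as `{"features": [1.0, 2.0, 3.0]}` returns a JSON prediction. Packaged into a container image, a service of this shape is the kind of application a Kubernetes cluster can run and scale.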
A Kubernetes cluster has a control plane and a set of nodes. The control plane makes scheduling and scaling decisions based on the volume of incoming traffic. With scheduling, the control plane manages which node runs a job at a given point in time. With scaling, the control plane increases or decreases the number of pod replicas based on the volume of traffic coming into the...