Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Kubernetes in Production Best Practices
Kubernetes in Production Best Practices

Kubernetes in Production Best Practices: Build and manage highly available production-ready Kubernetes clusters

eBook
$9.99 $26.99
Paperback
$38.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Kubernetes in Production Best Practices

Chapter 2: Architecting Production-Grade Kubernetes Infrastructure

In the previous chapter, you learned about the core components of Kubernetes and the basics of its infrastructure, and why putting Kubernetes in production is a challenging journey. We introduced the production-readiness characteristics for the Kubernetes clusters, along with our recommended checklist for the services and configurations that ensure the production-readiness of your clusters.

We also introduced a group of infrastructure design principles that we learned through building production-grade cloud environments. We use them as our guideline through this book whenever we make architectural and design decisions, and we highly recommend that cloud infrastructure teams consider these when it comes to architecting new infrastructure for Kubernetes and cloud platforms in general.

In this chapter, you will learn about the important architectural decisions that you will need to tackle while designing your Kubernetes infrastructure. We will explore the alternatives and the choices that you have for each of these decisions, along with the possible benefits and drawbacks. In addition to that, you will learn about the cloud architecture considerations, such as scaling, availability, security, and cost. We do not intend to make final decisions but provide the guidance because every organization has different needs and use cases. Our role is to explore them, and guide you through the decision-making process. When possible, we will state our preferred choices, which we will follow through this book for the practical exercises.

In this chapter, we will cover the following topics:

  • Understanding Kubernetes infrastructure design considerations
  • Exploring Kubernetes deployment strategy alternatives
  • Designing an Amazon EKS infrastructure

Understanding Kubernetes infrastructure design considerations

When it comes to Kubernetes infrastructure design, there are a few, albeit important, considerations to take into account. Almost every cloud infrastructure architecture shares the same set of considerations; however, we will discuss these considerations from a Kubernetes perspective, and shed some light on them.

Scaling and elasticity

Public cloud infrastructure, such as AWS, Azure, and GCP, introduced scaling and elasticity capabilities at unprecedented levels. Kubernetes and containerization technologies arrived to build upon these capabilities and extend them further.

When you design a Kubernetes cluster infrastructure, you should ensure that your architecture covers the following two areas:

  • Scalable Kubernetes infrastructure
  • Scalable workloads deployed to the Kubernetes clusters

To achieve the first requirement, there are parts that depend on the underlying infrastructure, either public cloud or on-premises, and other parts that depend on the Kubernetes cluster itself.

The first part is usually solved when you choose to use a managed Kubernetes service such as EKS, AKS, or GKE, as the cluster's control plane and worker nodes will be scalable and supported by other layers of scalable infrastructure.

However, in some use cases, you may need to deploy a self-managed Kubernetes cluster, either on-premises or in the cloud, and in this case, you need to consider how to support scaling and elasticity to enable your Kubernetes clusters to operate at their full capacity.

In all public cloud infrastructure, there is the concept of compute auto scaling groups, and Kubernetes clusters are built on them. However, because of the nature of the workloads running on Kubernetes, scaling needs should be synchronized with the cluster scheduling actions. This is where Kubernetes cluster autoscaler comes to our aid.

Cluster autoscaler (CAS) is a Kubernetes cluster add-on that you optionally deploy to your cluster, and it automatically scales up and down the size of worker nodes based on the set of conditions and configurations that you specify in the CAS. Basically, it triggers cluster upscaling when there is a pod that cannot schedule due to insufficient compute resources, or it triggers cluster downscaling when there are underutilized nodes, and their pods can be rescheduled and placed in other nodes. You should take into consideration the time a cloud provider takes to execute the launch of a new node, as this could be a problem for time-sensitive apps, and in this case, you may consider CAS configuration that enables node over provisioning.

For more information about CAS, refer to the following link: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler.

To achieve the second scaling requirement, Kubernetes provides two solutions to achieve autoscaling of the pods:

  • Horizontal Pod Autoscaler (HPA): This works similar to cloud autoscaling groups, but at a pod deployment level. Think of the pod as the VM instance. HPA scales the number of pods based on a specific metrics threshold. This can be CPU or memory utilization metrics, or you can define a custom metric. To understand how HPA works, you can continue reading about it here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.
  • Vertical Pod Autoscaler (VPA): This scales the pod vertically by increasing its CPU and memory limits according to the pod usage metrics. Think of VPA as upscaling/downscaling the VM instance by changing its type in the public cloud. VPA can affect CAS and triggers upscaling events, so you should revise the CAS and VPA configurations to get them aligned and avoid any unpredictable scaling behavior. To understand how VPA works, you can continue reading about it here: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler.

We highly recommend using HPA and VPA for your production deployments (it is not essential for non-production environments). We will give examples on how to use both of them in deploying production-grade apps and services in Chapter 8, Deploying Seamless and Reliable Applications.

High availability and reliability

Uptime means reliability and is usually the top metric that the infrastructure teams measure and target for enhancement. Uptime drives the service-level objectives (SLOs) for services, and the service level agreements (SLAs) with customers, and it also indicates how stable and reliable your systems and Software as a Service (SaaS) products are. High availability is the key for increasing uptime, and when it comes to Kubernetes clusters' infrastructure, the same rules still apply. This is why designing a highly available cluster and workload is an essential requirement for a production-grade Kubernetes cluster.

You can architect a highly available Kubernetes infrastructure on different levels of availability as follows:

  • A cluster in a single public cloud zone (single data center): This is considered the easiest architecture among the others, but it brings the highest risk. We do not recommend this solution.
  • A cluster in multiple zones (multiple data centers) but in a single cloud region: This is still easy to implement, it provides a higher level of availability, and it is a common architecture for Kubernetes clusters. However, when your cloud provider has a full region outage, your cluster will be entirely unavailable. Such full region outages rarely happen, but you still need to be prepared for such a scenario.
  • Across multi-region clusters, but within the same cloud provider: In this architecture, you usually run multiple federated Kubernetes clusters to serve your production workloads. This is usually the preferred solution for high availability, but it comes at a cost that makes it hard to implement and operate, especially the possible poor network performance, and shared storage for stateful applications. We do not recommend this architecture since, for the majority of SaaS products, it is enough to deploy Kubernetes in a single region and multiple zones. However, if you have a multi-region as a requirement for a reason other than high availability, you may consider multi-region Kubernetes federated clusters as a solution.
  • Multiple clusters across multi-cloud deployment: This architecture is still unpopular due to the incompatibility limitations across cloud providers, inter-cluster network complexity, and the higher cost associated with network traffic across providers, along with implementation and operations. However, it is worth mentioning the increase in the number of multi-cloud management solutions that are endeavoring to tackle and solve these challenges, and you may wish to consider a multi-cluster management solution such as Anthos from Google. You can learn more about it here: https://cloud.google.com/anthos.

As you may notice, Kubernetes has different architectural flavors when it comes to high availability setup, and I can say that having different choices makes Kubernetes more powerful for different use cases. Although the second choice is the most common one as of now, as it strikes a balance between the ease of implementation and operation, and the high availability level. We are optimistically searching for a time when we can reach the fourth level, where we can easily deploy Kubernetes clusters across cloud providers and gain all the high availability benefits without the burden of tough operations and increased costs.

As for the cluster availability itself, I believe it goes without saying that Kubernetes components should run in a highly available mode, that is, having three or more nodes for a control plane, or preferably letting the cloud manage the control plane for you, as in EKS, AKE, or GKE. As for workers, you have to run one or more autoscaling groups or node groups/pools, and this ensures high availability.

The other area where you need to consider achieving high availability is for the pods and workloads that you will deploy to your cluster. Although this is beyond the scope of this book, it is still worthwhile mentioning that developing new applications and services, or modernizing your existing ones so that they can run in a high availability mode, is the only way to make use of the raft of capabilities provided by the powerful Kubernetes infrastructure underneath it. Otherwise, you will end up with a very powerful cluster but with monolithic apps that can only run as a single instance!

Security and compliance

Kubernetes infrastructure security is rooted at all levels of your cluster, starting from the network layer, going through the OS level, up to cluster services and workloads. Luckily, Kubernetes has strong support for security, encryption, authentication, and authorization. We will learn about security in Chapter 6, Securing Kubernetes Effectively, of this book. However, during the design of the cluster infrastructure, you should give attention to important decisions relating to security, such as securing the Kubernetes API server endpoint, as well as the cluster network design, security groups, firewalls, network policies between the control plane components, workers nodes, and the public internet.

You will also need to plan ahead in terms of the infrastructure components or integrations between your cluster and identity management providers. This usually depends on your organization's security policies, which you need to align with your IT and security teams.

Another aspect to consider is the auditing and compliance of your cluster. Most organizations have cloud governance policies and compliance requirements, which you need to be aware of before you proceed with deploying your production on Kubernetes.

If you decide to use a multi-tenant cluster, the security requirements could be more challenging, and setting clear boundaries among the cluster tenants, as well as cluster users from different internal teams, may result in decisions such as deploying a service mesh, hardening cluster network policies, and implementing a tougher Role-Based Access Control (RBAC) mechanism. All of this will impact your decisions while architecting the infrastructure of your first production cluster.

The Kubernetes community is keen on compliance and quality, and for that there are multiple tools and tests to ensure that your cluster achieves an acceptable level of security and compliance. We will learn about these tools and tests in Chapter 6, Securing Kubernetes Effectively.

Cost management and optimization

Cloud cost management is an important factor for all organizations adopting cloud technology, both for those just starting and those who are already in the cloud. Adding Kubernetes to your cloud infrastructure is expected to bring cost savings, as containerization enables you to highly utilize your computer resources on a scale that was not possible with VMs ever before. Some organizations achieved cost savings up to 90% after moving to containers and Kubernetes.

However, without proper cost control, costs can rise again, and you end up with a lot of wasted infrastructure cost with uncontrolled Kubernetes clusters. There are many tools and best practices to consider in relation to cost management, but we mainly want to focus on the actions and the technical decisions that you need to consider during infrastructure design.

We believe that there are two important aspects that require decisions, and these decisions will definitely affect your cluster infrastructure architecture:

  • Running a single, but multi-tenant, cluster versus multi clusters (that is, a single cluster per tenant)
  • The cluster capacity: whether to run few large worker nodes or a lot of small workers nodes, or a mix of the two

There are no definitive correct decisions, but we will try to explore the choices in the next section, and how we can reach a decision.

These are other considerations to be made regarding cost optimization where an early decision can be made:

  • Using spot/preemptible instances: This has proven to achieve huge cost savings; however, it comes at a price! There is the threat of losing your workloads at any time, which affects your product uptime and reliability. Options are available for overcoming this, such as using spot instances for non-production workloads, such as development environments or CI/CD pipelines, or any production workloads that can survive a disruption, such as data batch processing.

    We highly recommend using spot instances for worker nodes, and you can run them in their node group/pool and assign to them the types of workloads where you are not concerned with them being disrupted.

  • Kubernetes cost observability: Most cloud platforms provide cost visibility and analytics for all cloud resources. However, having cost visibility at the deployment/service level of the cluster is essential, and this needs to be planned ahead, so you use isolated workloads, teams, users, environments, and also using namespaces and assign resource quotas to them. By doing that, you will ensure that using a cost reporting tool will provide you with reports relating the usage to the service or cluster operations. This is essential for further decision making regarding cost reductions.
  • Kubernetes cluster management: When you run a single-tenant cluster, or one cluster per environment for development, you usually end up with tons of clusters sprawled across your account which could lead to increased cloud cost. The solution to this situation is to set up a cluster management solution from day one. This solution could be as simple as a cluster auto scaler script that reduces the worker nodes during periods of inactivity, or it can be a full automation with dashboards and a master cluster to manage the rest of clusters.

In Chapter 9, Monitoring, Logging, and Observability, and Chapter 10, Operating and Maintaining Efficient Kubernetes Clusters, we will learn about cost observability and cluster operations.

Manageability and operational efficiency

Usually, when an organization starts building a Kubernetes infrastructure, they invest most of their time, effort, and focus in urgent and critical demands for infrastructure design and deployment, which we usually call Day 0 and Day 1. It is unlikely that an organization will devote its attention to operational and manageability concerns that we will face in the future (Day 2).

This is justified by the lack of experience in Kubernetes, and the types of operational challenges, or by being driven by gaining the benefits of Kubernetes that mainly relate to development, such as increasing a developer's productivity and agility, and automating releases and deployment.

All of this leads to organizations and teams being less prepared for Day 2. In this book, we try to maintain a balance between design, implementation, and operations, and shed some light on the important aspects of the operation and learn how to plan for it from Day 0, especially in relation to reliability, availability, security, and observability.

Operational challenges with Kubernetes

These are the common operational and manageability challenges that most teams face after deploying Kubernetes in production. This is where you need to rethink and consider solutions beforehand in order to handle these challenges properly:

  • Reliability and scaling: When your infrastructure scales up, you could end up with tens or hundreds of clusters, or clusters with hundreds or thousands of nodes, and tons of configurations for different environment types. This makes it harder to manage the SLAs/SLOs of your applications, as well as the uptime goals, and even diagnosing a cluster issue could be very problematic. Teams need to develop their Kubernetes knowledge and troubleshooting skills.
  • Observability: No doubt Kubernetes is complex, and this makes monitoring and logging a must-have service once your cluster is serving production, otherwise you will have a very tough time identifying issues and problems. Deploying monitoring and logging tools, in addition to defining the basic observability metrics and thresholds, are what you need to take care of in this regard. 
  • Updateability and cluster management: Updating Kubernetes components, such as the API server, kubelet, etcd, kube-proxy, Docker images, and configuration for the cluster add-ons, become challenging to manage during the cluster life cycle. This requires the correct tools to be in place from the outset. Automation and IaC tools, such as Terraform, Ansible, and Helm, are commonly used to help in this regard.
  • Disaster recovery: What happens when you have a partial or complete cluster failure? What is the recovery plan? How do you mitigate this risk and decrease the mean time to recover your clusters and workloads. This requires deployment of the correct tools, and writing the playbooks for backups, recovery, and crisis management.
  • Security and governance: You need to ensure that security best practices and governance policies are applied and enforced in relation to production clusters and workloads. This becomes challenging due to the complex nature of Kubernetes and its soft isolation techniques, its agility, and the rapid pace it brings to the development and release life cycles.

There are other operational challenges. However, we found that most of these can be mitigated if we stick to the following infrastructure best practices and standards:

  • Infrastructure as Code (IaC): This is the default practice for modern infrastructure and DevOps teams. It is also a recommended approach to use declarative IaC tools and technologies over their imperative counterparts.
  • Automation: We live in the age of software automation, as we tend to automate everything; it is more efficient and easier to manage and scale, but we need to take automation with Kubernetes to another level. Kubernetes comes with the ability to automate the life cycle of containers, and it also comes with advanced automation concepts, such as operators and GitOps, which are efficient and can literally automate automations.
  • Standardization: Having a set of standards helps to reduce teams' struggles with aligning and working together, eases the scaling of the processes, improves the overall quality, and increases productivity. This becomes essential for companies and teams that are planning to use Kubernetes in production, as this involves integrating with different infrastructure parts, migrating services from on-premises to the cloud, and many further complexities.

    Defining your set of standards covers processes for operation runbooks and playbooks, as well as technology standardization – using Docker, Kubernetes, and standard tools across teams. These tools should have specific characteristics: open source but battle-tested in production, the ability to support the other principles, such as IaC code, immutability, being cloud-agnostic, and being simple to use and deploy with a minimum of infrastructure.

  • Single source of truth: Having a source of truth is a cornerstone and enabler to modern infrastructure management and configuration. Source code control systems such as Git are becoming the standard choice to store and version infrastructure code, where having a single and dedicated source code repository for infrastructure is the recommended practice to follow.

Managing Kubernetes infrastructure is about management complexity. Hence, having a solid infrastructure design, applying best practices and standards, increasing the team's Kubernetes-specific skills, and expertise will all result in a smooth operational and manageability journey.

Exploring Kubernetes deployment strategy alternatives

Kubernetes and its ecosystem come with vast choices for everything you can do related to deploying, orchestrating, and operating your workloads. This flexibility is a huge advantage, and enables Kubernetes to suit different use cases, from regular applications on-premises and in the cloud to IoT and edge computing. However, choices come with responsibility, and in this chapter, we learn about the technical decisions that you need to evaluate and take regarding your cluster deployment architecture..

One of the important questions to ask and a decision to make is where to deploy your clusters, and how many of them you may need in order to run your containerized workloads? The answer is usually driven by both business and technical factors; elements such as the existing infrastructure, cloud transformation plan, cloud budget, the team size, and business growth target. All of these aspects could affect this, and this is why the owner of the Kubernetes initiative has to collaborate with organization teams and executives to reach a common understanding of the decision drivers, and agree on the right direction for their business.

We are going to explore some of the common Kubernetes deployment architecture alternatives, with their use cases, benefits, and drawbacks:

  • Multi-availability-zones clusters: This is the mainstream architecture for deploying a high availability (HA) cluster in a public cloud. Because running clusters in a multi-availability zones is usually supported by all public cloud providers, and, at the same time, it achieves an acceptable level of HA. This drives the majority of new users of Kubernetes to opt for this choice. However, if you have essential requirements to run your workloads in different regions, this option will not be helpful.
  • Multi-region clusters: Unless you have a requirement to run your clusters in multiple regions, there is little motivation to opt for it. While a public cloud provider to lose an entire region is a rare thing, but if you have the budget to do a proper design and overcome the operational challenges, then you can opt for a multi-region setup. It will definitely provide you with enhanced HA and reliability levels.
  • Hybrid cloud clusters: A hybrid cloud is common practice for an organization migrating from on-premise to the public cloud and that is going through a transitional period where they have workloads or data split between their old infrastructure and the new cloud infrastructure. Hybrid could also be a permanent setup, where an organization wants to keep part of its infrastructure on-premise either for security reasons (think about sensitive data), or due to the impossibility of migrating to the cloud. Kubernetes is an enabler of the hybrid cloud model, especially with managed cluster management solutions such as Google Anthos. This nevertheless entails higher costs in terms of provision and operation.
  • Multi-cloud clusters: Unlike hybrid cloud clusters, I find multi-cloud clusters to be an uncommon pattern, as it usually lacks the strong drivers behind it. You can run multiple different systems in multi-cloud clusters for a variety of reasons, but deploying a single system across two or more clouds over Kubernetes is not common, and you should be cautious before moving in this direction. However, I can understand the motivating factors behind some organizations doing this, such as avoiding cloud lock-in with a particular provider, leveraging pricing models with different providers for cost optimization, minimizing latency, or even achieving ultimate reliability for the workloads.
  • On-premises clusters: If an organization decides not to move to the cloud, Kubernetes still can manage their infrastructure on-premises, and actually, Kubernetes is a reasonable choice to manage the on-prem workload in a modern fashion, however, the solid on-prem managed Kubernetes solutions still very few.
  • Edge clusters: Kubernetes is gaining traction in edge computing and the IoT world. It provides an abstraction to the underlying hardware, it is ideal for distributed computing needs, and the massive Kubernetes ecosystem helps to come out with multiple open source and third-party projects that fit edge computing nature, such as KubeEdge and K3s.
  • Local clusters: You can run Kubernetes on your local machine using tools such as Minikube or Kind (Kubernetes in Docker). The purpose of using a local cluster is for trials, learning, and for use by developers.

We have discussed the various clusters deployments architectures and models available and their use cases. In the next section, we will learn work on designing the Kubernetes infrastructure that we will use in this book, and the technical decisions around it..

Designing an Amazon EKS infrastructure

In this chapter, we have discussed and explored various aspects of Kubernetes clusters design, and the different architectural considerations that you need to take into account. Now, we need to put things together for the design that we will follow during this book. The decisions that we will make here do not mean that they are the only right ones, but this is the preferred design that we will follow in terms of having minimally acceptable production clusters for this book's practical exercise. You can definitely use the same design, but with modifications, such as cluster sizing.

In the following sections, we will explore our choices regarding the cloud provider, provisioning and configuration tools, and the overall infrastructure architecture, and in the chapters to follow, we will build upon these choices and use them to provision production-like clusters as well as deploy the configuration and services above the cluster.

Choosing the infrastructure provider

As we learned in the previous sections, there are different ways in which to deploy Kubernetes. You can deploy it locally, on-premises, or in a public cloud, private cloud, hybrid, multi-cloud, or an edge location. Each of these infrastructure type has use cases, benefits, and drawbacks. However, the most common one is the public cloud, followed by the hybrid model. The remaining choices are still limited to specific use cases.

In a single book like ours, we cannot discuss each of these infrastructure platforms, so we decided to go with the common choice for deploying Kubernetes, by using one of the public clouds (AWS, Azure, or GCP). You still can use another cloud provider, a private cloud, or even an on-premises setup, and most of the concepts and best practices discussed in this book are still applicable.

When it comes to choosing one of the public clouds, we do not advocate one over the others, and we definitely recommend using the cloud provider that you already use for your existing infrastructure, but if you are just embarking on your cloud journey, we advise you to perform a deeper benchmarking analysis between the public clouds to see which one is better for your business.

In the practical exercises in this book, we will use AWS and the Elastic Kubernetes Service (EKS). We explained in the previous chapter regarding the infrastructure design principle that we always prefer a managed service over its self-managed counterpart, and this applies here when it comes to choosing between EKS and building our self-managed clusters over AWS.

Choosing the cluster and node size

When you plan for your cluster, you need to decide both the cluster and node sizes. This decision should be based on the estimated utilization of your workloads, which you may know beforehand based on your old infrastructure, or it can be calculated approximately and then adjusted after going live in production. In either case, you will need to decide on the initial cluster and node sizes, and then keep adjusting them until you reach the correct utilization level to achieve a balance between cost and reliability. You can target a utilization level of between 70 and 80% unless you have a solid justification for using a different level.

These are the common cluster and node size choices that you can consider either individually or in a combination:

  • Few large clusters: In this setup, you deploy a few large clusters. These can be production and non-production clusters. A cluster could be large in terms of node size, node numbers, or both. Large clusters are usually easier to manage because they are few in number. They are cost efficient because you achieve higher utilization per node and cluster (assuming you are running the correct amount of workloads), and this improved utilization comes from saving the resources required for system management. On the downside, large clusters lack hard isolation for multi-tenants, as you only use namespaces for soft isolation between tenants. They also introduce a single point of failure to your production (especially when you run a single cluster). There is another limitation, as any Kubernetes cluster has an upper limit of 5,000 nodes that it can manage and when you have a single cluster, you can hit this upper limit if you are running a large number of pods.
  • Many small clusters: In this setup, you deploy a lot of small clusters. These could be small in terms of node size, node numbers, or both. Small clusters are good when it comes to security as they provide hard isolation between resources and tenants and also provide strong access control for organizations with multiple teams and departments. They also reduce the blast radius of failures and avoid having a single point of failure. On the downside, small clusters come with an operational overhead, as you need to manage a fleet of clusters. They are also inefficient in terms of resource usage, as you cannot achieve the utilization levels that you can achieve with large clusters, in addition to increasing costs, as they require more control plane resources to manage a fleet of small clusters that manage the same total number of worker nodes in a large cluster.
  • Large nodes: This is about the size of the nodes in a cluster. When you deploy large nodes in your cluster, you will have better and higher utilization of the node (assuming you deploy workloads that utilize 70-80% of the node). This is because a large node can handle application spikes, and it can handle applications with high CPU/memory requirements. In addition to that, a well utilized large node usually entails cost savings as it reduces the overall cluster resources required for system management and you can purchase such nodes at discounted prices from your cloud provider. On the downside, large nodes can introduce a high blast radius of failures, thereby affecting the reliability of both the cluster and apps. Also, adding a new large node to the cluster during an upscaling event will add a lot of cost that you may not need, so if your cluster is hit by variable scaling events over a short period, large nodes will be the wrong choice. Added to this is the fact that Kubernetes has an upper limit in terms of the number of pods that can run on a single node regardless of its type and size, and for a large node, this limitation could lead to underutilization.
  • Small nodes: This is about the size of the nodes per single cluster. When you deploy small nodes in your cluster, you can reduce the blast radius during failures, and also reduce costs during upscaling events. On the downside, small nodes are underutilized, they cannot handle applications with high resource requirements, and the total amount of system resources required to manage these nodes (kubelet, etcd, kube-proxy, and so on) is higher than managing the same compute power for a larger node, in addition to which small nodes have a lower limit for pods per node.
  • Centralized versus decentralized clusters: Organizations usually use one of these approaches in managing their Kubernetes clusters.

    In a decentralized approach, the teams or individuals within an organization are allowed to create and manage their own Kubernetes clusters. This approach provides flexibility for the teams to get the best out of their clusters, and customize them to fit their use cases; on the other hand, this increases the operational overhead, cloud cost, and makes it difficult to enforce standardization, security, best practices, and tools across the clusters. This approach is more appropriate for organizations that are highly decentralized, or when they are going through cloud transformation, product life cycle transitional periods, or exploring and innovating new technologies and solutions.

    In a centralized approach, the teams or individuals share a single cluster or small group of identical clusters that use a similar set of standards, configurations, and services. This approach overcomes and decreases the drawbacks in the decentralized model; however, it can be inflexible, slow down the cloud transformations, and decreases teams' agility. This approach is more suitable for organizations working towards maturity, platform stability, increasing cloud cost reduction, enforcing and promoting standards and best practices, and focusing on products rather than the underlaying platform.

Some organizations can run a hybrid models from the aforementioned alternatives, such as having large, medium, and small nodes to get the best of each type according to their apps needs. However, we recommend that you run experiments to decide which model suits your workload's performance, and meets your cloud cost reduction goal.

Choosing tools for cluster deployment and management

In the early days of Kubernetes, we used to deploy it from scratch, which was commonly called Kubernetes the Hard Way. Fast forward and the Kubernetes community got bigger and a lot of tools emerged to automate the deployment. These tools range from simple automation to complete one-click deployment.

In the context of this book, we are not going to explain each of these tools in the market (there are a lot), nor to compare and benchmark them. However, we will propose our choices with a brief reasoning behind the choices.

Infrastructure provisioning

When you deploy Kubernetes for the first time, most likely you will use a command-line tool with a single command to provision the cluster, or you may use a cloud provider web console to do that. In both ways, this approach is suitable for experimental and learning purposes, but when it comes to real implementation across production and development environments a provisioning tool becomes a must.

The majority of organizations that consider deploying Kubernetes already have an existing cloud infrastructure or they are going through a cloud migration process. This makes Kubernetes not the only piece of the cloud infrastructure that they will use. This is why we prefer a provisioning tool that achieves the following:

  • It can be used to provision Kubernetes as well as other pieces of infrastructure (databases, file stores, API gateways, serverless, monitoring, logging, and so on).
  • It fulfills and empowers the IaC principles.
  • It is a cloud-agnostic tool.
  • It has been battle-tested in production by other companies and teams.
  • It has community support and active development.

We can find these characteristics in Terraform, and this is why we chose to use it in the production clusters that we managed, as well as in this practical exercise in this book. We highly recommend Terraform for you as well, but if you prefer another portioning tool, you can skip this chapter and then continue reading this book and apply the same concepts and best practices.

Configuration management

Kubernetes configuration is declarative by nature, so, after deploying a cluster, we need to manage its configuration. The add-ons deployed provide services for various areas of functionality, including networking, security, monitoring, and logging. This is why a solid and versatile configuration management tool is required in your toolset.

The following are solid choices:

  • Regular configuration management tools, such as Ansible, Chef, and Puppet
  • Kubernetes-specific tools, such as Helm and Kustomize
  • Terraform

Our preferred order of suitable tools is as follows:

  1. Ansible
  2. Helm
  3. Terraform

We can debate this order, and we believe that any of these tools can fulfill the configuration management needs for Kubernetes clusters. However, we prefer to use Ansible for its versatility and flexibility as it can be used for Kubernetes and also for other configuration management needs for your environment, which makes it preferable over Helm. On the other hand, Ansible is preferred over Terraform because it is a provisioning tool at heart, and while it can handle configuration management, it is not the best tool for that.

In the hands-on exercises in this book, we decided to use Ansible with Kubernetes module and Jinja2 templates.

Deciding the cluster architecture

Each organization has its own way of managing cloud accounts. However, we recommend having at least two AWS accounts, one for production and another for non-production. The production Kubernetes cluster resides in the production account, and the non-production Kubernetes cluster resides in the non-production account. This structure is preferred for security, reliability, and operational efficiency.

Based on the technical decisions and choices that we made in the previous sections, we propose the following AWS architecture for the Kubernetes clusters that we will use in this book, which you can also use to deploy your own production and non-production clusters:

Figure 2.1 – Cluster architecture diagram

Figure 2.1 – Cluster architecture diagram

In the previous architecture diagram, we decided to do the following:

  • Create a separate VPC for the cluster network; we chose the Classless Inter-Domain Routing (CIDR) range, which has sufficient IPv4 addressing capacity for future scaling. Each Kubernetes node, pod, and service will have its own IP address, and we should keep in mind that the number of services will increase.
  • Create public and private subnets. The publicly accessible resources, such as load balancers and bastions, are placed in the public subnets, and the privately accessible resources, such as Kubernetes nodes, databases, and caches, are placed in the private subnets.
  • For high availability, we create the resources in three different availability zones. We placed one private and one public subnet in each availability zone.
  • For scaling, we run multiple EKS node groups.

We will discuss the details of these design specs in the next chapters, in addition to the remainder of the technical aspects of the cluster's architecture.

Summary

Provisioning a Kubernetes cluster can be a task that takes 5 minutes with modern tools and managed cloud services; however, thus this is far from a production-grade Kubernetes infrastructure and it is only sufficient for education and trials. Building a production-grade Kubernetes cluster requires hard work in designing and architecting the underlying infrastructure, the cluster, and the core services running above it.

By now, you have learned about the different aspects and challenges you have to consider while designing, building, and operating your Kubernetes clusters. We explored the different architecture alternatives to deploy Kubernetes clusters, and the important technical decisions associated with this process. Then, we discussed the proposed cluster design, which we will use during the book for the practical exercises, and we highlighted our selection of infrastructure platform, tools, and architecture.

In the next chapter, we will see how to put everything together and use the design concepts we discussed in this chapter to write IaC and follow industry best practices with Terraform to provision our first Kubernetes cluster.

Further reading

For more information on the topics covered in this chapter, please refer to the following links:

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Implement industry best practices to build and manage production-grade Kubernetes infrastructure
  • Learn how to architect scalable Kubernetes clusters, harden container security, and fine-tune resource management
  • Understand, manage, and operate complex business workloads confidently

Description

Although out-of-the-box solutions can help you to get a cluster up and running quickly, running a Kubernetes cluster that is optimized for production workloads is a challenge, especially for users with basic or intermediate knowledge. With detailed coverage of cloud industry standards and best practices for achieving scalability, availability, operational excellence, and cost optimization, this Kubernetes book is a blueprint for managing applications and services in production. You'll discover the most common way to deploy and operate Kubernetes clusters, which is to use a public cloud-managed service from AWS, Azure, or Google Cloud Platform (GCP). This book explores Amazon Elastic Kubernetes Service (Amazon EKS), the AWS-managed version of Kubernetes, for working through practical exercises. As you get to grips with implementation details specific to AWS and EKS, you'll understand the design concepts, implementation best practices, and configuration applicable to other cloud-managed services. Throughout the book, you’ll also discover standard and cloud-agnostic tools, such as Terraform and Ansible, for provisioning and configuring infrastructure. By the end of this book, you’ll be able to leverage Kubernetes to operate and manage your production environments confidently.

Who is this book for?

This book is for cloud infrastructure experts, DevOps engineers, site reliability engineers, and engineering managers looking to design and operate Kubernetes infrastructure for production. Basic knowledge of Kubernetes, Terraform, Ansible, Linux, and AWS is needed to get the most out of this book.

What you will learn

  • Explore different infrastructure architectures for Kubernetes deployment
  • Implement optimal open source and commercial storage management solutions
  • Apply best practices for provisioning and configuring Kubernetes clusters, including infrastructure as code (IaC) and configuration as code (CAC)
  • Configure the cluster networking plugin and core networking components to get the best out of them
  • Secure your Kubernetes environment using the latest tools and best practices
  • Deploy core observability stacks, such as monitoring and logging, to fine-tune your infrastructure
Estimated delivery fee Deliver to Malaysia

Standard delivery 10 - 13 business days

$8.95

Premium delivery 5 - 8 business days

$45.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Mar 12, 2021
Length: 292 pages
Edition : 1st
Language : English
ISBN-13 : 9781800202450
Tools :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to Malaysia

Standard delivery 10 - 13 business days

$8.95

Premium delivery 5 - 8 business days

$45.95
(Includes tracking information)

Product Details

Publication date : Mar 12, 2021
Length: 292 pages
Edition : 1st
Language : English
ISBN-13 : 9781800202450
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $29.97 $134.97 $105.00 saved
Mastering Kubernetes
$79.99
Kubernetes and Docker - An Enterprise Guide
$54.99
Kubernetes in Production Best Practices
$38.99
Total $29.97$134.97 $105.00 saved Stars icon
Banner background image

Table of Contents

11 Chapters
Chapter 1: Introduction to Kubernetes Infrastructure and Production-Readiness Chevron down icon Chevron up icon
Chapter 2: Architecting Production-Grade Kubernetes Infrastructure Chevron down icon Chevron up icon
Chapter 3: Provisioning Kubernetes Clusters Using AWS and Terraform Chevron down icon Chevron up icon
Chapter 4: Managing Cluster Configuration with Ansible Chevron down icon Chevron up icon
Chapter 5: Configuring and Enhancing Kubernetes Networking Services Chevron down icon Chevron up icon
Chapter 6: Securing Kubernetes Effectively Chevron down icon Chevron up icon
Chapter 7: Managing Storage and Stateful Applications Chevron down icon Chevron up icon
Chapter 8: Deploying Seamless and Reliable Applications Chevron down icon Chevron up icon
Chapter 9: Monitoring, Logging, and Observability Chevron down icon Chevron up icon
Chapter 10: Operating and Maintaining Efficient Kubernetes Clusters Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(9 Ratings)
5 star 100%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Marcos Mar 16, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
That is for sure one of the best technical books I've read. Pretty good explanation and full of details and code. You can surely become a kubernetes specialist. This book is a masterpiece.About the book, I liked chapter 2 because it has a good foundation for infrastructure architecture and design. The chapter 2 and 3 about IaC with terraform and ansible, and by end of the book chapter 10 is very helpful for troubleshooting.
Amazon Verified review Amazon
Walter Lee Mar 28, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Chapters 1 and 2 show useful Kubernetes production readiness checklist, cluster and applications design considerations, Kubernetes deployment strategy. Good infrastructure as code demo with Terraform and Ansible playbooks on AWS EKS throughout this book. Chapters 3-10 explain and demo many important topics, e.g. cluster, networking, storage, security, logging, monitoring, operations, backup, upgrades, cost optimization, ... etc. Very useful for running a Production Kubernetes cluster effectively !
Amazon Verified review Amazon
Mustapha Saeed Mar 13, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I liked the book focus on the best practices that are commonly used by Kubernetes production environments, and it provided a nice roadmap to create and manage your k8s production infrastructure with solid design and architectural design ideas. It also gives great attention to IaC with terraform, CaC with ansible, and nicely covers security and networking.
Amazon Verified review Amazon
tgonzaga Mar 25, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Kubernetes is a very complex ecosystem with a variety of use cases all over, it’s hard to find reliable resources on best practices, but this book is a silver bullet, it has good practices for all tastes, small to large production environments. It’s definitely a must-read for DevOps guys and Developers in general.
Amazon Verified review Amazon
Priya R Shastri Dec 24, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book gives you the steps to create production ready kubernetes infrastructure and details avoiding the pitfalls in the process. Kubenetes is an open source platform for automating deployment, scaling and management of containerized applications.In chapter 1, A single cluster can handle up to 5000 nodes. Control plane components are the core software pieces that construct the kubernetes master node. kube-apiserver, etcd, kube-controller-manager and kube-scheduler are described in this book. In chapter 2 Architecting kubernetes concepts are described. An Amazon EKS infrastructure is described. Kubernetes configuration management tools like Ansible, helm and terraform are introduced. In chapter 3 Provisioning Kubernetes with Terraform and AWS is described in detail. In chapter 4 Managing Kubernetes with Ansible are described. In chapter 5 Managing Kubernetes with Networking is detailked. In chapter 6 Securing kuberntes is described in detail. In chapter 7 Managing storage is described. In chapter 8 Deploying reliable and seamless applications are described. In chapter 9 Monitoring, logging is described. In chapter 10 Operating and maintaining kubernetes clusters are described.Overall,great reference for details on how to deploy kubernetes.great content.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela