The Kubernetes Bible

The definitive guide to deploying and managing Kubernetes across cloud and on-prem environments

Product type: Paperback
Published: Nov 2024
Publisher: Packt
ISBN-13: 9781835464717
Length: 720 pages
Edition: 2nd Edition
Authors (2): Gineesh Madapparambath, Russ McKendrick

Troubleshooting Kubernetes

Troubleshooting Kubernetes involves diagnosing and resolving issues that affect the functionality and stability of your cluster and applications. Common errors may include problems with Pod scheduling, container crashes, image pull issues, networking issues, or resource constraints. Identifying and addressing these errors efficiently is crucial for maintaining a healthy Kubernetes environment.

In the upcoming sections, we’ll cover the essential skills you need to get started with Kubernetes troubleshooting.

Getting details about resources

When troubleshooting issues in Kubernetes, the kubectl get and kubectl describe commands are indispensable for diagnosing and understanding the state of resources within your cluster. You have already used these commands many times in the previous chapters, so this section is a brief refresher.

The kubectl get command provides a high-level overview of various resources in your cluster, such as pods, services, deployments, and nodes. For instance, if you suspect that a pod is not running as expected, you can use kubectl get pods to list all pods and their current statuses. This command will show you whether pods are running, pending, or encountering errors, helping you quickly identify potential issues.

On the other hand, kubectl describe dives deeper into the details of a specific resource. This command provides a comprehensive description of a resource, including its configuration, events, and recent changes. For example, if a Pod from the previous command is failing, you can use kubectl describe pod todo-app to get detailed information about why it might be failing.

This output includes the Pod’s events, such as failed container startup attempts or issues with pulling images. It also displays detailed configuration data, such as resource limits and environment variables, which can help pinpoint misconfigurations or other issues.

To illustrate, suppose you’re troubleshooting a deployment issue. Using kubectl get deployments shows the deployment’s status and the number of ready, up-to-date, and available replicas. If a deployment is stuck or not updating correctly, kubectl describe deployment webapp provides detailed information about the deployment’s conditions, its old and new ReplicaSets, and the events recorded during updates. To see the revision history itself, use kubectl rollout history deployment/webapp.
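As a first pass, it can help to narrow kubectl get to the Pods that are not running before describing any of them. The following is a minimal sketch, not a command from the book: the helper name unhealthy_pods is hypothetical, and it relies on the status.phase field selector that kubectl supports for Pods.

```shell
# Hypothetical helper: list Pods in a namespace whose phase is not Running.
# Completed Pods (phase Succeeded) are listed as well.
unhealthy_pods() {
    ns="${1:-default}"
    kubectl get pods -n "$ns" --field-selector=status.phase!=Running
}
```

For example, unhealthy_pods trouble-ns lists the Pending or Failed Pods in that namespace. Note that a Pod in CrashLoopBackOff usually still reports the Running phase, so this filter complements the STATUS column of kubectl get pods rather than replacing it.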

In the next section, we will learn the important methods to find logs and events in Kubernetes to make our troubleshooting easy.

Kubernetes Logs and Events for troubleshooting

Kubernetes offers tools like Events and Audit Logs to monitor and secure your cluster effectively. Events, which are namespaced resources of the Event kind, provide a near-real-time record of key actions, such as pod scheduling, container restarts, and errors. These events help in diagnosing issues quickly and understanding the state of your cluster, but they are retained only for a limited time (one hour by default). You can view events in the current namespace using the kubectl get events command (add -A to list events from all namespaces):

$ kubectl get events

This command outputs a list of events, helping you identify and troubleshoot problems. To focus on specific events, you can restrict the output to a namespace with -n or use a field selector on attributes such as the involved object or the event type. For example, to view events related to a specific pod, you can use the following:

$ kubectl get events --field-selector involvedObject.name=todo-pod
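The same field selector mechanism can also filter by event type. The sketch below wraps that in a small function: the name warning_events is hypothetical, while the type=Warning selector and the --sort-by flag are standard kubectl options.

```shell
# Hypothetical helper: show only Warning events in a namespace,
# ordered by creation time (oldest first).
warning_events() {
    ns="${1:-default}"
    kubectl get events -n "$ns" \
        --field-selector type=Warning \
        --sort-by=.metadata.creationTimestamp
}
```

Running warning_events trouble-ns hides the routine Normal events (such as Scheduled or Pulled), so the failures stand out.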

Audit Logs are vital for ensuring compliance and security within your Kubernetes environment. What gets recorded is controlled by an audit policy, which is an object of the Policy kind in the audit.k8s.io API group. These logs capture detailed records of requests made to the Kubernetes API server, including the user, the action performed, and the outcome. This information is crucial for auditing activities like authentication attempts or privilege escalations. To enable audit logging, you need to configure the API server with an audit policy file and a log destination. Refer to the Auditing documentation (https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/) to learn more.
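To make the audit policy less abstract, here is a minimal sketch that writes the smallest useful policy, one that records request metadata for every request. The file name and the paths in the comments are assumptions for illustration; the two API server flags are the documented ones.

```shell
# Write a minimal audit policy that logs every request at the Metadata level
# (who, what, when, but no request or response bodies).
# The file name audit-policy.yaml is an arbitrary choice for this sketch.
cat > audit-policy.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF

# The API server would then be started with flags such as:
#   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
#   --audit-log-path=/var/log/kubernetes/audit/audit.log
# Both paths are assumptions; adjust them to your control plane layout.
```

On a production cluster, you would normally add more specific rules ahead of this catch-all rule to reduce log volume.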

When debugging Kubernetes applications, the kubectl logs command is an essential tool for retrieving and analyzing logs from specific containers within a pod. This helps in diagnosing and troubleshooting issues effectively.

To fetch logs from a pod, the basic command is as follows:

$ kubectl logs todo-app

This retrieves logs from the first container in the pod. If the pod contains multiple containers, specify the container name:

$ kubectl logs todo-app -c app-container

For real-time log streaming, akin to tail -f in Linux, use the -f flag:

$ kubectl logs -f todo-app

This is useful for monitoring live processes. If a pod has restarted, you can access logs from its previous instance using the following:

$ kubectl logs todo-app --previous

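To collect logs from every pod that matches a label selector, you can combine kubectl with tools like jq. Here, -l todo matches pods that carry a label with the key todo (kubectl logs also accepts the -l flag directly):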

$ kubectl get pods -l todo -o json | jq -r '.items[] | .metadata.name' | xargs -I {} kubectl logs {}
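When a Pod produces a lot of output, limiting the time window and line count keeps the logs readable. This sketch combines several standard kubectl logs flags; the helper name recent_logs and the chosen limits (one hour, 50 lines) are assumptions for illustration.

```shell
# Hypothetical helper: timestamped logs from all containers of the Pods
# matching a label selector, limited to the last hour and 50 lines per container.
recent_logs() {
    selector="$1"
    kubectl logs -l "$selector" --all-containers=true --since=1h --tail=50 --timestamps
}
```

For example, recent_logs todo covers the same Pods as the jq pipeline above without needing jq or xargs.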

To effectively manage logs in Kubernetes, it’s crucial to implement log rotation to prevent excessive disk usage, ensuring that old logs are archived or deleted as new ones are generated. Utilizing structured logging, such as JSON format, makes it easier to parse and analyze logs using tools like jq. Additionally, setting up a centralized logging system, like the Elasticsearch, Fluentd, Kibana (EFK) stack, allows you to aggregate and efficiently search logs across your entire Kubernetes cluster, providing a comprehensive view of your application’s behavior.

Together, Kubernetes Events and Audit Logs provide comprehensive monitoring and security capabilities. Events offer insights into the state and behavior of your applications, while Audit Logs ensure that all actions within the cluster are tracked, helping you maintain a secure and compliant environment.

kubectl explain – the inline helper

The kubectl explain command helps you understand the structure and fields of Kubernetes resources. By providing detailed information about a specific resource type, it lets you explore the API schema directly from the command line. This is especially useful when writing or debugging YAML manifests, as it helps you confirm that you’re using the correct fields and structure.

For example, to learn about the Pod resource, you can use the following command:

$ kubectl explain pod

This command will display a high-level overview of the Pod resource, including a brief description. To dive deeper into specific fields, such as the spec field, you can extend the command like this:

$ kubectl explain pod.spec

This will provide a detailed explanation of the spec field, including its nested fields and the expected data types, helping you better understand how to configure your Kubernetes resources properly.
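When you only need the shape of a resource rather than the full descriptions, the --recursive flag prints the nested field names as a tree. A minimal sketch follows; the helper name explain_fields is hypothetical.

```shell
# Hypothetical helper: print the nested field names of a resource path
# without the per-field descriptions.
explain_fields() {
    kubectl explain "$1" --recursive
}
```

For example, explain_fields pod.spec.containers.resources shows the fields available under a container’s resources section, which is a quick way to check indentation and spelling in a manifest.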

Interactive troubleshooting using kubectl exec

Using kubectl exec is a powerful way to troubleshoot and interact with your running containers in Kubernetes. This command allows you to execute commands directly inside a container, making it invaluable for debugging, inspecting the container’s environment, and performing quick fixes. Whether you need to check logs, inspect configuration files, or even diagnose network issues, kubectl exec provides a direct way to interact with your applications in real time.

Before using kubectl exec, find the name of the Pod you want to inspect (you may use kubectl apply -f trouble/blog-portal.yaml for testing):

$ kubectl get po -n trouble-ns
NAME                   READY   STATUS    RESTARTS   AGE
blog-675df44d5-gkrt2   1/1     Running   0          29m

For example, to list the environment variables of a container, you can use the following:

$ kubectl exec blog-675df44d5-gkrt2 -n trouble-ns -- env

If the pod has multiple containers, you can specify which one to interact with using the -c flag:

$ kubectl exec blog-675df44d5-gkrt2 -n trouble-ns -c blog -- env

One of the most common uses of kubectl exec is to open an interactive shell session within a container. This allows you to run diagnostic commands on the fly, such as inspecting log files or modifying configuration files. You can start an interactive shell (/bin/sh, /bin/bash, etc.), as demonstrated here:

$ kubectl exec -it blog-675df44d5-gkrt2 -n trouble-ns -- /bin/bash
root@blog-675df44d5-gkrt2:/app# whoami;hostname;uptime
root
blog-675df44d5-gkrt2
14:36:03 up 10:19,  0 user,  load average: 0.17, 0.07, 0.69
root@blog-675df44d5-gkrt2:/app#

Here, the following applies:

  • -i: This keeps stdin open so that the session is interactive.
  • -t: This allocates a pseudo-TTY.

This interactive session is particularly useful when you need to explore the container’s environment or troubleshoot issues that require running multiple commands in sequence.
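Not every image ships with bash, so a small wrapper that falls back to sh can save a round trip. This is a sketch under stated assumptions: the helper name pod_shell is hypothetical, and the container must provide at least /bin/sh on its PATH.

```shell
# Hypothetical helper: open an interactive shell in a Pod, preferring bash
# and falling back to sh when bash is not installed in the container.
pod_shell() {
    pod="$1"
    ns="${2:-default}"
    kubectl exec -it "$pod" -n "$ns" -- \
        sh -c 'command -v bash >/dev/null 2>&1 && exec bash || exec sh'
}
```

For example, pod_shell blog-675df44d5-gkrt2 trouble-ns opens a shell in the Pod used in this section. If the image contains no shell at all, kubectl exec cannot help.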

Alongside kubectl exec, the separate kubectl cp command copies files to and from containers. It requires the tar binary to be present in the container image. This can be particularly handy when you need to bring in a script or retrieve a log file for further analysis. For instance, here’s how to copy a file from your local machine into a container:

$ kubectl cp troubles/test.txt blog-675df44d5-gkrt2:/app/test.txt -n trouble-ns
$ kubectl exec -it blog-675df44d5-gkrt2 -n trouble-ns -- ls -l /app
total 8
-rw-r--r-- 1 root root 902 Aug 20 16:52 app.py
-rw-r--r-- 1 1000 1000  20 Aug 31 14:42 test.txt

And to copy a file from a container to your local machine, you’d need the following:

$ kubectl cp blog-675df44d5-gkrt2:/app/app.py /tmp/app.py  -n trouble-ns

This capability simplifies the process of transferring files between your local environment and the containers running in your Kubernetes cluster, making troubleshooting and debugging more efficient.

In the next section, we will learn about ephemeral containers, which are very useful in Kubernetes troubleshooting tasks.

Ephemeral Containers in Kubernetes

Ephemeral containers are a special type of container in Kubernetes designed for temporary, on-the-fly tasks like debugging. Unlike regular containers, which are intended for long-term use within Pods, ephemeral containers are used for inspection and troubleshooting and are not automatically restarted or guaranteed to have specific resources.

These containers can be added to an existing Pod to help diagnose issues, making them especially useful when traditional methods like kubectl exec fall short. For example, if a Pod is running a distroless image with no debugging tools, an ephemeral container can be introduced to provide a shell and other utilities (e.g., nslookup, curl, mysql client, etc.) for inspection. Ephemeral containers are managed via a specific API handler and can’t be added through kubectl edit or modified once set.

For example, in Chapter 8, Exposing Your Pods with Services, we used k8sutils (quay.io/iamgini/k8sutils:debian12) as a separate Pod to test the services and other tasks. With ephemeral containers, we can use the same container image but insert the container inside the application Pod to troubleshoot.

Assume we have a Pod (video-7d945d8c9f-wkxc5 in this example) and a Service called video-service running in the ingress-demo namespace (refer to the ingress/video-portal.yaml file for deployment details). You can start a debugging session with the k8sutils container image as follows:

$ kubectl debug -it pod/video-7d945d8c9f-wkxc5 --image=quay.io/iamgini/k8sutils:debian12 -c k8sutils -n ingress-demo
root@video-7d945d8c9f-wkxc5:/# nslookup video-service
Server:         10.96.0.10
Address:        10.96.0.10#53
Name:   video-service.ingress-demo.svc.cluster.local
Address: 10.109.3.177
root@video-7d945d8c9f-wkxc5:/# curl http://video-service:8080
    <!DOCTYPE html>
    <html>
    <head>
      <title>Welcome</title>
      <style>
        body {
          background-color: yellow;
          text-align: center;
...<removed for brevity>...

In summary, ephemeral containers offer a flexible way to investigate running Pods without altering the existing setup or relying on the base container’s limitations.
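If you also need to see the processes of the application container, kubectl debug accepts a --target flag that places the ephemeral container in the process namespace of the named container. The following is a minimal sketch: the helper name debug_pod is hypothetical, and --target only works when the container runtime supports process namespace targeting.

```shell
# Hypothetical helper: attach an ephemeral debug container to a Pod and
# share the process namespace of one of its existing containers.
debug_pod() {
    pod="$1"
    container="$2"
    ns="${3:-default}"
    kubectl debug -it "pod/$pod" -n "$ns" \
        --image=quay.io/iamgini/k8sutils:debian12 \
        --target="$container"
}
```

For example, debug_pod video-7d945d8c9f-wkxc5 video ingress-demo would start the k8sutils image next to a container named video. The container name is an assumption, so check the real name with kubectl describe pod first.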

In the following section, we will demonstrate some of the common Kubernetes troubleshooting tasks and methods.

Common troubleshooting tasks in Kubernetes

Troubleshooting Kubernetes can be complex and highly specific to your cluster setup and operations, and the list of potential issues is extensive. Rather than trying to cover every case, let’s focus on some of the most common Kubernetes problems and their troubleshooting methods to provide a practical starting point:

  • Pods are in Pending state: The Pending status is not an error message as such; it indicates that the pod has been accepted by the cluster but is not yet running, most often because it is waiting to be scheduled onto a node. This can be caused by insufficient resources or misconfigurations. To troubleshoot, use kubectl describe pod <pod_name> to check for events that describe why the pod is pending, such as resource constraints or node conditions. If the cluster doesn’t have enough resources, the pod will remain in the Pending state until you adjust its resource requests or add more nodes. (Try using troubles/app-with-high-resource.yaml to test this.)
  • CrashLoopBackOff or container errors: The CrashLoopBackOff status occurs when a container repeatedly starts and then exits, possibly due to misconfigurations, missing files, or application errors. To troubleshoot, view the logs using kubectl logs <pod_name> (add --previous to see the output of the instance that crashed) and check kubectl describe pod <pod_name> to identify the cause. Look for error messages or stack traces that can help diagnose the problem. For example, if a container has an incorrect startup command, it will fail to start, leading to this status. Reviewing the container’s exit code and logs will help you fix the issue. (Apply troubles/failing-pod.yaml and test this scenario.)
  • Networking issues: Connection timeouts or refused connections between pods often suggest that network policies are blocking traffic to or from the pod. To troubleshoot, list the network policies in the pod’s namespace using kubectl get networkpolicy and inspect them with kubectl describe networkpolicy <policy_name>, then verify the service and its endpoints with kubectl get svc and kubectl get endpoints. If network policies are too restrictive, necessary traffic might be blocked. For example, a policy that selects a pod but defines no ingress rules blocks all incoming traffic to that pod, and adjusting the policy will allow the required services to communicate. (Use troubles/networkpolicy.yaml to test this scenario.)
  • Node not ready or unreachable: The NotReady status indicates that a node is not in a ready state due to conditions like network issues or a failing kubelet. To troubleshoot, check the node status with kubectl get nodes and kubectl describe node <node_name>. Kubernetes automatically taints a node that is not ready (node.kubernetes.io/not-ready), so new pods are not scheduled there. Separately, a node that carries a taint with the NoSchedule effect won’t accept pods that lack a matching toleration until the underlying issue is resolved or the taint is removed.
  • Storage issues: The PersistentVolumeClaim Pending error occurs when a persistent volume claim (PVC) is waiting for a matching persistent volume (PV) to be bound. To troubleshoot, check the status of PVs and PVCs with kubectl get pv and kubectl get pvc. For CSI, ensure the storageClass is configured properly and requested in the PVC definition accordingly. (Check troubles/pvc.yaml to explore this scenario.)
  • Service unavailability: The Service Unavailable error means that a service is not accessible, potentially due to misconfigurations or networking issues. To troubleshoot, check the service details using kubectl describe svc <service_name>. Verify that the service selector matches the labels of the intended pods. If the selector is wrong, the service will not route traffic to the intended endpoints, leading to unavailability. An empty Endpoints field in that output (or in kubectl get endpoints <service_name>) means that no ready pods match the selector.
  • API server or control plane issues: These errors typically point to connectivity problems with the API server, often due to issues within the control plane or network. Since kubectl commands won’t work if the API server is down, you need to log in directly to the control plane server where the API server pods are running. Once logged in, you can check the status of the control plane components using commands like crictl ps (if you are using containerd) or docker ps (if you are using Docker) to ensure the API server Pod is up and running. Additionally, review logs and check the network connections to verify that all control plane components are functioning correctly.
  • Authentication and authorization problems: The Unauthorized error (HTTP 401) indicates that authentication failed, while the Forbidden error (HTTP 403) indicates that the authenticated user lacks the required permission. To troubleshoot permissions, use kubectl auth can-i <verb> <resource>, optionally adding --as <user> to check on behalf of another user. For example, if a user lacks the required role or role binding, they will encounter authorization errors. Adjust roles and role bindings as needed to grant the necessary permissions.
  • Resource exhaustion: An exceeded quota error occurs when a request would push a namespace past its ResourceQuota, so the API server rejects the request. To troubleshoot and monitor resource usage, use kubectl get quota, kubectl top nodes, and kubectl top pods (the kubectl top commands require Metrics Server to be installed in the cluster). If a quota is too low, it may block new resource allocations. Adjusting resource quotas or reducing resource usage can alleviate this issue.
  • Ingress or load balancer issues: Routing failures at the Ingress layer suggest that the ingress controller is not functioning correctly or that the Ingress rules are misconfigured. To troubleshoot, check the Ingress details using kubectl describe ingress <ingress_name>. Ensure that the ingress controller is properly installed and configured and that ingress rules correctly map to services. Misconfigurations in ingress rules can prevent proper traffic routing. Also, ensure the hostname DNS resolution is in place if you are using the optional host field in the Ingress configuration.
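Several of the checks above start with the same handful of commands, so it can be convenient to run them in one go. The following is a minimal sketch, not a script from the book: the helper name triage_pod is hypothetical, and every kubectl command in it appears earlier in this section.

```shell
# Hypothetical helper: gather the usual first-look information for one Pod.
triage_pod() {
    pod="$1"
    ns="${2:-default}"
    # Current status and the node the Pod is placed on
    kubectl get pod "$pod" -n "$ns" -o wide
    # Configuration, conditions, and recent events
    kubectl describe pod "$pod" -n "$ns"
    # Latest output from the current containers
    kubectl logs "$pod" -n "$ns" --all-containers=true --tail=50
    # Output from the previous instance, if the Pod has restarted
    kubectl logs "$pod" -n "$ns" --all-containers=true --tail=50 --previous || true
    # Events that reference this Pod
    kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

For example, triage_pod blog-675df44d5-gkrt2 trouble-ns prints everything needed to tell a scheduling problem from a crashing container. The --previous call is allowed to fail because a Pod that has never restarted has no earlier logs.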

This was the last practical demonstration in this book, so let’s now summarize what you have learned.
