PostgreSQL with Kubernetes
In this recipe, we discuss Kubernetes (K8s for short), the industry’s most prominent solution for automated application deployment, scaling, and management. It is free software, vendor neutral, and maintained by the Cloud Native Computing Foundation (CNCF).
CloudNativePG (CNPG) is the newest and fastest-rising Kubernetes operator for PostgreSQL: that is, software that provides automation around the entire Postgres lifecycle, taking care of deployment, scaling, and the management of database clusters.
In this recipe, we’ll use Minikube, a lightweight and fuss-free Kubernetes distribution for testing software deployment. It’s not suitable for production usage, but whatever we do in Minikube also holds true for any Kubernetes cluster, so you can take what you learn here and apply it to production-ready clusters.
Getting ready
First off, we install Minikube to provide a minimal Kubernetes cluster. Install Docker (or Podman) from your OS’s default package manager, then visit https://minikube.sigs.k8s.io/docs/start/ to find download and installation instructions for your operating system and architecture. For example, if you use Debian, then the installation is as simple as:
curl -LO \
https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
sudo dpkg -i minikube_latest_amd64.deb
Next, assuming that your user has permission to use Docker, you can start Minikube with:
minikube start
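If startup fails, or if you simply want to confirm that it worked, the following standard Minikube commands help (the --driver flag is only needed when auto-detection picks the wrong container runtime):
# Check that all cluster components are running
minikube status
# Optionally force a specific driver, e.g. Docker
minikube start --driver=docker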
At this point, you can install the kubectl utility, which lets you interact with the Kubernetes cluster:
minikube kubectl -- get pods -A
The above command is a bit verbose; you can wrap it in a shorter alias:
alias kubectl="minikube kubectl --"
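To make the alias survive new shell sessions, you can append it to your shell startup file (assuming bash):
echo 'alias kubectl="minikube kubectl --"' >> ~/.bashrc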
Now everything should be ready; you can verify that by running:
kubectl get nodes
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   12m   v1.27.4
which means that you’re ready to start your CloudNativePG journey.
How to do it...
In order to install the latest version (at the time of writing, v1.21.0) of the CloudNativePG operator into your Kubernetes cluster, run:
kubectl apply -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.21/releases/cnpg-1.21.0.yaml
We verify the installation with:
kubectl get deployment -n cnpg-system cnpg-controller-manager
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
cnpg-controller-manager   0/1     1            0           15s
The deployment takes a few seconds to come up; re-run the command until READY shows 1/1.
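Alternatively, rather than polling, you can let kubectl block until the rollout completes:
# Wait for the operator deployment to report success
kubectl rollout status deployment/cnpg-controller-manager -n cnpg-system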
Let’s deploy a sample PostgreSQL cluster.
Kubernetes works in a declarative way: you declare what the cluster should look like, and then CNPG (the operator) will perform all the necessary operations that will end up with the cluster in the exact state that you declared.
In practice, we create a YAML file called sample-cluster.yaml with the following content:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: sample-cluster
spec:
  instances: 3
  storage:
    size: 1Gi
And then we apply that file by running:
kubectl apply -f sample-cluster.yaml
We can check what is going on by seeing which Postgres pods are up and running:
kubectl get pods
NAME                            READY   STATUS            RESTARTS   AGE
sample-cluster-1-initdb-74xf7   0/1     PodInitializing   0          30s
Looks like we’re not done yet. Give it a moment, and then you will see:
kubectl get pods
NAME               READY   STATUS    RESTARTS   AGE
sample-cluster-1   1/1     Running   0          2m19s
sample-cluster-2   1/1     Running   0          1m41s
sample-cluster-3   1/1     Running   0          1m12s
Our Postgres nodes are up! They are now ready to be accessed by applications running inside the Kubernetes cluster by connecting to the following Services created by CNPG:
kubectl get svc
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes          ClusterIP   10.96.0.1       <none>        443/TCP    77m
sample-cluster-r    ClusterIP   10.101.133.29   <none>        5432/TCP   42m
sample-cluster-ro   ClusterIP   10.100.24.250   <none>        5432/TCP   42m
sample-cluster-rw   ClusterIP   10.99.79.108    <none>        5432/TCP   42m
The sample-cluster-rw Service lets you connect to the primary node for read/write operations, sample-cluster-ro to standbys only for read-only operations, and sample-cluster-r to any node (including the primary) for read operations.
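As a quick smoke test, you can connect from a temporary pod inside the cluster. The following sketch assumes CNPG's default behavior of creating an app database owned by an app user, whose password is stored in the sample-cluster-app Secret; adjust the names if you changed those defaults:
# Read the auto-generated password for the "app" user
kubectl get secret sample-cluster-app \
  -o jsonpath='{.data.password}' | base64 -d; echo
# Start a throwaway client pod and open a psql session via the rw Service
# (psql prompts for the password retrieved above)
kubectl run psql-client --rm -it --restart=Never --image=postgres:16 -- \
  psql -h sample-cluster-rw -U app app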
You can find more sample configurations with more features at https://cloudnative-pg.io/documentation/current/samples/.
How it works…
The operator defines a new Kubernetes resource called Cluster, representing a PostgreSQL cluster made up of a single primary and an optional number of physical replicas that co-exist in the chosen Kubernetes namespace for high availability and offloading of read-only queries.
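Since Cluster is a first-class resource, you can inspect it with the usual kubectl verbs; its status columns report, among other things, how many instances are ready and which one is the current primary:
# Show the high-level status of the CNPG cluster
kubectl get cluster sample-cluster
# Inspect the full status, configuration, and recent events
kubectl describe cluster sample-cluster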
Applications in the Kubernetes cluster can now access the Postgres database through the Service that the operator manages, without worrying about which node is primary and whether the primary changes due to a failover or switchover. For applications from outside the Kubernetes cluster, you need to expose Postgres via TCP by configuring a Service or Ingress object.
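For a quick test from your workstation (not a production-grade setup), you can simply forward a local port to the rw Service:
# Forward local port 5432 to the primary; then, from another terminal,
# connect with: psql -h 127.0.0.1 -p 5432 -U app app
kubectl port-forward svc/sample-cluster-rw 5432:5432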
In our cluster, 1 GiB of disk space was allocated to each Postgres instance from the default Kubernetes storage class. Be aware that we deployed Postgres with the default configuration, which is conservative and safe for testing on a laptop, but definitely not suitable for production usage.
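PostgreSQL settings can be declared in the same manifest. As a sketch only (the sizes and values here are illustrative, not tuning advice), the Cluster spec accepts a postgresql.parameters stanza:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: sample-cluster
spec:
  instances: 3
  storage:
    size: 10Gi
  postgresql:
    parameters:
      shared_buffers: "512MB"
      max_connections: "200"
CNPG validates these parameters and rolls the change out to every instance, performing restarts in a controlled way when a parameter requires one.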
You can find CNPG’s extensive documentation, which describes all you can do with the operator, including detailed Prometheus monitoring, backup and recovery, upgrades, migration, scaling, etc., and how to configure it for production use, at https://cloudnative-pg.io/documentation/current/.
There’s more...
CloudNativePG is able to react to the failure of a PostgreSQL instance by performing failover and/or creating new replicas, depending on what is needed to restore the desired state, which in our example is one primary node and two physical replicas.
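You can watch this self-healing in action: delete one of the pods, and the operator will recreate it (promoting a replica first if the deleted pod was the primary):
# Remove one instance and watch CNPG restore the declared state
kubectl delete pod sample-cluster-1
kubectl get pods -w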
We recommend this method for Kubernetes PostgreSQL deployments because it is not an attempt to shoehorn Postgres into Kubernetes with additional sidecar software to take care of the high availability aspect. It is built from the ground up with Postgres-specific resources, while respecting the cloud-native declarative conventions and using Kubernetes’s built-in facilities and features.
High availability has historically been a complex subject for PostgreSQL, as for other database systems, because the most difficult part is to diagnose failures correctly. The various middleware tools – for which we refer you to Chapter 12, Replication and Upgrades – employ a number of techniques to reduce the risk of doing the wrong thing due to a mistaken diagnosis.
Kubernetes changes the way high availability is achieved because it provides a very reliable interface for detecting node failures. CNPG is called “native” because it follows this approach strictly; as a result, it is becoming very popular in the Kubernetes world, not least because people experienced with Kubernetes recognize the approach as familiar and reliable.
CloudNativePG is the first PostgreSQL-related project to aim for CNCF certification through the Sandbox/Incubation/Graduation process. You can find the CNPG repository at https://github.com/cloudnative-pg/cloudnative-pg.