Saturation measures the fullness of our services and the system. We should be aware if replicas of our services are processing too many requests and are being forced to queue some of them. We should also monitor whether the usage of our CPUs, memory, disks, and other resources reaches critical limits.
For now, we'll focus on CPU usage. We'll start by opening Prometheus' graph screen.
open "http://$PROM_ADDR/graph"
Let's see if we can get the rate of used CPU by node (instance). We can use the node_cpu_seconds_total metric for that. However, it is a counter split into different modes, and we'll have to exclude a few of them to get the "real" CPU usage. Those will be idle, iowait, and any type of guest cycles.
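Before we move to the query itself, it might help to see the arithmetic behind it. What follows is only a rough Python sketch, not how Prometheus is implemented. The function name used_cpu_rate, the sample values, and the node-1 instance are all made up for illustration, and we assume that guest cycles show up as modes whose names start with "guest". Unlike Prometheus' rate function, the sketch handles neither counter resets nor extrapolation.

```python
# Modes we do not count as "real" CPU usage. Guest modes are matched
# by prefix below (assumption: their names start with "guest").
EXCLUDED_MODES = {"idle", "iowait"}


def is_excluded(mode):
    """Return True for idle, iowait, and any type of guest cycles."""
    return mode in EXCLUDED_MODES or mode.startswith("guest")


def used_cpu_rate(earlier, later, interval_seconds):
    """Return used CPU seconds per second, summed by instance.

    earlier and later map (instance, cpu, mode) to the cumulative
    counter value observed at the start and end of the interval.
    """
    deltas = {}
    for (instance, cpu, mode), end_value in later.items():
        if is_excluded(mode):
            continue
        start_value = earlier.get((instance, cpu, mode))
        if start_value is None:
            continue  # the series did not exist at the start of the interval
        deltas[instance] = deltas.get(instance, 0.0) + (end_value - start_value)
    # Dividing the summed increase by the interval gives a per-second rate.
    return {inst: delta / interval_seconds for inst, delta in deltas.items()}


# Hypothetical samples taken 60 seconds apart for one CPU on one node.
earlier = {
    ("node-1", "0", "user"): 100.0,
    ("node-1", "0", "system"): 50.0,
    ("node-1", "0", "idle"): 800.0,
    ("node-1", "0", "iowait"): 10.0,
}
later = {
    ("node-1", "0", "user"): 112.0,
    ("node-1", "0", "system"): 56.0,
    ("node-1", "0", "idle"): 840.0,
    ("node-1", "0", "iowait"): 12.0,
}

print(used_cpu_rate(earlier, later, 60))
```

With those made-up numbers, only the user and system modes are counted. Together they grew by 18 seconds over a 60 second window, so the node used roughly 0.3 of a CPU on average.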
Please type the expression that follows, and press the Execute button.
sum(rate(
  node_cpu_seconds_total...