Gathering metrics and sending alerts with Prometheus
Prometheus is the dominant system for collecting metrics on Kubernetes cluster operations. It offers a wide range of features for storing, querying, and visualizing time-series data, and for sending alerts based on that data.
This metrics data might include a variety of time-series data: CPU usage, both for nodes and for pods; storage utilization; application health, as defined by readiness probes; and other application-specific metrics. Prometheus uses a pull model in which it polls endpoints for numeric data. Pods, DaemonSets, and other Kubernetes resources supporting Prometheus use annotations to advertise that Prometheus should scrape them for metrics data over HTTP, usually at a /metrics endpoint. This can include data from Nodes, surfaced through a DaemonSet called node_exporter that runs on each Node.
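To make the pull model concrete, the following is a minimal sketch of what a scrape target could look like, written with only the Python standard library. The names render_metrics, MetricsHandler, SAMPLES, and serve, the port 8000, and the sample metric names are all hypothetical and chosen for illustration; a real application would more likely use an official Prometheus client library, and this sketch does not escape special characters in label values.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory samples: (metric name, labels, numeric value).
SAMPLES = [
    ("app_requests_total", {"method": "get", "code": "200"}, 3),
    ("app_up", {}, 1),
]


def render_metrics(samples):
    """Render samples as lines in the Prometheus text exposition format.

    Labels are sorted by key so the output is deterministic.
    Label values are NOT escaped here; this is a simplification.
    """
    lines = []
    for name, labels, value in samples:
        if labels:
            label_text = ",".join(
                f'{key}="{val}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current samples at /metrics and returns 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrape requests on the given port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; Prometheus (or a plain HTTP GET to /metrics on that port) would then receive one line per sample, each consisting of a metric name, an optional set of labels in braces, and a numeric value.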
Prometheus stores the metrics data it receives by associating this data with a metric name and a set...