Effective Alerting with Prometheus
Thus far, we’ve looked primarily at how to get data into Prometheus through scrape jobs, how to discover scrape targets, and how to query that data manually. But no monitoring system is truly useful if you need to constantly check whether everything is okay; we need something running in the background that evaluates the state of our systems and alerts us when they’re not working correctly. In this chapter, we’ll look at how Prometheus achieves that through a combination of its rule subsystem and the separate Alertmanager component.
We’ll cover the following main topics:
- Alertmanager configuration and routing
- Alertmanager templating
- Highly available (HA) alerting
- Making robust alerts
- Unit-testing alerting rules
Let’s get started!