Defining alert rules
Let’s start with how to think about triggering alerts. To build an alert, you will need to answer a question of this form:
What condition must exist as measured by what metrics, and for how long?
Let’s break this concept down into its constituent parts.
What condition…
An alert ultimately boils down to a switch: at each evaluation interval, the alert either needs to be triggered or it does not. The rule you use to determine whether the alert should be in a triggered (or firing) state is called the alert condition. Most of the work you will do in defining an alert condition consists of reducing metrics data to a simple Boolean yes-or-no assertion about whether the alert should be triggered.
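To make the idea of reducing metrics to a Boolean concrete, here is a minimal sketch in Python. The function name `alert_condition`, the choice of averaging the samples, the threshold value, and the decision to treat an empty window as "not firing" are all assumptions made for illustration; they are not taken from any particular alerting system.

```python
from statistics import mean


def alert_condition(samples, threshold):
    """Reduce a window of metric samples to a single yes-or-no answer.

    Hypothetical helper: `samples` holds the values collected since the
    last evaluation, and `threshold` is the level above which we alert.
    """
    if not samples:
        # Assumption: with no data we do not fire. A real system might
        # instead treat missing data as its own alert condition.
        return False
    # Averaging is one possible reduction; max, min, or a percentile
    # would fit the same shape.
    return mean(samples) > threshold


# Called once per evaluation interval with the latest window of data.
cpu_window = [0.91, 0.95, 0.97]  # illustrative CPU utilization samples
firing = alert_condition(cpu_window, threshold=0.90)
print("firing" if firing else "ok")
```

However elaborate the reduction becomes, the output stays the same: one Boolean per evaluation interval.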
Space prevents us from devoting an entire chapter to exploring the possible ways to define alert conditions, but I can offer some heuristics for identifying possible alert conditions:
- Is the condition based upon...