Alerting is a double-edged sword that we have to learn to use productively. The teams have created synthetic transactions that exercise their components on a regular basis, which helps them assert the success of their deployments and the health of their components. Each component has been instrumented to make its internal operation sufficiently observable. As a result, the teams are now awash in a sea of metrics. Categorizing this data into work metrics, resource metrics, and events helps to make sense of the different signals emitted by the components. Some teams have homed in on their key performance indicators, while others are still waiting for the dust to settle. Regardless, there is too much information to consume manually. Monitors need to be defined to watch the data and alert the team accordingly.
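To make the idea of a monitor concrete, here is a minimal sketch in Python. It assumes a monitor is simply a threshold applied to the average of the most recent samples of one work metric, such as an error rate. The `Monitor` class, its fields, and the metric name are hypothetical and only illustrate the concept; a real monitoring service would define monitors through its own configuration or API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class Monitor:
    """Hypothetical threshold monitor over a sliding window of samples."""
    name: str
    threshold: float   # alert when the window average exceeds this value
    window: int        # number of most recent samples to average
    samples: Deque[float] = field(default_factory=deque)

    def observe(self, value: float) -> Optional[str]:
        """Record one sample; return an alert message if the threshold is breached."""
        self.samples.append(value)
        if len(self.samples) > self.window:
            self.samples.popleft()  # keep only the most recent samples
        if len(self.samples) < self.window:
            return None  # not enough data yet to judge
        average = sum(self.samples) / len(self.samples)
        if average > self.threshold:
            return (f"ALERT {self.name}: average {average:.2f} "
                    f"exceeds {self.threshold:.2f}")
        return None


if __name__ == "__main__":
    # Assumed work metric: fraction of failed requests per minute.
    monitor = Monitor(name="checkout-error-rate", threshold=0.05, window=3)
    for sample in [0.01, 0.02, 0.03, 0.20]:
        alert = monitor.observe(sample)
        if alert:
            print(alert)
```

Averaging over a window rather than reacting to a single sample is one simple way to keep a momentary spike from paging the team; the window size and threshold are the knobs a team would tune for each metric.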
The classic problem with monitoring is alert fatigue. Teams...