Alerting best practices
This section should help you establish a new alerting process for your services. It will also help you improve existing alerts if you already have an established alerting process.
Among the most valuable best practices, I would highlight the following:
- Keep your alerts immediately actionable: Alerting is a powerful technique for ensuring that issues and incidents get acknowledged and addressed. However, you should not overuse it for issues that do not require immediate attention. Some types of alerts, such as alerts indicating high saturation, are not necessarily actionable. For example, a sudden increase in CPU load may just be a transient symptom of high service usage. It may not indicate any immediately actionable issue unless the load remains high for a prolonged period (for example, the CPU load not going below 85% for more than 10 minutes). When creating alerts, think about...
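
To make the CPU example above more concrete, here is a minimal Python sketch of a sustained-threshold check. It assumes CPU readings arrive as timestamped samples. The names `Sample` and `should_alert` are hypothetical and used only for illustration, and the defaults (85% and 10 minutes) are taken from the example in the text. Treat it as one possible way to tell a transient spike from a prolonged condition, not as a definitive implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """A single CPU reading (hypothetical type, for illustration only)."""
    timestamp: datetime
    cpu_percent: float


def should_alert(
    samples: Iterable[Sample],
    threshold: float = 85.0,
    min_duration: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True only if the latest unbroken run of samples at or above
    `threshold` has lasted at least `min_duration`."""
    breach_start: Optional[datetime] = None
    last_seen: Optional[datetime] = None

    for sample in sorted(samples, key=lambda s: s.timestamp):
        if sample.cpu_percent >= threshold:
            # Remember when the current run of high readings began.
            if breach_start is None:
                breach_start = sample.timestamp
        else:
            # Any reading below the threshold resets the run.
            breach_start = None
        last_seen = sample.timestamp

    if breach_start is None or last_seen is None:
        return False
    return last_seen - breach_start >= min_duration


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    # A three-minute spike that recovers: not worth an alert.
    spike = [
        Sample(start + timedelta(minutes=i), 95.0 if i < 3 else 40.0)
        for i in range(15)
    ]
    # Load that stays high for ten minutes: worth an alert.
    sustained = [Sample(start + timedelta(minutes=i), 90.0) for i in range(11)]
    print(should_alert(spike), should_alert(sustained))
```

In practice, you would rarely write this logic yourself. Many monitoring systems let you express the same idea declaratively, for example with the `for` field of a Prometheus alerting rule.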