Our alerting is already set up. Alertmanager is configured to send notifications to Slack. While that was a good step forward, it is still far from alerting that can serve as the base of a self-adapting and self-healing system. What we have done so far can be considered a fall-back strategy. If the system cannot detect changed conditions and, when needed, adapt or heal itself, notifying humans through Slack is a good solution. In some cases, Slack notifications will be temporary, and we will replace them with requests to the system so that it can correct itself. In other situations, the system will not be able to fix itself, so notifications will still have to be sent to doctors (us, humans, engineers).
We have already built the initial solution for an alerting system. Alertmanager can fulfill some of our needs. It is not alone, though: there is another one that we used throughout the...