Alerting
When the metrics reveal a problem, an alert should be triggered automatically. Prometheus has an alerting system that fires an alert when a metric meets the condition defined in an alerting rule.
Check out the Prometheus documentation on alerting for more information: https://prometheus.io/docs/alerting/latest/overview/.
Normally, alerts are configured to fire when the value of a metric crosses some threshold. For example, the number of errors is higher than X, or the time taken to respond to a request is too high.
An alert can also fire when a value is too low; for example, if the number of requests in a system falls to zero, that could be an indication that the system is down.
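The two kinds of threshold described above can be illustrated with a minimal Python sketch. This is not how Prometheus defines or evaluates alerting rules (it uses its own rule files and query language); the ThresholdRule class, the firing_alerts function, the metric names, and the threshold values are all hypothetical and chosen only for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical alert rule: fires when compare(value, threshold) is true."""
    name: str
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float


def firing_alerts(rules: List[ThresholdRule], metrics: Dict[str, float]) -> List[str]:
    """Return the names of the rules whose condition holds for the given metrics."""
    firing = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # This sketch skips metrics that were not reported at all.
            continue
        if rule.compare(value, rule.threshold):
            firing.append(rule.name)
    return firing


# Illustrative rules: two "too high" conditions and one "too low" condition.
RULES = [
    ThresholdRule("HighErrorCount", "errors_per_minute", operator.gt, 50),
    ThresholdRule("SlowResponses", "p95_latency_seconds", operator.gt, 1.5),
    ThresholdRule("NoTraffic", "requests_per_minute", operator.le, 0),
]

if __name__ == "__main__":
    sample = {
        "errors_per_minute": 3,
        "p95_latency_seconds": 0.2,
        "requests_per_minute": 0,
    }
    print(firing_alerts(RULES, sample))
```

In this sketch, a system that has stopped receiving requests triggers only the low-threshold rule, even though its error count and latency look healthy. A real setup would also typically require the condition to hold for some period of time before firing, to avoid alerting on brief spikes.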
Alertmanager, the Prometheus component that handles notifications, can send alerts in several ways, such as by email, but it can also be connected to other tools to perform more complex actions. For example, connecting to an integrated incident solution like Opsgenie (https://www.atlassian.com/software/opsgenie) allows...