We need to define the scope of what we want to accomplish through instrumentation. We'll keep it small by limiting ourselves to a single goal: we'll scale services up if their response times exceed an upper limit and scale them down if they fall below a lower limit. Any other alert will result in a Slack notification. That does not mean that Slack notifications should exist forever. Instead, they should be treated as a temporary solution until we find a way to translate manual corrective actions into automated responses performed by the system.
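The scaling rule described above can be sketched roughly as follows. This is only an illustration of the decision logic, not the setup we'll build: the threshold values, the alert names, and the functions `decide` and `route` are all hypothetical, since no concrete limits or names have been chosen yet.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SCALE_UP = "scale_up"
    SCALE_DOWN = "scale_down"
    NONE = "none"


# Hypothetical limits in seconds; the text does not specify real values.
UPPER_LIMIT = 0.5
LOWER_LIMIT = 0.1


def decide(response_time: float,
           upper: float = UPPER_LIMIT,
           lower: float = LOWER_LIMIT) -> Action:
    """Map an observed response time to a scaling action."""
    if lower >= upper:
        raise ValueError("lower limit must be below upper limit")
    if response_time > upper:
        return Action.SCALE_UP
    if response_time < lower:
        return Action.SCALE_DOWN
    # Between the two limits nothing needs to change.
    return Action.NONE


def route(alert_name: str, response_time: Optional[float] = None) -> str:
    """Send response-time alerts to the scaler and everything else to Slack.

    "response_time" is an assumed alert name used only for this sketch.
    """
    if alert_name == "response_time" and response_time is not None:
        return decide(response_time).value
    # Any alert we cannot act on automatically becomes a Slack notification.
    return "notify_slack"
```

Keeping a gap between the lower and the upper limit matters: if the two were equal, a service hovering around that value would be scaled up and down on every measurement.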
A good example of alerts that are often handled manually is responses with errors (status codes 500 and above). We'll send alerts whenever such responses reach a threshold over a specified period. They will result in Slack notifications that will become pending tasks for humans. An internal rule should be...