Understanding the IBM reference architecture for incident management
Incident management is the process of restoring a system to a healthy state after an incident affects it. The IBM Architecture Center describes the end-to-end workflow for managing an incident in its reference architecture, shown in Figure 10.13.
Incident management happens in four stages: monitoring, analyzing, planning, and executing. The primary artifacts for incident management are observability dashboards and runbooks.
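To make the four stages concrete, here is a minimal sketch in Python of how they could be chained together. It is an illustration under stated assumptions, not IBM's implementation: the `Incident` type, the `RUNBOOKS` table, the metric names, and the thresholds are all hypothetical, and the execute stage only records the steps it would perform instead of running anything.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Incident:
    component: str
    symptom: str


# Hypothetical runbooks: each symptom maps to an ordered list of remediation steps.
RUNBOOKS: Dict[str, List[str]] = {
    "high_error_rate": ["roll back latest deployment", "verify error rate"],
    "disk_full": ["rotate logs", "expand volume"],
}


def monitor(metrics: Dict[str, Dict[str, float]]) -> List[Incident]:
    """Stage 1 (monitoring): flag components that breach illustrative thresholds."""
    incidents = []
    for component, values in metrics.items():
        if values.get("error_rate", 0.0) > 0.05:
            incidents.append(Incident(component, "high_error_rate"))
        if values.get("disk_used", 0.0) > 0.90:
            incidents.append(Incident(component, "disk_full"))
    return incidents


def analyze(incident: Incident) -> str:
    """Stage 2 (analyzing): name a suspected cause; here it is simply the symptom."""
    return incident.symptom


def plan(cause: str) -> List[str]:
    """Stage 3 (planning): select runbook steps, or escalate if none match."""
    return RUNBOOKS.get(cause, ["escalate to on-call engineer"])


def execute(incident: Incident, steps: List[str]) -> List[str]:
    """Stage 4 (executing): this sketch only records what it would do."""
    return [f"{incident.component}: {step}" for step in steps]


def handle_incidents(metrics: Dict[str, Dict[str, float]]) -> List[str]:
    """Run every detected incident through all four stages in order."""
    log: List[str] = []
    for incident in monitor(metrics):
        cause = analyze(incident)
        steps = plan(cause)
        log.extend(execute(incident, steps))
    return log


if __name__ == "__main__":
    sample = {
        "checkout": {"error_rate": 0.20, "disk_used": 0.50},
        "search": {"error_rate": 0.01, "disk_used": 0.95},
    }
    for line in handle_incidents(sample):
        print(line)
```

In this sketch, the monitoring stage plays the role of the observability dashboard and the planning stage plays the role of the runbook lookup. In a real platform, the analysis stage would do far more than pass the symptom through.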
The following dashboard shows the current state of a typical cloud platform from both an IT and a business perspective. It also highlights the healthy and unhealthy components of the platform and provides visual insights that help identify an incident along with its root cause:
Figure 10.13 – An incident management reference architecture by IBM (credit: IBM Architecture Center)
Runbooks are scripts or written instructions that first responders use to resolve...