Chapter 7: AIOps and Root Cause Analysis
Up to this point, we have explained at length the value of detecting anomalies in metrics and logs separately. That capability is extremely valuable. In some cases, however, knowing that a particular metric or log file has gone awry does not tell the whole story of what is going on. It may, for example, point to a symptom rather than the cause of the problem. To better understand the full scope of an emerging problem, it is often helpful to look holistically at many aspects of a system or situation. This involves intelligently analyzing multiple kinds of related datasets together.
In this chapter, we will cover the following topics:
- Demystifying the term "AIOps"
- Understanding the importance and limitations of KPIs
- Moving beyond KPIs
- Organizing data for better analysis
- Leveraging the contextual information
- Bringing it all together for RCA