Issues with traditional monitoring techniques
Traditional monitoring techniques focused on collecting a few predefined metrics, analyzing them to assess the system’s health, and using them for alerting. IT systems were managed and operated in isolation, and an organization’s IT management and engineering processes were built around this isolated approach. Many IT system providers created monitoring tools primarily to monitor an application’s health in isolation.
Let’s discuss the issues with traditional monitoring techniques and why they no longer fit the bill for observability implementations. You may already be past these challenges, but we still recommend reading through each of them, as we discuss them from an observability perspective.
Modern infrastructure
Let’s consider a service that depends on three applications. The traditional approach would have identified key parameters that define the health of each of these three applications. Each application would be monitored separately, on the assumption that if the applications are healthy individually, then the business service that depends on them (fully or partially) is also healthy and serves customers efficiently. There was no concept of a service in this approach.
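The gap in this assumption can be shown with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the application names, the metric values, the per-application thresholds, the service latency budget, and the assumption that a customer request passes through the three applications one after another.

```python
from dataclasses import dataclass


@dataclass
class AppMetrics:
    name: str
    cpu_percent: float   # CPU utilization of the application
    error_rate: float    # fraction of failed requests
    latency_ms: float    # time a request spends in this application


# Hypothetical thresholds, predefined separately for each application.
CPU_LIMIT = 80.0
ERROR_LIMIT = 0.01
LATENCY_LIMIT_MS = 400.0

# Hypothetical end-to-end budget for the business service.
SERVICE_LATENCY_BUDGET_MS = 800.0


def app_is_healthy(m: AppMetrics) -> bool:
    """Traditional check: each application is judged in isolation."""
    return (
        m.cpu_percent < CPU_LIMIT
        and m.error_rate < ERROR_LIMIT
        and m.latency_ms < LATENCY_LIMIT_MS
    )


def service_latency_ms(apps: list[AppMetrics]) -> float:
    """Assumes a request passes through the applications sequentially."""
    return sum(m.latency_ms for m in apps)


def service_is_healthy(apps: list[AppMetrics]) -> bool:
    """Service-level check: judges what the customer actually experiences."""
    return service_latency_ms(apps) < SERVICE_LATENCY_BUDGET_MS


apps = [
    AppMetrics("app-a", cpu_percent=55.0, error_rate=0.002, latency_ms=350.0),
    AppMetrics("app-b", cpu_percent=40.0, error_rate=0.001, latency_ms=300.0),
    AppMetrics("app-c", cpu_percent=62.0, error_rate=0.004, latency_ms=320.0),
]

for m in apps:
    print(m.name, "healthy" if app_is_healthy(m) else "unhealthy")
print("service", "healthy" if service_is_healthy(apps) else "unhealthy")
```

With these made-up numbers, every application passes its own checks, yet the combined latency exceeds the service budget, so the service is degraded for customers while each per-application view still shows healthy.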
This method would have worked well for a traditional infrastructure, where the application was monolithic and hosted on physical hardware in data centers. That setup guaranteed the application a certain amount of resources. Then came virtualization, which added another layer on top of the physical hardware, and the guarantee of dedicated resources was gone. The adoption of cloud infrastructure services such as AWS and GCP and cloud-native technologies such as serverless architecture, microservices, and containers has completely decoupled infrastructure from applications, making IT systems more complex and interdependent. These technologies have introduced a level of unpredictability into the operation of IT systems. Hence, the concepts, practices, and tools used for managing and maintaining the health of applications also have to change accordingly.
Pre-empting issues
One of the key issues with the traditional monitoring approach is that you must decide in advance which metrics to collect and monitor. Many of these key indicators or metrics are chosen based on the past experiences of vendors, administrators, and system engineers. With more experience, engineers can come up with more and better key indicators. While this was effective to a certain extent in traditional infrastructure environments, modern distributed architecture has introduced many interdependencies and much complexity into IT environments, where the source of a problem can vary drastically. Hence, predicting the right health indicators or metrics in advance can be inaccurate and challenging.
Identifying why and where the problem exists
The main purpose of conventional monitoring is to detect when there is a problem. It provides a simple green, amber, or red health status indication but doesn’t answer why the issue occurs or where it originates. Once an issue has been flagged, it’s up to the administrators and engineers to figure out where and why the problem exists. Since modern infrastructure services are very transient, identifying the source of the problem is difficult and time-consuming. Hence, answering why and where as quickly as possible is critical to reducing MTTR and maintaining a stable service.
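As a rough illustration of how little such a status carries, here is a minimal sketch of a threshold-based check. The metric (an error rate) and both threshold values are hypothetical and not taken from any particular monitoring tool.

```python
# Hypothetical thresholds for a single predefined metric.
AMBER_AT = 0.02  # 2% of requests failing
RED_AT = 0.05    # 5% of requests failing


def health_status(error_rate: float) -> str:
    """Map one metric value to a green, amber, or red status."""
    if error_rate >= RED_AT:
        return "red"
    if error_rate >= AMBER_AT:
        return "amber"
    return "green"


# The result says that something is wrong right now, but it carries no
# information about which component is failing or what caused the failure.
print(health_status(0.07))
```

The return value is a single label. Any context about where the failing requests went or why they failed has to be gathered separately by the engineer who responds to the alert.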