Why monitoring?
Monitoring is a generic term in the context of operating a business. As part of effectively running a business, various elements of the business operations are measured for their health and effectiveness. The outputs of such measurements are compared against the business goals. When such efforts are done periodically and systematically, it could be called monitoring.
Monitoring could be done directly or indirectly, and voluntarily or dictated by some law. Successful businesses are good at tracking growth by using various metrics and taking corrective actions to fine-tune their operations and goals as needed.
The previous statements are common knowledge and applicable to any successful operation, even if it is a one-time event. However, there are a few important keywords we already mentioned – metrics, measurement, and goal. A monitoring activity is all about measuring something and comparing that result against a goal or a target.
While it is important to keep track of every aspect of running a business, this book focuses on monitoring software application systems running in production. Information Technology (IT) is a key element of running businesses and that involves operating software systems. With software available as a service through the internet (commonly referred to as Software as a Service or just SaaS), most software users don't need to run software systems in-house, and thus there's no need for them to monitor it.
However, SaaS providers need to monitor the software systems they run for their customers. Big corporations such as banks and retail chains may still have to run software systems in-house due to the non-availability of the required software service in the cloud or due to security or privacy reasons.
The primary focus of monitoring a software system is to check its health. By keeping track of the health of a software system, it is possible to determine whether the system is available or its health is deteriorating.
If it is possible to catch the deteriorating state of a software system ahead of time, it might be possible to fix the underlying issue before any outage happens, which would ensure business continuity. Such a method of proactive monitoring that provides warnings well in advance is the ideal method of monitoring. But that is not always possible. There must also be processes in place to deal with the outage of software systems.
A software system running in production has three major components:
- Application software: A software service provider builds applications that will be used by customers or the software is built in-house for internal use.
- Third-party software: Existing third-party software such as databases and messaging platforms are used to run application software. These are subscribed to as SaaS services or deployed internally.
- Infrastructure: This usually refers to the network, compute, and storage infrastructure used to run the application and third-party software. This could be bare-metal equipment in data centers or services provisioned on a public cloud platform such as AWS, Azure, or GCP.
Though we discussed monitoring software systems in broad terms earlier, upon a closer look, it is clear that monitoring of the three main components mentioned previously is involved in it. The information, measured using related metrics, is different for each category.
The monitoring of these three components constitutes core monitoring. There are many other aspects – both internal to the system, such as its health, and external, such as security – that would make the monitoring of a software system complete. We will look at all the major aspects of software system monitoring in general in this introductory chapter, and as it is supported by Datadog in later chapters.