Performing Health Checks on Your Services
Maintaining maximum uptime is an important aspect of any system. In the previous chapter, we saw how we can write fault-tolerant code that reduces the prevalence of outages in our infrastructure and network. This alone, however, is not a long-term solution, because things fail regardless of these measures. This leads to the notion that we need to know when failures occur.
This is where health checks come in. Health checks are a mechanism for informing us of outages in our services, in their supporting databases, and in the other connections our application relies on. Generally, this can be accomplished with a simple ping request to a resource: if we get a response, the resource is available and operating as expected. In the absence of a response, we assume that the resource is down and trigger an alert.
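To make the ping idea concrete, here is a minimal sketch in Python, used purely for illustration. It assumes the resource exposes an HTTP endpoint and treats any 2xx response received within a timeout as "up"; a refused connection, a timeout, or an error status counts as "down". The names `check_health` and `run_check`, the two-second default timeout, and the printed alert are hypothetical placeholders, not part of any particular framework.

```python
import urllib.request


def check_health(url: str, timeout: float = 2.0) -> bool:
    """Ping `url` and report whether the resource looks healthy.

    Returns True only if the endpoint answers with a 2xx status within
    `timeout` seconds; every failure mode is treated as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError (refused connection, DNS failure), HTTPError (4xx/5xx)
        # and socket timeouts are all OSError subclasses, so one clause
        # covers "no usable response".
        return False


def run_check(name: str, url: str) -> None:
    """Hypothetical driver: report status, or raise an alert when down."""
    if check_health(url):
        print(f"{name}: up")
    else:
        # Placeholder for a real alerting hook (pager, chat message, e-mail).
        print(f"ALERT: {name} is down")
```

A call such as `run_check("orders-api", "http://localhost:5000/health")` (a hypothetical service and path) would print either the "up" line or the alert. Treating every exception as "down" keeps the check binary, which matches the simple ping model; it cannot distinguish a slow but working service from a dead one.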
There are also statuses between a service's up and down states, and we will explore those options in this chapter. We will also...