In order to see where our PostgreSQL cluster fits company expectations, we need to know whether the company and each individual part of the existing application stack has an overall target RTO. If it doesn't, it's our job to approximate one. This means contacting any decision-makers, product owners, architects, and so on, to know what RTO target we're trying to attain and how other resources may contribute. These will act as a type of maximum value we can't exceed.
Keep in mind that RTO values tend to be amplified between layers. If our RTO is higher than some portion of the application stack, that will necessarily raise the RTO of that layer as well, which may increase the RTO of each subsequent layer. This is the exact scenario we're trying to avoid.
Once we have an RTO expectation, we need to examine how possible it is to fall under that target. The easiest way to accomplish this is to build a spreadsheet that essentially consists of a list of dependencies, maintenance tasks, or other occurrences related to PostgreSQL.
The rows we used for Activity are mainly suggestions, and producing an exhaustive list is generally dependent on the architecture to a certain extent. However, all software requires upgrades, machines need to be rebooted, switchover tests to prove high availability functionality may be required, past experience with the full application stack and hardware may imply two unexpected outages per year, and so on. Each of these will contribute to the cumulative RTO for PostgreSQL which we can use as a reference value.
The number we use for the Count column should be the number of times the Activity happens on a yearly basis. As an example, PostgreSQL has a quarterly release schedule for non-critical bug and security enhancements. If you want to follow along with these, it could make sense to set the Count column of Minor Upgrade to 4.
A number of architectural examples that we'll discuss later in this chapter will make it possible to set the Time column to 0 for some actions, or at least to a much lower value. We'll discuss these where relevant. This is also one of the reasons we'll need to execute this recipe multiple times when deciding on an appropriate architecture.
Once we have accounted for as many foreseeable Action components that may be necessary over the course of a year, we'll have a cumulative total that may represent the RTO that PostgreSQL can achieve for a given architecture. As a sanity check, we should compare that value to the lowest RTO for any parts of the application stack that depend on PostgreSQL. It's important we don't exceed this target.
Then, as with RPO, we need to present the possible RTO to decision-makers so that it can be integrated into the overall company RTO. To do that, we must continue with the rest of the chapter to find one or two architectures with either higher or lower expected RTO, estimate the cost of each, and work on a suitable compromise.
Deriving an appropriate RTO may require multiple iterations of this recipe, from estimation, architecture selection, presenting it to appropriate parties, and so on. This isn't a fast or simple process, and it pays to get it right early. We need to know how many PostgreSQL nodes to purchase, where each will reside, how we switch to alternatives, how much time each step may take, and so on.