A focus on the mean time to recovery is critical to achieving the promise of cloud-native. The traditional focus on maximizing the mean time between failures is too conservative a posture for companies to thrive in this day and age. Companies must be lean, move fast, experiment, and pivot quickly. We must be willing to make mistakes and rapidly adapt. Fortunately, the cloud-native concepts that empower teams to rapidly deliver innovation also empower them to rapidly repair and roll-forward. Yet, in order to make a repair, we must first identify the need for a repair.
We are already confident in our ability to move fast. We have spent the vast majority of this book on the topic. However, to be truly confident in the pace of cloud-native, we must be confident that we will detect problems in our cloud-native systems before our users do. When our customers...