What is observability?
If you are reading this book, the odds are you have already read about or heard the term observability elsewhere, and have decided to apply it to your AWS workloads. You are in the right place. But even being a book for the practitioner, we can’t start this book without defining some terms. They will become our guide for the rest of this book, helping us drive our discussions. Let’s start with the main one: observability.
The engineer Rudolf E. Kálmán coined the term observability (abbreviated as o11y) in 1960.
In his 1960 paper, Kálmán describes what he calls observability in the field of control theory: the measure of how well someone can infer a system’s internal states from knowledge of its external signals/outputs.
Observability is another borrowed term, in the same way as software architecture, software engineering, and design patterns. We borrow a complex, mathematical term from an older, more mature field and make it ours in our younger computing field. And to do that, we need to make it softer to make it usable.
So, in this book, we will say an application has observability if the following is true:
- You can read any variable that affects the application state
- You can understand how the application reached that state
- You can execute both the aforementioned points without deploying any new code
So, your application is observable if you can answer questions that you knew you should ask, but you can also answer questions that you didn’t know you needed to ask.
So far, we have defined what observability is. But if you are like me, the first time I saw a description of observability like the one provided here, it didn’t help me understand it or even what made it different from our old friend: monitoring. But I like examples, so let me try to do a better job to help you. In the next section, we will see a small application example, we will apply monitoring practices to keep our application up and running, and we will fail. Let’s discuss why we failed and how observability principles can improve the situation in our sample scenario.