Architectural considerations for observability in a platform
We've already alluded to the importance of observability in the discussions of self-service and cognitive load. Your platform cannot serve its users unless observability is a priority. In other words, you should expect to watch and measure everything.
Observability comes in two flavors. The first is observability for the benefit of the platform team: it is how the team collects the data it needs to improve the platform's reliability. Borrowing the site reliability practice of service-level objectives (SLOs), the platform team can measure customer satisfaction by setting SLOs for the platform and then building observability to support them.
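To make the SLO idea concrete, here is a minimal sketch of how a platform team might turn request counts into an availability indicator and compare it with an objective. The `SLO` class, the function names, the service name, the 99.9% target, and the request counts are all illustrative assumptions, not taken from any particular tool; in practice the counts would come from whatever metrics backend the platform uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical availability objective for a platform service."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def availability(good: int, total: int) -> float:
    """Service-level indicator: the fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good / total


def error_budget_remaining(slo: SLO, good: int, total: int) -> float:
    """Fraction of the error budget left in the window (negative if overspent)."""
    allowed_failures = (1.0 - slo.target) * total
    if allowed_failures == 0:
        # Nothing may fail: the budget is intact only if nothing did.
        return 1.0 if good == total else float("-inf")
    actual_failures = total - good
    return 1.0 - actual_failures / allowed_failures


# Illustrative numbers only: one million requests, 500 of which failed.
slo = SLO(name="platform-api-availability", target=0.999)
sli = availability(good=999_500, total=1_000_000)
budget = error_budget_remaining(slo, good=999_500, total=1_000_000)

print(f"{slo.name}: SLI={sli:.4%}, meets target: {sli >= slo.target}")
print(f"error budget remaining: {budget:.0%}")
```

The error budget is the useful part of this design: rather than a pass/fail signal, it tells the team how much unreliability they can still absorb in the current window before the objective is missed.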
The good news is that, in the Kubernetes world, this is easier because the platform is already organized into objects and microservices that can be measured individually. The bad news is that microservices also mean there are far more things to measure than ever before, and it's hard to know whether you've captured everything...