Summary
Learning to navigate the telemetry data that systems produce, and to do so comfortably, takes time. Even with years of experience, the most knowledgeable engineers can still be puzzled by unexpected changes in observability data. The more time we spend getting comfortable with the tools, the more quickly we can get to the bottom of what caused a change in behavior.
The tools and techniques described in this chapter can be used repeatedly to better understand exactly what a system is doing. With chaos engineering practices, we can improve the resilience of our systems by identifying, under controlled circumstances, the areas that need improvement. By methodically running experiments and comparing the observed results against our hypotheses, we can measure improvements as we make them.
Many tools are available for experimenting with and simulating failures; learning to use them is a powerful addition to any engineer's toolset. As we worked our way through the vast amount of data...