Observability, Auditing, and Continuous Improvement
In the last chapter, we looked at architectural examples for reliability. The next step is a process that confirms the environment is actually operating reliably, which starts with monitoring your resources and applications. In this chapter, we will learn how to design for observability, how to audit the setup to confirm that it is working as expected, and how to continuously improve the observability process. These practices are central to reliability: without comprehensive observability and auditing, you cannot detect and address reliability issues before they cause downtime and affect your applications.
In this chapter, we’re going to cover the following main topics:
- Observability is key to resilience
- Designing observability...