The One with the Systems Check-Ups
Logging, Monitoring, and Metrics
Now and then, software fails. Whether we like it or not, that is simply a fact of life. We make mistakes during development. Other people make mistakes. The environment changes. A network becomes unstable. These are all reasons the system might not behave as we intended.
Testing can help. A solid set of tests can expose the errors in your work and help make your system more robust. However, things still go wrong sometimes. Let’s face it: building software is a creative art form and thus subject to influences beyond our control. So, when our systems do not do what we expected, we need a way to look into their workings. That helps us figure out what happened and what we can do to fix it.
This is where logging and monitoring come into play. Logging helps us write important information and store it in a well-known place. Logging is part of our code...