Identifying Operational Gaps
When preparing for operational excellence, the first step is to understand your workloads and their expected behaviors. You can then adapt their design to expose information about their state and create the operational procedures needed to support them. Understanding a workload requires analyzing data, and analyzing data requires collecting it first. Telemetry refers to the automated process of collecting data from remote or inaccessible systems and transmitting it to a receiving system for monitoring and analysis. Telemetry design is covered in the next section.
Designing Telemetry
When operating any workload, it is paramount to be able to tell at any moment whether that workload is working as expected. Systems should also be in place to notify you, typically by triggering alarms, when an unexpected event or abnormal behavior occurs. This is what monitoring entails.
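To make the idea of an alarm concrete, the following is a minimal sketch in Python of how a monitoring system might decide whether a workload is behaving as expected. It is not tied to any particular monitoring service: the `Alarm` class, the `evaluate` function, the state names, and the `latency_ms` metric are all hypothetical names chosen for illustration. The sketch assumes an alarm should fire only when a metric breaches its threshold for several consecutive datapoints, which is one common way to avoid reacting to a single spike.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm definition: fires when a metric stays above a threshold."""
    metric_name: str
    threshold: float
    periods_to_breach: int  # number of consecutive breaching datapoints required


def evaluate(alarm: Alarm, datapoints: Sequence[float]) -> str:
    """Return the alarm state for a series of datapoints (oldest first).

    "INSUFFICIENT_DATA" if there are fewer datapoints than required,
    "ALARM" if the most recent datapoints all exceed the threshold,
    "OK" otherwise. The state names are illustrative only.
    """
    if len(datapoints) < alarm.periods_to_breach:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-alarm.periods_to_breach:]
    if all(value > alarm.threshold for value in recent):
        return "ALARM"
    return "OK"


# Assumed example: alert when latency exceeds 500 ms for 3 datapoints in a row.
high_latency = Alarm(metric_name="latency_ms", threshold=500.0, periods_to_breach=3)

print(evaluate(high_latency, [120.0, 640.0, 710.0, 580.0]))  # ALARM
print(evaluate(high_latency, [120.0, 640.0, 310.0, 580.0]))  # OK
print(evaluate(high_latency, [640.0]))                       # INSUFFICIENT_DATA
```

Requiring several consecutive breaches trades a slightly slower notification for fewer false alarms. The right threshold and number of periods depend on the workload's expected behavior, which is why understanding that behavior comes first.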
Monitoring can only be...