Logging and tracing provide a wealth of information about how the individual components of your cloud systems are behaving, but they generally provide only a partial picture of the system's behavior as a whole. Many important aspects of system health fall outside the scope of logging and tracing. These aspects are often best measured as change over time, which allows developers to identify trends and anomalies.
Building on our to-do example, a sudden spike in concurrent connections to our todos-db Cloud SQL instance may indicate that a recently pushed version of todos-backend is not correctly terminating stale connections. Likewise, recognizing patterns in user traffic to our todos-frontend may help us choose optimal maintenance windows or scale eagerly ahead of demand.
Additionally, while collecting the right data is important to effectively monitor cloud...