Monitoring is the process of collecting performance-related metrics from the system. When paired with alerting, monitoring helps us understand when our system behaves as expected and when an incident happens.
The three types of metrics that would interest us the most are as follows:
- Availability, which lets us know which of our resources are up and running, and which of them have crashed or become unresponsive.
- Resource utilization, which gives us insight into how well the workload fits into the system.
- Performance, which shows us where and how to improve service quality.
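As a rough illustration of how the three metric types differ, the sketch below derives one number of each kind from the same set of probe results. The sample data, the tuple layout, and the function names are all hypothetical and chosen only for this example; a real monitoring system would define these measurements in its own way.

```python
from statistics import mean

# Hypothetical probe results for one service:
# (responded, cpu_percent, latency_ms). A failed probe has no latency.
samples = [
    (True, 40.0, 120.0),
    (True, 55.0, 180.0),
    (False, 0.0, None),
    (True, 65.0, 150.0),
]


def availability(probes):
    """Fraction of probes that got a response (availability)."""
    return sum(1 for up, _, _ in probes if up) / len(probes)


def mean_cpu_utilization(probes):
    """Average CPU usage over successful probes (resource utilization)."""
    return mean(cpu for up, cpu, _ in probes if up)


def worst_latency_ms(probes):
    """Slowest successful response (performance)."""
    return max(latency for up, _, latency in probes if up)


print(availability(samples))
print(mean_cpu_utilization(samples))
print(worst_latency_ms(samples))
```

Each function answers a different question about the same service: whether it was reachable, how much of the machine it consumed, and how quickly it responded.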
The two models of monitoring are push and pull. In the former, each monitored object (a machine, an application, or a network device) periodically pushes its data to a central point. In the latter, the objects expose their data at configured endpoints and the monitoring agent scrapes those endpoints at regular intervals.
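The difference between the two models comes down to who initiates the transfer. The following is a minimal in-process sketch, not a real monitoring stack: the class names `MonitoredObject`, `PushCollector`, and `PullScraper` are assumptions made for illustration, and a plain method call stands in for the network endpoint a real object would expose.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredObject:
    """A machine, application, or network device (hypothetical stand-in)."""
    name: str
    cpu_percent: float

    def read_metrics(self) -> dict:
        # In a real pull setup this data would be served at a configured endpoint.
        return {"name": self.name, "cpu_percent": self.cpu_percent}


@dataclass
class PushCollector:
    """Central point that passively receives whatever the objects send."""
    received: list = field(default_factory=list)

    def receive(self, sample: dict) -> None:
        self.received.append(sample)


@dataclass
class PullScraper:
    """Monitoring agent that decides when to collect from each target."""
    targets: list
    scraped: list = field(default_factory=list)

    def scrape_once(self) -> None:
        for target in self.targets:
            self.scraped.append(target.read_metrics())


objects = [MonitoredObject("web-1", 37.5), MonitoredObject("db-1", 81.0)]

# Push: every object initiates its own transfer to the central point.
collector = PushCollector()
for obj in objects:
    collector.receive(obj.read_metrics())

# Pull: the agent walks its target list and initiates each transfer itself.
scraper = PullScraper(targets=objects)
scraper.scrape_once()
```

Both paths end up with the same samples. What changes is which side controls the timing: in the push path the collector has no say in when data arrives, while in the pull path the scraper sets the pace.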
The pull model makes it easier to scale. This way, multiple objects won't be clogging the monitoring agent...