Introducing the relationship between Amazon CloudWatch and Well-Architected
The AWS Well-Architected framework is a set of principles that can be used to govern how an application is architected, developed, deployed, and scaled in the cloud. It distills the experience of hundreds of AWS solutions architects, with decades of experience across various industries, who have designed, managed, and scaled many types of systems. That knowledge has been summarized into the principles of the AWS Well-Architected framework, which is made up of five pillars:
- Security
- Cost Optimization
- Performance Efficiency
- Operational Excellence
- Reliability
Each of these pillars covers a wide range of tenets for different aspects of infrastructure setup, scaling, security, and deployment. Here, we will look only at the monitoring aspect of the Well-Architected framework.
The Reliability pillar focuses on building systems that are reliable, stand the test of time, and are always available to do what they have been designed to do. This requires close, ongoing monitoring of the system. The pillar also covers managing service quotas for the different AWS resources as your use of AWS services grows.
Important Note
AWS provides on-demand scaling for server resources and application services. This is governed by service quotas, the mechanism AWS uses to cap the maximum number of resources, actions, and items in your AWS account (https://docs.aws.amazon.com/servicequotas/latest/userguide/intro.html).
CloudWatch alarms can be configured for these quotas so that you receive an alert when usage approaches the limit of the service allocation. This helps protect against a workload failure in situations where a service is needed but its quota has already been reached.
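As a rough illustration of the quota alarm described above, the following Python sketch builds the parameters for a CloudWatch alarm that fires when usage reaches a percentage of its quota. It assumes the boto3 library, the AWS/Usage namespace, and the SERVICE_QUOTA metric math function; the alarm name, SNS topic ARN, 80 percent threshold, and the EC2 vCPU dimension values are hypothetical placeholders that should be checked against the usage metrics visible in your own account.

```python
def build_quota_alarm_params(alarm_name, sns_topic_arn, threshold_percent=80.0):
    """Return keyword arguments for the CloudWatch put_metric_alarm call.

    The dimension values below (EC2 On-Demand vCPUs) are assumptions used
    for illustration; substitute the usage metric you want to watch.
    """
    return {
        "AlarmName": alarm_name,
        "AlarmDescription": "Usage is approaching the service quota",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 1,
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            {
                # Raw usage metric; not returned, only fed into the expression.
                "Id": "usage",
                "ReturnData": False,
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Usage",
                        "MetricName": "ResourceCount",
                        "Dimensions": [
                            {"Name": "Service", "Value": "EC2"},
                            {"Name": "Type", "Value": "Resource"},
                            {"Name": "Resource", "Value": "vCPU"},
                            {"Name": "Class", "Value": "Standard/OnDemand"},
                        ],
                    },
                    "Period": 300,
                    "Stat": "Maximum",
                },
            },
            {
                # Usage as a percentage of the quota; the alarm evaluates this.
                "Id": "pct",
                "Expression": "(usage / SERVICE_QUOTA(usage)) * 100",
                "Label": "Percent of quota used",
                "ReturnData": True,
            },
        ],
    }


def create_quota_alarm(params):
    """Create the alarm. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling create_quota_alarm(build_quota_alarm_params("vcpu-quota-80", "<your SNS topic ARN>")) would create the alarm. Keeping the parameter builder separate from the API call makes it possible to review the alarm definition before anything is sent to AWS.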
With Performance Efficiency, the focus is on ensuring that the application performs well at all times. One way to achieve this is to always have insight into, and an understanding of, the behavior of the workloads and the application. Rigorous testing of the application, using methods such as load testing, can be very helpful in revealing how the system behaves under load. When the load test is carried out, metrics and logs are collected. These metrics and logs are then studied to gain insights into the behavior of the system. This can be done on the staging or test setup of the application, and it helps SREs understand what to expect when the application is eventually released to customers.
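To make the idea of studying load test results concrete, here is a minimal Python sketch that summarizes response times collected during a load test. The function names and the nearest-rank percentile method are illustrative choices, not something prescribed by CloudWatch or the Well-Architected framework, and the input is assumed to be a plain list of latencies in milliseconds.

```python
import math


def percentile(samples, p):
    """Return the p-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p percent of the
    # samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(samples_ms):
    """Summarize load test latencies (in milliseconds) as key percentiles."""
    return {
        "count": len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }
```

Comparing the p95 and p99 values with the median gives a quick sense of how much the slowest requests degrade under load, which is often more informative than an average.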
Fear of a large bill deters many first-time cloud users. The Cost Optimization pillar of the Well-Architected framework focuses on optimizing your AWS bill by using cost-effective services and designs when deploying workloads in your AWS infrastructure. The part of CloudWatch that bears on cost is its role in auto scaling, which can be very helpful in reducing the cost of your overall workload. CloudWatch metrics and alarms can trigger the scaling up or scaling down of your infrastructure based on thresholds that you configure. When the server resources being consumed are low, a CloudWatch alarm can trigger a scale-down that reduces the number of servers in use; when consumption rises, CloudWatch can detect that as well and trigger a scale-up that adds servers to balance the load hitting the application.
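The threshold-based scaling decision described above can be sketched in a few lines of Python. This is a simplified simulation of the logic rather than an AWS API call: in practice, a CloudWatch alarm triggers an auto scaling policy that changes the capacity. The function name, the 70 and 30 percent CPU thresholds, and the capacity bounds are hypothetical values chosen for illustration.

```python
def desired_capacity(current, avg_cpu_percent,
                     scale_out_at=70.0, scale_in_at=30.0,
                     minimum=1, maximum=10):
    """Return the server count after applying one threshold-based scaling step.

    All thresholds and bounds are illustrative assumptions.
    """
    if avg_cpu_percent >= scale_out_at:
        target = current + 1   # load is high: add a server
    elif avg_cpu_percent <= scale_in_at:
        target = current - 1   # load is low: remove a server to save cost
    else:
        target = current       # within the band: leave capacity unchanged
    # Never go below the minimum or above the maximum fleet size.
    return max(minimum, min(maximum, target))
```

For example, desired_capacity(2, 85.0) returns 3, while desired_capacity(2, 10.0) returns 1. The gap between the two thresholds keeps the fleet from repeatedly adding and removing servers when load hovers around a single value.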