Performance SLA
We have covered the metrics; how do we put them to use?
As usual, let's begin with the customer. In this case, it is your CIO or head of Infrastructure, because the scope is now all VMs, not just one VM.
For performance, the main requirement from your CIO or management is typically around your IaaS system's ability to deliver. They want your IaaS to perform, as their business runs on it. The question is this:
How do you prove that… not a single VM… in the past 1 month… suffered an unacceptable performance hit because of a non-performing IaaS?
That's an innocent, but loaded, question. You need to consider the impact carefully before answering, "That's easy!"
If you have 1000 VMs, you need to answer for 1000 VMs. For each VM, you need to answer for CPU, RAM, disk and network. That's 4000 metrics. If your management or customer agrees on a 5-minute sampling period, you have 12 samples in 1 hour. In 1 day, you have 288 samples. In...
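To get a feel for the scale, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 30-day month is an assumption made here to match the "past 1 month" in the question.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def samples_per_day(interval_minutes: int) -> int:
    """Number of samples one metric produces in a day at the given interval."""
    return (MINUTES_PER_HOUR // interval_minutes) * HOURS_PER_DAY


def total_data_points(vm_count: int, metrics_per_vm: int,
                      interval_minutes: int, days: int) -> int:
    """Total samples to account for across all VMs and all metrics."""
    return vm_count * metrics_per_vm * samples_per_day(interval_minutes) * days


# Figures from the text: 1000 VMs, 4 metrics each (CPU, RAM, disk, network),
# sampled every 5 minutes.
metrics = 1000 * 4                 # 4000 metrics
per_hour = MINUTES_PER_HOUR // 5   # 12 samples in 1 hour
per_day = samples_per_day(5)       # 288 samples in 1 day

# Assumption: a 30-day month.
monthly = total_data_points(1000, 4, 5, days=30)

print(f"{metrics} metrics, {per_day} samples per metric per day")
print(f"{monthly:,} data points in 30 days")
```

Under these assumptions the total is 34,560,000 data points for a single month, which is why "That's easy!" is the wrong answer.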