The Service Level Agreement (SLA) is between a service provider and client and it basically defines the level of service that is expected from the service provider. SLA is also different for different services such as VM and storage. SLA document size really varies depending upon the criticality and the complexity of the service.
Let's look at a use case. API Corp. is an organization that hosts various API services related to customer's behavior on the client's website. Whenever an application makes requests, the response time is generally less than 5 minutes. They have an SLA of a response time of 10 minutes. Whenever a customer registers and pays for the services of API Corp., the API Corp. is responsible for maintaining the response time within a given SLA document. If it fails to do so, it is the responsibility of the organization to compensate and take ownership of the failure.
Sometimes, service providers have clauses such as beyond our control to compensate for disasters or events beyond their control, so customers have to be very careful while reading the SLA and if they find it acceptable, then they can sign up for the service.
In the SLA, there is also a term called as indemnification. In order to understand this, let's take an example. ISP has an SLA of 99.9999% uptime to the customers. A customer was going to make a bid of 10,000$ on a very crucial online platform, and on that day, the ISP was down the entire day and he was not able to make the bid and hence incurred heavy losses. Now, the question is, who is responsible to give a payback? This is why the term indemnification is used, which states, if the customer has faced any loss because of the service provider, then how much % of that indemnification a customer can put on the service provider.
Normally, in the SLA, there is a line that states that indemnification cannot exceed more than 90% of the annual charges of the services.
The SLA is generally specific to four major aspects:
- Availability
- Performance/Maximum Response Time (MRT)
- Mean time between failures (MTBF)
- Mean time to repair (MTTR)
Here are some of the SLAs for various cloud providers for the compute services:
Cloud providers |
Service Level Agreement |
Amazon EC2 |
99.95% |
Rackspace |
100% |
Microsoft Azure |
99.95% |
DigialOcean |
99.99% |
Linode |
99.99% |
It's always recommended to get the technical staff and internal auditor to go through the SLA. There can be some kind of caveat that you must be aware of. Along with this, always have a contingency plan to prepare for the worst-case scenario.