To create a reliable infrastructure, adding your virtual machines to an Availability Set is key. Several scenarios can affect the availability of your Azure Virtual Machines:
- Unplanned hardware maintenance event: When the Azure platform predicts that hardware is about to fail, it raises an unplanned hardware maintenance event. Azure then uses live migration to move the VM, along with its network connections, memory, and storage, to a different physical machine without disconnecting clients. The VM is paused only briefly during the move (up to about 30 seconds), so performance may be reduced for a short time, but network connections, memory, and open files are preserved.
- Unexpected downtime: This event occurs when hardware or physical infrastructure fails without warning. The virtual machine is down while Azure heals it by moving it to healthy hardware inside the same data center.
- Planned hardware maintenance event: This type of event is a periodic update that Microsoft applies to the Azure platform to improve it. Most of these updates don't have a significant impact on the uptime of VMs, but some of them may require a reboot.
To provide redundancy during these types of events, you can group two or more VMs in an Availability Set. VMs in an Availability Set are distributed across multiple isolated hardware nodes in a cluster. As a result, only a subset of your VMs is affected during an event or failure, and your overall solution remains operational and available. With two or more VMs in the same Availability Set, the 99.95% Azure SLA applies.
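The effect of spreading VMs across isolated hardware can be illustrated with a small simulation. This is only a sketch: the round-robin assignment, the function names `place_vms` and `survivors`, the VM names, and the domain counts (2 fault domains, 5 update domains) are assumptions chosen for illustration, not a description of how Azure actually schedules VMs.

```python
def place_vms(vm_names, fault_domains=2, update_domains=5):
    """Assign each VM a (fault domain, update domain) pair.

    Round-robin assignment is an illustrative assumption; the real
    placement is decided by the Azure platform.
    """
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = (index % fault_domains, index % update_domains)
    return placement


def survivors(placement, failed_fault_domain):
    """Return the VMs that keep running when one fault domain fails."""
    return sorted(
        name
        for name, (fault_domain, _) in placement.items()
        if fault_domain != failed_fault_domain
    )


# Hypothetical VM names for a four-node web tier.
placement = place_vms(["web-0", "web-1", "web-2", "web-3"])
for name, (fault_domain, update_domain) in placement.items():
    print(f"{name}: fault domain {fault_domain}, update domain {update_domain}")

# Losing fault domain 0 still leaves half of the VMs running.
print("Still running:", survivors(placement, failed_fault_domain=0))
```

In this sketch, a hardware failure in one fault domain takes down only the VMs placed there, which is why a single VM cannot benefit from an Availability Set.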
For details on when and how the SLA applies, refer to https://azure.microsoft.com/en-us/support/legal/sla/virtual-machines/v1_6/.
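To put the 99.95% figure in perspective, the following sketch converts an SLA percentage into a monthly downtime budget. The 30-day month and the helper name `allowed_downtime_minutes` are assumptions for illustration; the SLA document itself defines how downtime is actually measured.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Return the downtime budget in minutes for a given SLA percentage.

    Assumes a billing month of `days` days (30 by default).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


budget = allowed_downtime_minutes(99.95)
print(f"99.95% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.95% SLA leaves a budget of roughly 21.6 minutes of downtime per month.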