Azure infrastructure availability
Azure is designed, built, and operated to deliver highly available and reliable infrastructure. Improvements are constantly implemented to increase availability and reliability, along with efficiency and scalability. Delivery of a more secure and trusted cloud is always a priority.
Uninterruptible power supplies and large banks of batteries keep electricity flowing during short-term power disruptions. During extended power outages or planned maintenance, emergency generators take over and can provide backup power for days. For natural disasters, when the external power supply may be unavailable for long periods, each Azure data center keeps fuel reserves on-site.
Robust, high-speed fiber optic networks connect data centers to major hubs. It's important that, along with the connections through major hubs, data centers are also connected directly to one another. Everything is distributed into nodes, which host workloads closer to users to reduce latency, provide geo-redundancy, and increase resiliency.
Data in Azure can be placed in two separate regions: a primary region and a secondary region. A customer can choose where the primary and secondary regions will be. The secondary region serves as a backup site. In each region, Azure keeps three healthy copies of your data at all times, which means that six copies of the data are available at any time. If any copy becomes unavailable, it's immediately declared invalid, a new copy is created, and the old one is destroyed.
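The replacement cycle described above can be pictured with a small toy model. This is only an illustrative sketch, not Azure's actual implementation: the Region class, its method names, and the integer copy IDs are all hypothetical, and the only figure taken from the text is three healthy copies per region.

```python
import itertools

COPIES_PER_REGION = 3  # from the text: three healthy copies per region


class Region:
    """Hypothetical toy model of one region holding copies of the data."""

    def __init__(self, name):
        self.name = name
        self._next_id = itertools.count()
        self.copies = {}  # copy id -> True if healthy, False if invalid
        for _ in range(COPIES_PER_REGION):
            self._create_copy()

    def _create_copy(self):
        self.copies[next(self._next_id)] = True

    def mark_unavailable(self, copy_id):
        # An unavailable copy is declared invalid.
        self.copies[copy_id] = False

    def repair(self):
        # For each invalid copy: create a replacement, then destroy the old one.
        invalid = [cid for cid, healthy in self.copies.items() if not healthy]
        for cid in invalid:
            self._create_copy()
            del self.copies[cid]

    def healthy_count(self):
        return sum(self.copies.values())


primary, secondary = Region("primary"), Region("secondary")
primary.mark_unavailable(0)  # one copy in the primary region fails
primary.repair()             # it is replaced and the old copy removed
total = primary.healthy_count() + secondary.healthy_count()
print(total)
```

After the repair, each region again holds three healthy copies, so the total across the primary and secondary regions is back to six.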
Microsoft ensures high availability and reliability through constant monitoring, incident response, and service support. Each Azure data center operates 24/7/365 to keep everything running and all services available at all times. Of course, being available at all times is a goal that is ultimately impossible to reach: many circumstances can impact uptime, and not all of them can be controlled. Realistically, the aim is to achieve the best possible Service Level Agreement (SLA), so that potential downtime is limited as far as possible. The SLA varies by service and configuration. Taking into account all the factors we can control, the best SLA we can achieve is 99.99%, also known as four nines.
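To make an SLA percentage more concrete, it helps to convert it into the amount of downtime it permits. The following is a minimal sketch of that arithmetic; the function name allowed_downtime_minutes is an illustrative choice, and the calculation assumes a 365-day year rather than any specific billing period defined in an actual SLA document.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(sla_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the maximum downtime an availability target permits in a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, four nines (99.99%) leaves room for roughly 52.6 minutes of downtime per year, while 99.9% allows about 8.8 hours.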
Closely connected to infrastructure availability is infrastructure integrity. Integrity affects availability in terms of deployment: every step must be verified from different perspectives, and new deployments must not cause any downtime or affect existing services in any way.