(For more resources related to this topic, see here.)
Whether we are talking about a highly available and critically productive environment or not, any planned or unexpected downtime means financial losses. Historically, solutions that could provide high availability and redundancy were costly and complex.
With the virtualization technologies available today, it becomes easier to provide higher levels of availability for environments where they are needed. With VMware products, and vSphere in particular, it's possible to do the following things:
Planned downtime usually happens during hardware maintenance, firmware or operating system updates, and server migrations. To reduce the impact of planned downtime, IT administrators are forced to schedule small maintenance windows outside the working hours.
vSphere makes it possible to dramatically reduce the planned downtime. With vSphere, IT administrators can perform many maintenance tasks at any point in time as it allows downtime elimination for many common maintenance operations.
This is possible mainly because workloads in a vSphere can be dynamically moved between different physical servers and storage resources without any service interruption.
The main availability capabilities that are built into vSphere allow the use of HA and redundancy features, and are as follows:
vSphere vMotion® and Storage vMotion functionalities allow the migration of VMs between ESXi hosts and their underlying storage without service interruption, as shown in the following figure:
In other words, vMotion is the live migration of VMs between ESXi hosts, and Storage vMotion is the live migration of VMs between storage LUNs. In both cases, VM retains its network and disk connection. With vSphere 5.1 and the later versions, it's possible to combine vMotion with Storage vMotion into a single migration that simplifies administration. The entire process takes less than two seconds on a GB network.
vMotion keeps track of the ongoing memory transaction while memory and system states get copied to the target host. Once copying is done, vMotion suspends the source VM, copies the transactions that happened during the process to the target host, and resumes the VM on the target host. This way, vMotion ensures transaction integrity.
vSphere requirements for vMotion are as follows:
Migration with vMotion happens in three stages:
VMs with snapshots can be vMotioned regardless of their power state as long as their files stay on the same storage. Obviously, this storage has to be accessible for both the source and destination hosts.
If migration involves moving configuration files or virtual disks, the following additional requirements apply:
To vMotion a VM in vCenter, right-click on a VM and choose Migrate… as shown in the following screenshot:
This opens a migration wizard where you can select whether it's going to migrate between hosts or storage or both. The Change hostoption is the standard vMotion, and Change datastore is the Storage vMotion. As you can see, the Change both host and datastore option is not available because this VM is currently running. As mentioned earlier, vSphere 5.1 and later support vMotion and Storage vMotion in one transaction.
In the next steps, you are able to choose the destination as well as the priority for this migration. Multiple VMs can be migrated at the same time if you make multiple selections in the Virtual Machines tab for the host or the cluster.
VM vMotion is widely used to perform host maintenance such as upgrading the ESX operating system, memory, or any other configuration changes. When maintenance is needed on a host, all the VMs can be migrated to other hosts and this host can be switched into the maintenance mode. This can be accomplished by right-clicking on the host and selecting Enter Maintenance Mode.
Environments, especially critical ones, need to be protected from any unplanned downtime caused by possible hardware or application failures. vSphere has important capabilities that can address this challenge and help to eliminate unplanned downtime.
These vSphere capabilities are transparent to the guest operating system and any applications running inside the VMs; they are also a part of the virtual infrastructure. The following features can be configured for VMs in order to reduce the cost and complexity of HA. More detail on these features will be given in the following sections of this article.
vSphere's HA is a feature that allows a group of hosts connected together to provide high levels of availability for VMs running on these hosts. It protects VMs and their applications in the following ways:
With vSphere HA, there is no need to install any additional software in a VM. After vSphere HA is configured, all the new VMs will be protected automatically.
The HA option can be combined with vSphere DRS to protect against failures and to provide load balancing across the hosts within a cluster
The advantages of HA over traditional failover solutions are listed in the VMware article at http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.avail.doc%2FGUID-CB46CEC4-87CD-4704-A9AC-058281CFD8F8.html.
Before HA can be enabled, a cluster itself needs to be created. To create a new cluster, right-click on the datacenter object in the Hosts and Clusters view and select New Cluster... as shown in the following screenshot:
The following prerequisites have to be considered before setting up a HA cluster:
ESX/ESXi 3.5 hosts are supported for vSphere HA with the following patches installed; these fix an issue involving file locks:
ESX 3.5: patch ESX350-201012401-SG and prerequisites
ESXi 3.5: patch ESXe350-201012401-I-BG and prerequisites
Once all of the requirements have been met, vSphere HA can be enabled in vCenter under the cluster settings dialog. In the following screenshot, it appears as PRD-CLUSTER Settings:
Once HA is enabled, all the cluster hosts that are running and are not in maintenance mode become a part of HA.
The following HA settings can also be changed at the same time:
More details on each of these settings can be found in the following sections of this article.
When a HA cluster is created, an agent is uploaded to all the hosts and configured to communicate with other agents within the cluster. One of the hosts becomes the master host, and the rest become slave hosts. There is an election process to choose the master host, and the host that mounts more datastores has an advantage in this election. In cases where we have a tie, the host with the lexically-highest Managed Object ID(MOID) is chosen.
MOID, also called MoRef ID, is a value generated by vCenter for each object: host, datastore, VM, and so on. It is guaranteed to be unique across the infrastructure managed by this particular vCenter server.
When it comes to the election process for choosing the master host, a host with ID 99 will have higher priority than a host with ID 100.
If a master host fails or becomes unavailable, a new election process is initiated.
Slave hosts monitor whether the VMs are running locally and report to the master host.
In its turn, the master host communicates with vCenter and monitors other hosts for failures. Its main responsibilities are listed as follows:
Host availability monitoring is done through a network heartbeat exchange, which happens every second by default. In cases where we lose network heartbeats with a host, before declaring it as failed, the master host checks whether this host communicates with any of the existing datastores using datastore heartbeats and responds to pings sent to its management IP address or not.
The master host detects the following types of host failure:
Type of failure |
Network heartbeats |
ICMP ping |
Datastore heartbeats |
Lost connectivity to the master host |
- |
+ |
+ |
Network isolation |
- |
- |
+ |
Failure |
- |
- |
- |
If host failure is detected, the host's VMs will be restarted on other hosts.
Host network isolation happens when a host is running but doesn't see any traffic from vSphere HA agents, which means that it's disconnected from the management network. Isolation is handled as a special case of failure in VMware HA. If a host becomes network isolated, the master host continues to monitor this host and the VMs running on it. Depending on the isolation settings chosen for individual VMs, some of them may be restarted on another host.
The master host has to communicate with vCenter, therefore, it can't be in the isolation mode. Once that happens, a new master host will be elected.
When network isolation happens, certain hosts are not able to communicate with vCenter, which may result in configuration changes not having effect on certain parts of the infrastructure. If a network infrastructure is configured correctly and has redundant network paths, isolation should happen rarely.
Datastore heartbeating was introduced in vSphere 5. In the previous versions of vSphere, once a host became unreachable through the management network, HA always initiated VM restart, even if the VMs were still running. This, of course, created unnecessary downtime and additional stress to the host. Datastore heartbeating allows HA to make a distinction between hosts that are isolated or partitioned and hosts that have failed, which adds more stability to the way HA works.
vCenter server selects a list of datastores for heartbeat verification to maximize the number of hosts that can be verified. It uses a selection algorithm designed to select datastores that are connected to the highest number of hosts. This algorithm attempts to choose datastores that are hosted on different storage arrays or NFS servers. It also prefers VMFS-formatted LUNs over NFS-hosted datastores.
vCenter selects datastores for heartbeating in the following scenarios:
By default, two datastores are selected. This is the minimum amount of datastores needed. It can be changed using the das.heartbeatDsPerHost parameter under Advanced Settings for up to five datastores. The PRD-CLUSTER Settings dialog box can be used to verify or change the datastores selected for heartbeating, as shown in the following screenshot:
It is recommended, however, to let vCenter choose the datastores. Only the datastores that are mounted to more than one host are available in the list.
Datastore heartbeating leverages the existing VMFS filesystem locking mechanism. There is a so-called heartbeat region that exists on each datastore and is updated as long as the lock on a file exists. A host updates the datastore's heartbeat region if it has at least one file opened on this volume. HA creates a file for datastore heartbeating purposes only to make sure there is at least one file opened on a volume. Each host creates its own file and HA to be able to determine whether an unresponsive host still has connection to a datastore, and simply checks whether the heartbeat region has been updated or not.
By default, an isolation response is triggered after 5 seconds for the master host and after approximately 30 seconds if the host was a slave in vSphere 5. This time difference occurs because of the fact that if the host was a slave, it would need to go through the election process to determine whether there are any other hosts that exist, or whether the master host is simply down. This election starts within 10 seconds after the slave host has lost its heartbeats. If there is no response for 15 seconds, the HA agent on this host elects itself as the master. The isolation response time can be increased using the das.config.fdm.isolationPolicyDelaySec parameter under Advanced Settings. This is, however, not recommended as it increases the downtime when a problem occurs.
If a host becomes a master in a cluster with more than one host and has no slaves, it continuously starts checking whether it's in the isolation mode or not. It keeps doing so until it becomes a master with slaves or connects to a master as a slave. At this point, the host will ping its isolation address to determine whether the management network is available again. By default, the isolation address is a gateway configured for the management network. This option can be changed using the das.isolationaddress[X] parameter under Advanced Settings. [X] takes values from 1 to 10 and allows configuration of multiple isolation addresses. Additionally, the das.usedefaultisolationaddress parameter can be used to indicate whether the default gateway address should be used as an isolation address or not. This parameter should be set to False if the default gateway is not configured to respond to the ICMP ping packets.
Generally, it's recommended to have one isolation address for each management network. If this network uses redundant paths, the isolation address should always be available under normal circumstances.
In certain cases, a host may be isolated, that is, not accessible via the management network but still able to receive election traffic. This host is called partitioned. Have a look at the following figure to gain more insight about this:
When multiple hosts are isolated but can still communicate with each other, it's called a network partition. This can happen for various reasons; one of them is when a cluster spans multiple sites over a metropolitan area network. This is called the stretched cluster configuration.
When a cluster partition occurs, one subset of hosts is able to communicate with the master while the other is not. Depending on the isolation response selected for VMs, they may be left running or restarted. When a network partition happens, the master election process is initiated within the subset of hosts that loses its connection to the master. This is done to make sure that the host failure or isolation results in appropriate action on the VMs. Therefore, a cluster will have multiple masters; each one in a different partition as long as the partition exists. Once the partition is resolved, the masters are able to communicate with each other and discover the multiplicity of master hosts. Each time this happens, one of them becomes a slave.
The hosts' HA state is reported by vCenter through the Summary tab for each host as shown in the following screenshot:
This is done under the Hosts tab for cluster objects as shown in the following screenshot:
Running (Master) indicates that HA is enabled and the host is a master host.
Connected (Slave) means that HA is enabled and the host is a slave host.
Only the running VMs are protected by HA. Therefore, the master host monitors the VM's state and once it changes from powered off to powered on, the master adds this VM to the list of protected machines.
Each VM's HA behavior can be adjusted under vSphere HA settings or in the Virtual Machine Options option found in the PRD-CLUSTER Settings page as shown in the following screenshot:
The restart priority setting determines which VMs will be restarted first after the host failure. The default setting is Medium. Depending on the applications running on a VM, it may need to be restarted before other VMs, for example, if it's a database, a DNS, or a DHCP server. It may be restarted after others if it's not a critical VM.
If you select Disabled, this VM will never be restarted if there is a host failure. In other words, HA will be disabled for this VM.
The isolation response setting defines HA actions against a VM if its host loses connection to the management network but is still running. The default setting is Leave powered on. To be able to understand why this setting is important, imagine the situation where a host loses connection to the management network and at the same time or shortly afterwards, to the storage network as well—a so-called split-brain situation.
In vSphere, only one host can have access to a VM at a time. For this purpose, the .vmdk file is locked and there is an additional .lck file present in the same folder where .vmdk file is stored. As HA is enabled, VMs will fail over to another host, however, their original instances will keep running on the old host. Once this host comes out of isolation, we will end up with two copies of VMs. Therefore, the isolated host will not have access to the .vmdk file as it's locked. In vCenter, however, this VM will look as if it is flipping between two hosts.
With the default settings, the original host is not able to reacquire the disk locks and will be querying the VM. HA will send a reply instead which allows the host to power off the second running copy.
If the Power Off option is selected for a VM under the isolation response settings, this VM will be immediately stopped when isolation occurs. This can cause inconsistency with the filesystem on a virtual drive. However, the advantage of this is that VM restart on another host will happen more quickly, thus reducing the downtime.
The Shut down option attempts to gracefully shut down a VM. By default, HA waits for 5 minutes for this to happen. When this time is over, if the VM is not off yet, it will be powered off. This timeout is controlled by the das.isolationshutdowntimeout parameter under the Advanced Settings option. VM must have VMware tools installed to be able to shut down gracefully. Otherwise, the shutdown option is equivalent to power off.
Under VM Monitoring, the monitoring settings of individual applications can be adjusted as shown in the following screenshot:
The default setting is Disabled. However, VM and Application Monitoring can be enabled so that if the VM heartbeat (VMware tool's heartbeat) or its application heartbeat is lost, the VM is restarted. To avoid false positives, the VM monitoring service also monitors VM's I/O activity. If a heartbeat is lost and there was no I/O activity (by default during the last 2 minutes), VM is considered as unresponsive. This feature allows you to power cycle nonresponsive VMs.
I/O interval can be changed under the advanced attribute settings (for more details, check the HA Advanced attributes table later in this section).
Monitoring sensitivity can be adjusted as well. Sensitivity means the time interval between the loss of heartbeats and restarting of the VM. The available options are listed in the table from the VMware documentation article available at http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.avail.doc_50%2FGUID-62B80D7A-C764-40CB-AE59-752DA6AD78E7.html.
To avoid repeating VM resets by default, they will be restarted only three times during the reset period. This can be changed in the Custom mode as shown in the following screenshot:
In order to be able to monitor applications within a VM, they need to support VMware application monitoring. Alternatively, you can download the appropriate SDK and set up customized heartbeats for the application that needs to be monitored.
Under Advanced Options, the following vSphere HA behaviors can be set. Some of them have already been mentioned in sections of this article.
The following screenshot shows the Advanced Options (vSphere HA) window where advanced HA options can be added and set to specific values:
vSphere HA advanced options are listed in the article from VMware documentation available at http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.avail.doc_50%2FGUID-E0161CB5-BD3F-425F-A7E0-B F83B005FECA.html. This mentioned article also lists the options that are not supported in vCenter 5. You will get an error message if you try to add one of them. Also, the options will be deleted after being upgraded from the previous versions.
Admission control ensures there are sufficient resources available to provide failover protection as and when VM resource reservations are kept.
Admission control is available for the following:
Admission control can only be disabled for vSphere HA. The following screenshot shows PRD-CLUSTER Settings with the option to disable admission control:
Examples of actions that may not be permitted because of insufficient resources are as follows:
There are three possible types of admission control policies available for HA configuration that are as follows:
VM |
CPU |
RAM |
VM1 |
4 GHz |
2 GB |
VM2 |
2 GHz |
4 GB |
VM3 |
1 GHz |
6 GB |
Host |
CPU |
RAM |
Slots |
Host1 |
4 GHz |
128 GB |
1 |
Host2 |
24 GHz |
6 GB |
1 |
Host3 |
8 GHz |
14 GB |
2 |
This option is probably not the best one for an environment that has VMs with significantly more of resources assigned than the rest of the VMs.
The Host failure cluster tolerates option can be used when all cluster hosts are sized pretty much equally. Otherwise, if you use this option, then excessive capacity is reserved such that the cluster tolerates the largest host failure. When this option is used, VM reservations should be kept similar across the cluster as well. Because vCenter uses the slot sizes model to calculate capacity, and the slot size is based on the largest reservation, having VMs with a large reservation will again result in additional unnecessary capacity being reserved.
CPU: (1-7/34)*100%=79%
RAM: (1-12/148)*100%=92%
With such different hosts from the example, the CPU and RAM capacity should be configured carefully to avoid a situation when, for example, the host with most amount of RAM fails and the other hosts are not able to accommodate all the VMs because of memory resources. Therefore, RAM should be configured at 87 percent based on the two smallest hosts (#2 and #3) and not 30% based on the number of hosts in the environment:
[1-(6+14)/148]*100%=87%
In other words, if the host with 128 GB fails, we need to make sure that the total resources needed by the VMs are less than the sum of 6 GB and 14 GB, which is only 13 percent of the total cluster's 148 GB. Therefore, we need to make sure that in all instances, the VMs use only 13 percent of the RAM or that the cluster has 87 percent of RAM that is free.
It is recommended to use the Percentage of cluster resources reserved option in most cases. This option offers more flexibility in terms of host and VM sizing than other options.
vSphere HA configuration files for each host are stored on the host's local storage and are protected by the filesystem permissions. These files are only available to the root user.
For security reasons, ESXi 5 hosts log HA activity only to syslog. Therefore, logs are placed at a location where syslog is configured to keep them. Log entries related to HA are prepended with fdm, which stands for fault domain manager. This is what the vSphere HA ESX service is called.
Older versions of ESXi write HA activity to fdm logfiles in /var/log/vmware/fdm stored on the local disk. There is also an option to enable syslog logging on these hosts. Older ESX hosts are able to save HA activity only in the fdm local logfile in /var/log/vmware/.
HA agent logging configuration also depends on the ESX host version. For ESXi 5 hosts, the logging options that can be configured via the Advanced Options tab under HA are listed in the article under the logging section available at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033250.
The das.config.log.maxFileNum option causes ESXi 5 hosts to maintain two copies of the logfiles: one is a file created by the Version 5 logging mechanism, and the other one is maintained by the pre-5.0 logging mechanism. After any of these options are changed, HA needs to be reconfigured.
The following table provides log capacity recommendations according to VMware for environments of different sizes based on the requirement to keep one week of history:
Size |
Minimum log capacity per host in MB |
40 VMs in total with 8 VMs per host |
4 |
375 VMs in total with 25 VMs per host |
35 |
1,280 VMs in total with 40 VMs per host |
120 |
3,000 VMs in total with 512 VMs per host |
300 |
These are just recommendations; additional capacity may be needed depending on the environment. Increasing the log capacity involves specifying the number of rotations together with the file size as well as making sure there is enough space on the storage resource where the logfiles are kept.
The vCenter server uses the vpxuser account to connect to the HA agents. When HA is enabled for the first time, vCenter creates this account with a random password and makes sure the password is changed periodically. The time period for a password change is controlled by the VirtualCenter.VimPasswordExpirationInDays parameter that can be set under the Advanced Settings option in vCenter.
All communication between vCenter and HA agents, as well as agent-to-agent traffic, is secured with SSL. Therefore, for vSphere HA, it's necessary that each host has verified SSL certificates. New certificates require HA to be reconfigured. It will also be reconfigured automatically if a host has been disconnected before the certificate is replaced.
SSL certificates are also used to verify election messages so if there is a rogue agent running, it will only be able to affect the host it's running on. This issue, if it occurs, is reported to the administrator.
HA uses TCP/8182 and UDP/8182 ports for communication between agents. These ports are opened and closed automatically by the host's firewall. This helps to ensure that these ports are open only when they are needed.
When vSphere HA restarts VMs on a different host after a failure, the main priority is the immediate availability of VMs. Based on CPU and memory reservations, HA determines which host to use to power the VMs on. This decision is based, of course, on the available capacity of the host. It's quite possible that after all the VMs have been restarted, some hosts become highly loaded while others are relatively lightly loaded.
DRS is the load balancing and failover solution that can be enabled in vCenter for better host resource management.
vSphere HA, together with DRS, is able to deliver automatic failover and load balancing solutions, which may result in a more balanced cluster. However, there are a few things to consider when it comes to using both features together.
In a cluster with DRS, HA, and the admission control enabled; VMs may not be automatically evacuated from a host entering the maintenance mode. This occurs because of resources reserved for VMs that need to be restarted. In this case, the administrator needs to migrate these VMs manually.
Some VMs may not fail over because of resource constraints. This can happen in one of the following cases:
HA only restarts a VM if there is a host failure. In other words, it will power on all the VMs that were running on a failed host placed on another member of the cluster. Therefore, even with HA enabled, there will still be a short downtime for VMs that are running on faulty hosts. In fast environments, however, VM reboot happens quickly. So if you are using some kind of monitoring system, it may not even trigger an alarm. Therefore, if a bunch of VMs have been rebooted unexpectedly, you know there was an issue with one of the hosts and can review the logs to find out what the issue was.
Of course, if you have set up vCenter notifications, you should get an alert.
If you need VMs to be up all the time even if the host goes down, there is another feature that can be enabled called Fault Tolerance.