Looking forward to the Design phase
Once you have set reasonable purposes and goals for your cluster project, the next phase involves designing the system that will achieve them. At a high level, this is little different from designing a system intended to host a single-purpose service in a non-virtualized environment: you first identify the expected load and then architect a solution that can comfortably bear it. You no doubt already have some idea of the volume of computing resources that will be demanded of your cluster. However, the nature of clustering requires some additional understanding before you can begin outlining components to purchase.
Many of these concepts may seem obvious to you as a computing professional, but the early phases of a project usually require the involvement, and sometimes the oversight, of less technically proficient members of the organization. It is certainly not required that they become subject-matter experts, but they must be made aware of the general needs of the project so that they are not surprised when the requests for resources, time, and capital expenditures begin. Several items will need attention drawn to them in the early phases. Whether those items are specifically included in the project planning document depends on the needs of your organization and the overall size of your project. You might consider a Solution Summary section that briefly itemizes the components of the solution without providing any particular details. If your project is small enough, or if there won't be many reviewers of the document itself, you may choose to skip this section in favor of the more detailed list that will inevitably appear in the Design portion. However, the simpler layout may need to be built for presentations, and it can even serve as a basic checklist for the Design phase.
Host computers
A cluster involves multiple physical computer systems. As mentioned in the cloud discussion earlier, it is not absolutely required that each host be identical to the others, but it is certainly desirable. Virtual machines that move across differing hardware may suffer noticeable performance degradation if the target doesn't have the same capabilities or configuration as the source. Where possible, these hosts should be purchased together prior to initial implementation, because adding nodes to a cluster requires more effort once that cluster has gone into production. Unlike a typical single-server physical deployment, it is common for the combined power of a cluster to provide significantly more computing resources than the included services actually require. This is because part of the purpose of a cluster is to provide failover capability.
Also, a Hyper-V Server host by nature needs to run more than one operating system concurrently, so these systems may require more CPU cores and RAM than your organization is accustomed to purchasing for a single system. If possible, modify your organization's existing provisioning standards to accommodate the differences for virtualization hosts.
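As one illustration of failover-driven sizing, the following PowerShell sketch compares each host's assigned virtual machine memory against its physical capacity and asks whether the remaining nodes could absorb the busiest node's load. The host names are placeholders, and MemoryAssigned reflects running virtual machines only; treat this as a rough planning aid, not a definitive capacity tool.

```powershell
# Rough N+1 memory check: could the remaining nodes absorb the busiest
# node's virtual machine load if that node failed? Host names are
# placeholders; MemoryAssigned reflects running VMs only.
$hosts = "HV01", "HV02", "HV03"

$report = foreach ($node in $hosts) {
    [pscustomobject]@{
        Node        = $node
        CapacityGB  = (Get-VMHost -ComputerName $node).MemoryCapacity / 1GB
        CommittedGB = (Get-VM -ComputerName $node |
            Measure-Object -Property MemoryAssigned -Sum).Sum / 1GB
    }
}

$busiest = $report | Sort-Object CommittedGB -Descending | Select-Object -First 1
$spare   = ($report | Where-Object Node -ne $busiest.Node |
    ForEach-Object { $_.CapacityGB - $_.CommittedGB } |
    Measure-Object -Sum).Sum

if ($spare -ge $busiest.CommittedGB) { "N+1 memory capacity looks adequate." }
else { "Shortfall of {0:N0} GB if {1} fails." -f ($busiest.CommittedGB - $spare), $busiest.Node }
```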
Storage
An element that clustering introduces is the need for shared storage. While it is technically possible to build a cluster that does not use shared storage, it is not practical. Out of the three main components of a virtual machine, the CPU threads and memory contents can only exist on one node at a time, but they can be rapidly transferred to another node. In the event of a host crash, these contents are irretrievably lost just as they would be if the machine were not virtualized. In a high availability solution, these are considered acceptable losses. However, the long-term data component, which includes configuration data about the virtual machine in addition to the contents of its virtual hard drives, is a protected resource that is expected to survive a host crash—just as it would be in a non-virtualized environment. If that data is kept on internal storage in a host that fails, there will be no way for another host to access it without substantial effort on the part of an administrator.
The files that comprise a highly available virtual machine must be placed in a location that all cluster nodes can access. There are some special-case uses in which only a subset of the nodes are allowed to access a particular storage location, but a virtual machine cannot be truly considered to be highly available unless it can run on more than one cluster node.
Cluster Shared Volumes
Shared storage involves both physical devices and logical components. The preferred way to logically establish shared storage for clustered Hyper-V Server computers is by using Cluster Shared Volumes (CSV). The name more or less explains what it does: it allows volumes to be shared across the nodes of a cluster. Contrast this with a traditional volume, which can be accessed by only one computer at a time. In the term CSV, Volumes specifically refers to NTFS volumes. You cannot use any other format type (FAT, NFS, and so on) with a CSV (the newer ReFS format is acceptable in 2012 R2, as will be discussed in Chapter 4, Storage Design).
In more technical terms, CSV is powered by a filter driver that a node uses to communicate with NTFS volumes that might also be accessed by other nodes simultaneously. The technical details of CSVs will be examined in much more depth in later chapters.
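As a preview of the mechanics, converting an available clustered disk into a CSV is a one-line operation in PowerShell. A minimal sketch; the disk name is a placeholder:

```powershell
# List clustered disks, then promote one to a Cluster Shared Volume.
# "Cluster Disk 2" is a placeholder for a disk in Available Storage.
Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk"

Add-ClusterSharedVolume -Name "Cluster Disk 2"

# Every node now sees the volume under the shared namespace, for example
# C:\ClusterStorage\Volume1.
Get-ClusterSharedVolume | Select-Object Name, State
```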
SMB shares
A powerful feature introduced with Windows Server 2012 is version 3.0 of Microsoft's Server Message Block (SMB) technology. Because it is typically used on file shares, SMB is usually thought of in terms of storage; in actuality, it is a networking protocol. Its applications to storage are why it is mentioned in this section. For one thing, Cluster Shared Volume communications between nodes are encapsulated in SMB. More importantly for planning, you can now create a regular SMB share on any computer running Windows Server 2012 or later and use it to host the files for a Hyper-V Server virtual machine. Hardware vendors are also working to design systems that provide SMB 3.0 shares. Many will use an embedded installation of Windows Storage Server; others will follow Microsoft's specification and design their own systems.
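To make the idea concrete, here is a hedged sketch of creating such a share on a file server. All names are placeholders; the Hyper-V-specific detail is that the hosts' Active Directory computer accounts (note the trailing $) and the cluster's account need Full Control on both the share and the underlying NTFS permissions.

```powershell
# Create an SMB 3.0 share intended to hold virtual machine files.
# Server, domain, host, and cluster names are all placeholders.
New-SmbShare -Name "VMStore" -Path "E:\Shares\VMStore" `
    -FullAccess "CONTOSO\HV01$", "CONTOSO\HV02$", "CONTOSO\HVC1$", "CONTOSO\Hyper-V Admins"

# NTFS permissions must grant the same accounts Full Control; one coarse
# way is icacls, for example:
# icacls E:\Shares\VMStore /grant "CONTOSO\HV01$:(OI)(CI)F"
```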
Mixing SMB 3.0 and CSV
You will settle on the specific method(s) of provisioning and using storage during the Design phase, but the possibilities and applications need to be made clear as early as possible. Except on a clustered file server, you cannot create a CSV on an SMB 3.0 share point, and creating an SMB 3.0 share on a CSV does not expose the existence of that CSV in a way that Hyper-V Server can properly utilize. However, a Hyper-V Server cluster can run some virtual machines from CSVs while running others on SMB 3.0 shares. The initial impact this has on planning is that if you have complex needs and/or a restrictive budget, you are not required to decide between a storage area network (SAN) and less expensive methods of storage; you can have both. If any of these concepts or terms are new to you, read through Chapter 4, Storage Design, before making any storage decisions.
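As a sketch of what that mixture looks like in practice, the following hypothetical commands place one virtual machine on a CSV and another on an SMB 3.0 share within the same cluster. All paths and names are placeholders, and the -VMName parameter of Add-ClusterVirtualMachineRole is the 2012 R2 form.

```powershell
# One VM stored on a Cluster Shared Volume...
New-VM -Name "vm-on-csv" -MemoryStartupBytes 2GB `
    -Path "C:\ClusterStorage\Volume1\VMs" `
    -NewVHDPath "C:\ClusterStorage\Volume1\VMs\vm-on-csv.vhdx" -NewVHDSizeBytes 60GB

# ...and another on an SMB 3.0 share, in the same cluster.
New-VM -Name "vm-on-smb" -MemoryStartupBytes 2GB `
    -Path "\\fs01\VMStore\VMs" `
    -NewVHDPath "\\fs01\VMStore\VMs\vm-on-smb.vhdx" -NewVHDSizeBytes 60GB

# Make both highly available (2012 R2 parameter shown):
Add-ClusterVirtualMachineRole -VMName "vm-on-csv"
Add-ClusterVirtualMachineRole -VMName "vm-on-smb"
```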
The following image shows a sample concept diagram of a cluster that mixes storage connectivity methods:
Networking
The networking needs of a Hyper-V Server cluster node are substantially different from those of a standalone system. A cluster node running Hyper-V Server involves three distinct networking components (a brief sketch of how these surface as cluster networks follows the list):
- Management
- Cluster and Cluster Shared Volume communications
- Live Migration traffic
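Once the cluster exists, each of these appears as a cluster network with a role that controls what traffic it carries. A hedged sketch; the network name is a placeholder:

```powershell
# View each cluster network and its role:
#   0 = excluded, 1 = cluster traffic only, 3 = cluster and client.
Get-ClusterNetwork | Select-Object Name, Role, Address

# Example: dedicate a (placeholder) network to internal cluster/CSV traffic.
(Get-ClusterNetwork -Name "Cluster Network 2").Role = 1
```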
Management
Management traffic involves regular communications to and from the management operating system of the variety that any Windows Server system would use. Examples include connections for Remote Desktop Connection clients, remote management consoles, monitoring tools, and backups that operate within the context of the management operating system. This connection is used as the host's identifier within the cluster and will be the target for cluster management software. Usually, the events that will generate the most bandwidth on this connection are file transfers to and from the host (such as .ISO files to be connected to virtual machines) and backup traffic moving from the hypervisor to a backup server on another computer.
Cluster and Cluster Shared Volumes
The individual nodes of a cluster need to communicate with each other directly, preferably over a network dedicated to inter-node communications. The traffic consists of "heartbeat" information in which the nodes continually verify that they can see each other. Information about cluster resources, specifically virtual machines in the case of a cluster of Hyper-V Server computers, is synchronized across this network.
Communications related to Cluster Shared Volumes also utilize this network. In normal operations, this is nothing more than basic metadata, such as ownership changes of a CSV or a virtual machine. However, some conditions can trigger what is called Redirected Access Mode, in which all disk operations involving one or more CSVs for the virtual machines on a particular node are routed through the node(s) that own the affected CSV(s). This mode and its triggers will be looked at in greater detail in later chapters. At this stage, the important point is that if you will be using CSVs, you need to prepare for the possibility that cluster communications may require a significant amount of bandwidth.
Live Migration
A Live Migration involves the transfer of the active state of a virtual machine from one node to another. There is a small amount of configuration data that goes along, but the vast majority of the information in this transfer is the active contents of the virtual machine's memory. The amount of bandwidth you make available for this network translates directly into how quickly these transfers occur. The considerations for this will be thoroughly examined later. For now, understand that this network needs access to a substantial amount of bandwidth.
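When the time comes, pinning Live Migration to its dedicated network takes only a few commands. A minimal sketch, assuming a placeholder subnet:

```powershell
# Enable live migrations and restrict them to a placeholder subnet; the
# cap limits simultaneous transfers so one migration doesn't starve others.
Enable-VMMigration
Add-VMMigrationNetwork -Subnet "192.168.50.0/24"
Set-VMHost -MaximumVirtualMachineMigrations 2
```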
Subnetting
With the possible exception of cluster communications, each of these traffic types must be isolated from the others on its own subnet. This is a requirement of Microsoft Failover Clustering and, for the most part, cannot be circumvented. In some organizations, this will involve calling upon a dedicated networking team to prepare the necessary resources for you. Until you enter the actual Design phase, you won't be able to tell them much beyond the fact that you'll need at least two, and probably more, subnets to satisfy the requirements. However, unless you intend to isolate your Hyper-V Server hosts and/or you expect your cluster to have enough nodes that it might overwhelm currently allocated ranges, the subnet that contains your management traffic can be part of your existing IP infrastructure. Depending on the capabilities of your networking equipment and your organizational practices, you may also choose to place your IP networks into distinct virtual LANs (VLANs).
The VLAN is a networking concept that has been in widespread use for quite some time, and it is not related to hypervisors or virtual machines. Windows Server's networking stack and Hyper-V's virtual switch are fully capable of handling traffic in separate VLANs. This book will explain how to configure Hyper-V accordingly, but your network equipment will have its own configuration needs. Work with your networking team or provider if you need guidance.
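On the Hyper-V side, VLAN assignment is a per-adapter setting. For example, a hypothetical management-OS virtual adapter (a converged-fabric construct covered in later chapters) can be tagged as follows, provided the physical switch port is trunked to match:

```powershell
# Tag a hypothetical management-OS virtual adapter with VLAN 50.
# The adapter name and VLAN ID are placeholders.
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "LiveMigration" `
    -Access -VlanId 50
```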
Virtual machine traffic
A fourth traffic type you must design for is that used by the virtual machines. Unlike the traffic types mentioned previously, this is not a cluster-defined network; in fact, Microsoft Failover Clustering in 2012 is not at all aware of the existence of your virtual machine network setup. R2 adds visibility for protection purposes, but it is still not a true cluster network. Virtual machine traffic is controlled entirely by Hyper-V Server via the virtual switch. It is recommended that you use at least one gigabit network adapter for this role, though this role can share an adapter with a cluster role if necessary. If you are using gigabit adapters, Microsoft only supports such sharing with the management role, and only in a particular configuration. The actual amount of bandwidth required will depend on how much your virtual machines need; you will revisit this during the Design phase.
Virtual machine traffic does not require a dedicated subnet. Any virtual machine can access any subnet or VLAN that you wish.
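The virtual switch and per-VM VLAN assignment look roughly like this; adapter, VM, and VLAN values are placeholders:

```powershell
# Create an external virtual switch on a dedicated (placeholder) adapter,
# attach a VM, and place its traffic on VLAN 20.
New-VMSwitch -Name "VM Traffic" -NetAdapterName "NIC4" -AllowManagementOS $false
Connect-VMNetworkAdapter -VMName "web01" -SwitchName "VM Traffic"
Set-VMNetworkAdapterVlan -VMName "web01" -Access -VlanId 20
```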
Storage traffic
iSCSI is a commonly used approach to providing access to shared storage for a Hyper-V Server cluster environment. If you're not familiar with the term, iSCSI is a method of encapsulating traditional Small Computer Systems Interface (SCSI) commands into IP packets. SCSI in this sense refers to a standardized command set used for communications with storage devices. If you will be using iSCSI, it is recommended that this traffic be given its own subnet. Doing so reduces the impact of broadcast traffic on I/O operations and provides a measure of security against intruders.
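For reference, attaching a host to an iSCSI portal on that dedicated subnet looks roughly like the following. The address is a placeholder, and your SAN will have its own masking and authentication steps:

```powershell
# Start the initiator service, register the (placeholder) portal on the
# dedicated storage subnet, and connect persistently.
Set-Service -Name msiscsi -StartupType Automatic
Start-Service -Name msiscsi

New-IscsiTargetPortal -TargetPortalAddress "192.168.60.10"
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true
```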
If your storage system employs multipathing or you have multiple storage devices available, you will occasionally see recommendations to further divide the separate paths into their own subnets as well. Testing for the true impact of this setup has not produced conclusive results, so it is likely to require more effort than it's worth. Unless you have a very large iSCSI environment or a specific use case that clearly illustrates the rationale for multiple iSCSI networks, a single subnet should suffice.
SMB 3.0 traffic should also be given its own subnet. Like iSCSI, SMB 3.0 can take advantage of multiple network adapters. Unlike iSCSI, using multiple paths to SMB 3.0 storage requires one subnet per path.
Physical adapter considerations
It is recommended that you provide each traffic type with its own gigabit adapter. If necessary, the roles can share fewer adapters, all the way down to a single gigabit network interface card, but this can cause severe bottlenecks, and Microsoft will only support such role-sharing in specific configurations. If you will be using ten-gigabit adapters, the recommendations are much more relaxed. These are important considerations early on, as it's not uncommon for a Hyper-V Server host to have more than six network adapters. Many organizations are not accustomed to purchasing hardware with that sort of configuration, so this may require a break from standardized provisioning processes.
Not all physical adapters are created equal. While the only base requirement is gigabit speed, other features are available that can provide enhanced network performance. One of these features is Virtual Machine Queue (VMQ), which allows a guest to bypass some of the hypervisor's processing of incoming traffic. More recent technologies that Hyper-V Server can take advantage of are remote direct memory access (RDMA) and single-root input/output virtualization (SR-IOV).
These technologies are becoming increasingly common, but they are currently only available on higher-end adapters. Chapter 6, Network Traffic Shaping and Performance Enhancements, is devoted to these and other advanced networking technologies.
Adapter teaming
Windows Server 2012 introduced the ability to form teams of network adapters natively within the operating system. In previous Windows versions, teaming required specific support from hardware manufacturers. It was usually not possible to create a single team across adapters of different hardware revisions or from different manufacturers, and the quality of teaming could vary significantly from one driver set to the next. As a result, teams sometimes caused more problems than they solved. Microsoft's official policy has always been to support Windows networking only when no third-party teaming solution is present.
With built-in support for adapter teaming, many new possibilities are available for Hyper-V Server cluster nodes. These will be discussed in great detail in later chapters. What is important to know now is that the technology is available and directly supported by Microsoft. One major misconception about this technology concerns bandwidth aggregation: in simple terms, the primary benefits of adapter teaming are load balancing and failover, not a single faster pipe for any one connection. If you or other interested parties have particular expectations of this feature, you may benefit from reading ahead through Chapter 5, Network Design. Teaming also paves the way for converged fabric, which is likewise explained in Chapter 5, Network Design.
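When you reach implementation, creating a native team is a single cmdlet. A sketch under assumed adapter names; the mode and algorithm shown are a common starting point for a team that will back a virtual switch, but Chapter 5, Network Design, weighs the alternatives:

```powershell
# Form a native team from two (assumed) adapter names. Switch-independent
# mode with the Hyper-V port algorithm is a common choice for teams that
# will host a virtual switch; your design may differ.
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1", "NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort
```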
Active Directory
Microsoft Failover Clustering requires the presence of an Active Directory domain. The foundational reason is that the nodes of a cluster need to be able to trust that the other member computers are who they say they are, and the definitive tool that Microsoft technology relies on to make that determination is Active Directory. A Microsoft Failover Cluster also creates an Active Directory computer object that represents the entire cluster to other computers and some services. This object isn't quite as meaningful for a cluster of Hyper-V Server machines as it is for other clustered services, such as Microsoft SQL Server, but the object must exist. Other supporting technologies, such as Cluster Shared Volumes and SMB 3.0 shares that host virtual machines, are also dependent on Active Directory.
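The dependency shows up at the moment the cluster is formed: both of the following commands presuppose domain-joined nodes, and New-Cluster is what creates the cluster's Active Directory computer object. Names and the address are placeholders.

```powershell
# Validate the candidate nodes, then form the cluster. New-Cluster
# creates the cluster name object in Active Directory.
Test-Cluster -Node "HV01", "HV02", "HV03"
New-Cluster -Name "HVC1" -Node "HV01", "HV02", "HV03" -StaticAddress "192.168.1.50"
```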
The requirement for Active Directory needs to be made obvious prior to the Design phase, as it may come as a surprise to some. Hyper-V Server itself does not require a domain, and as such it is not uncommon to find organizations that configure standalone Hyper-V Server hosts in workgroup mode to host publicly accessible services in an untrusted perimeter or isolation network. That approach relies on the natural isolation of virtual machines provided by Hyper-V Server and a solid understanding of the virtual switch.
Virtualized domain controllers
Virtualizing domain controllers is an issue that is not without controversy, and there are some very important pros and cons involved. Windows Server 2012 eliminated the more serious problems, and planned placement of virtualized domain controllers can address most of the rest. It is not necessary to make any decisions about this subject at this point of the design; in fact, unless you don't yet have a domain environment, it can wait until after the virtualization project is complete. However, it should be brought up early on, so you may wish to make yourself aware of the challenges now. This topic will be fully explored in Chapter 9, Special Cases.
Supporting software
A Microsoft Hyper-V Server and a Microsoft Failover Cluster can both be managed using tools that are built into Windows Server and freely downloadable for Windows 8/8.1. However, many other applications are available that go beyond what the basic tools can offer. You should begin looking into these products early on to determine what their feature sets are and whether those features are of sufficient value to your organization to justify the added expenditure.
Management tools
Multiple tools exist that can aid you in maintaining and manipulating Hyper-V Server and Failover Clustering. The Remote Server Administration Tools, which are part of the previously mentioned tools built into Windows Server and downloadable for Windows 8/8.1, include Hyper-V Manager and Failover Cluster Manager. There is also a plethora of PowerShell cmdlets available for managing these technologies. It is entirely possible to manage all aspects of even a large Hyper-V Server cluster using only these tools. However, the larger your cluster or the less time you have available, the more likely it is that you'll want to employ more powerful software assistants.
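For a flavor of those built-in cmdlets, here are a few representative examples from the Hyper-V and FailoverClusters modules; all names are placeholders:

```powershell
# Inventory VMs on a host, list clustered VM roles, and live migrate one.
Get-VM -ComputerName "HV01" | Select-Object Name, State, MemoryAssigned
Get-ClusterGroup | Where-Object GroupType -eq "VirtualMachine"
Move-ClusterVirtualMachineRole -Name "web01" -Node "HV02" -MigrationType Live
```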
Foremost among these tools is Microsoft System Center Virtual Machine Manager (SCVMM). This tool adds a number of capabilities, especially if it is used in conjunction with the larger System Center family of products. Be aware that you must be using at least Service Pack 1 of the 2012 release of this product in order to manage a Hyper-V Server 2012 system and at least version 2012 R2 in order to manage Hyper-V Server 2012 R2.
Third-party management products exist for Hyper-V Server and the market continues to grow. Take some time to learn about them, and if possible, test them out.
To aid you in defining your criteria, here are some commonly requested features that the free Hyper-V Manager and Failover Cluster Manager tools don't provide:
- Conversion of physical machines to virtual machines (often called P2V)
- Templates—stored copies of virtual machines that serve as basic pre-built images that can be deployed as needed
- Cloning of virtual machines
- Automated balancing of virtual machines across nodes
- Centralized repositories for CD and DVD image files that can be attached to virtual machines on any node on-demand
- "Self-service" capabilities in which non-administrators can deploy their own virtual machines as needed
- Extensions to the Hyper-V virtual switch
You don't necessarily need all of these features, nor is it imperative that a single product provide all of them. What's important is identifying the features that are meaningful to your organization, what package(s) provide those features, and, if necessary, what you are willing to pay for them.
Backup
Backup is a critical component of any major infrastructure deployment. Unfortunately, it is often not considered until a late stage of virtualization projects. Virtualization adds options that aren't available in physical deployments, and clustered virtual machines add challenges that aren't present in other implementations.
The topic of backup will be more thoroughly examined in Chapter 12, Backup and Disaster Recovery, but the basic discussion can't wait. Begin collecting the names of candidate applications. Windows Server, including Hyper-V Server, includes Windows Server Backup. This tool can be made to work with a cluster, but it is generally insufficient for all but the smallest deployments. Ensure that the products you select for consideration are certified for the backup method you intend to perform. If you plan to back up some or all virtual machines from within the hypervisor, your backup application will need to provide specific support for Hyper-V Server in a Microsoft Failover Clustering environment.
Training
Depending upon the size of your deployment and your staff, you may need to consider seeking out training resources for your systems administrator(s). Hyper-V Server and Failover Clustering are not particularly difficult to use after a successful implementation, but the initial learning curve can be fairly steep. It is entirely possible to learn them both through a strictly hands-on approach with books such as this one. Microsoft provides a great deal of free introductory material through the Microsoft Virtual Academy at http://www.microsoftvirtualacademy.com and in-depth documentation on TechNet at http://technet.microsoft.com. However, some of your staff may require formal classroom training or other methods of knowledge acquisition.