Software-Defined Data Center
We covered how a VM differs drastically compared to a physical server. Now let's take a look at the big picture, which is at the data center level. A data center consists of three functions—compute, network, and storage. I use the term compute as we are entering the converged infrastructure era, where the server performs storage too and they are physically in one box. There is no more separation and we cannot say this is the boundary where the server stops and the storage starts.
VMware is moving to virtualize the network and storage functions as well, resulting in a data center that is fully virtualized and defined in the software. The software is the data center. We no longer prepare the architecture in the physical layer. The physical layer is just there to provide resources. These resources are not aware of one another. The stickiness is reduced and they become a commodity. In many cases, the hardware can even be replaced without incurring downtime to the VM.
The next diagram shows one possibility of a data center that is defined in the software. I have drawn the diagram to state a point, so don't take this as the best practice for SDDC architecture. Also, the technology is still evolving, so expect changes in the next several years. In the diagram, there are two physical data centers. Large enterprises will have more physical data centers. The physical data centers are completely independent. Personally, I believe this is a good thing. Ivan Pepelnjak, someone I respect highly on data center networking architecture, states that:
Interconnected things tend to fail at the same time
Each of these physical functions (compute, network, and storage) is supported, or shall I say instantiated, in the physical world, by the respective hardware vendors. For a server, you might have vendors (for example, Nutanix, HP, Lenovo, Dell, and so on) that you trust and know. I have drawn two vendors to show the message that they do not define the architecture. They are there to support the function of that layer (for example, Compute Function). So, you can have 10 vSphere clusters: 3 clusters could be Vendor A, and 7 clusters could be Vendor B.
The same approach is then implemented in Physical Data Center 2, but without the mindset that the data centers have to be of the same vendor. Take Storage Function, as an example. You might have Vendor A on data center 1, and Vendor B on data center 2. You are no longer bound by the hardware compatibility; storage array replication normally requires the same model and protocol. You can do this as the physical data centers are completely independent of each other. They are neither connected nor stretched. The replication is done at the hypervisor layer. vSphere 5.5 has built-in host-based replication via TCP/IP. It can replicate individual VMs, and provides finer granularity than LUN-based replication. Replication can be done independently from a storage protocol (FC, iSCSI, or NFS) and VMDK type (thick or thin). You might decide to keep the same storage vendor but that's your choice, not something forced upon you.
On top of these physical data centers, you can define and deploy your virtual data centers. A virtual data center is no longer contained in a single building bound by a physical boundary. Although, bandwidth and latency are still limiting factors, the main thing here is you can architect your physical data centers as one or more logical data centers. You should be able to automatically, with just one click in SRM 5.5, move thousands of servers from data center A to data center B; alternatively, you can perform DR from four branch sites to a common HQ data center.
You are not bound to have one virtual data center per site, although it is easier to map it one-on-one with the current release of vSphere. For example, it is easier if you just have one vCenter per physical data center.
The next screenshot shows what a vCenter looks like in vSphere 5.5, the foundation of vCloud Suite. VMware continues integrating and enhancing vCloud Suite, and I would not be surprised to see its capability widening in future releases.
I will zoom in to a part of the screenshot as it's rather small. The left part of the screenshot, shown next, shows that there are three vCenter Servers, and I've expanded each of them to show their data centers, clusters, hosts, and VMs:
From here, we can tell that we no longer need another inventory management software, as we can see all objects and their configurations and how they relate to one another. It is clear from here how many data centers, clusters, ESXi hosts, and VMs we have.
We also get more than static configuration information. Can you see what live or dynamic information is presented here? These are not the types of information you get from CMDB or the inventory management system.
You will notice from the preceding screenshot that I get warnings and alerts, so this is a live environment. I also get information on the capacity and health. At the corner of the screen, you can see the data center CPU, memory, storage capacity, and usage. In the vSphere Replication box, you can see the VM replication status. For example, you can see that it has 7 outgoing replications and 3 incoming replications. In the middle of the screen, you can see Health State, which, by the way, comes from vRealize Operations. In the Infrastructure Navigator box, you get to see what applications are running, such as Application Server and Database Server. This information also comes from vRealize Operations. So, many of the management functions are provided out of the box. These functions are an integral part of vCloud Suite.
As a virtualization engineer, I see a cluster as the smallest logical building block in vSphere. I treat it as one computer. You should also perform your capacity management at the cluster level and not at the host level. This is because a VM moves around within a cluster with DRS and Storage DRS. In the virtual data center, you think in terms of a cluster and not a server.
Let's take a look at the cluster called SDDC-Mgmt-Cluster, shown in the next screenshot. We can tell that it has 3 hosts, 24 processors (that's cores, not socket or threads), and 140 GB of RAM (about 4 GB is used by the three instances of VMkernel). We can also tell that it has EVC Mode enabled, and it is based on the Intel Nehalem generation. This means I can add an ESXi host running a newer Intel processor (for example, Westmere) live inside the cluster, and perform vMotion across the CPU generation. On the top-right corner, we can see the capacity used, just like we can see at the vCenter level. In a sense, we can drill down from the vCenter level to the cluster level.
We can also see that HA and DRS are turned on. DRS is set to fully automated, which is what I would recommend as you do not want to manually manage the ESXi host one by one. There is a whole book on vSphere Cluster, as there are many settings on this features. My favorite is by Duncan Epping and Frank Denneman, which is available at http://www.yellow-bricks.com/my-bookstore/.
The ramification of this is that the data center management software needs to understand vSphere well. It has to keep up with the enhancements in vSphere and vCloud Suite. A case in point: vSphere 5.5 in the Update 1 release added Virtual SAN, a software-defined storage integrated into vSphere.
Notice Health State. Again, this information comes from vRealize Operations. If you click on it, it will take you to a more detailed page, showing charts. If you drill down further, it will take you to vRealize Operations.
The Infrastructure Navigator box is useful so you know what applications are running in your cluster. For example, if you have a dedicated cluster for Microsoft SQL Server (as you want to optimize the license) and you see SQL in this cluster (which is not supposed to run the database), you know you need to move the VM. This is important because sometimes as an infrastructure team, you do not have access to go inside the VM. You do not know what's running on top of Windows or Linux.
We covered compute. Let's move on to network. The next screenshot shows a distributed virtual switch. As you can see, the distributed switch is an object at the data center level. So it extends across clusters. In some environments, this can result in a very large switch with more than 1,000 ports. In the physical world, this would be a huge switch indeed!
A VM is connected to either a standard switch or a distributed switch. It is not connected to the physical NIC in your ESXi host. The ESXi host physical NICs become the switch's uplinks instead, and generally you have 2 x 10 GE ports. This means that the traditional top-of-rack switch has been entirely virtualized. It runs completely as software, and the following screenshot is where you create, define, and manage it. This means the management software needs to understand the distributed vSwitch and its features. As you will see later, vRealize Operations understands virtual switches and treats networking as a first-class object.
The previous screenshot shows that the switch has six port groups and two uplinks. Let's drill down into one of the port groups, as shown in the next screenshot. Port group is a capability that is optional in physical switches, but mandatory in a virtual switch. It lets you group a number of switch ports and give it a common property. You can also set policies. As shown in the Policies box, there are many properties that you can set. Port group is essential in managing all the ports connected to the switch.
In the top-right corner, you see the CAPACITY information. So you know how many ports you configured and how many ports are used. This is where virtual networking is different to virtual compute and virtual storage. For compute and storage, you need to have the underlying physical resources to back it up. You cannot create a VM with a 32-core vCPU if the underlying ESXi has less than 32 physical threads. Virtual network is different. Network is an interconnection; it is not a "node"-like compute and storage. It is not backed by physical ports. You can increase the number of ports to basically any number you want. The entire switch lives on memory! You power off the ESXi and there is no more switch.
In the Infrastructure Navigator box, you will again see the list of applications. vRealize Operations is deeply embedded into vSphere, making you feel like it's a single application as it is a single pane of glass. In the past several releases of VMware products; they are becoming one integrated suite and this trend is set to continue.
Let's now move to storage. The next screenshot shows a vSphere 5.5 datastore cluster. The idea behind a datastore cluster is similar to that of a compute cluster. Let's use an example as it's easier to understand. Say you have a cluster of 8 ESXi hosts, with each host sporting 2 sockets, 24 cores, and 48 threads. In this cluster, you run 160 VMs, giving you a 20:1 consolidation ratio. This is reasonable from a performance management view as the entire cluster has 192 physical cores and 384 physical threads. Based on the general guidelines that Intel Hyper-Threading gives around a 50 percent performance boost, you can use 288 cores as the max thread count. This gives you around 1.8 cores per VM, which is reasonable as most VMs are 2 vCPU and have around 50 percent utilization. These 160 VMs are stored in 8 datastores, or around 20 VMs per datastore.
With the compute node, you need not worry about where a VM is running in that cluster. When you provision a new VM, you do not specify which host will run it. You let DRS decide. As the workload goes up and down, you do not want to manage the placement on an individual ESXi host for 160 VMs. You let DRS do the load balancing, and it will use vMotion on the VM automatically. You treat the entire cluster as if it is a single giant box.
With the storage node, you can do the same thing. When you provision a new VM, you do not specify a datastore for it. If you do want to specify it manually, you need to check which datastore has the most amount of space and the least amount of IOPS. The first piece of information is quite easy to check, but the second one is not. This is the first value of the datastore cluster. It picks a cluster based on both capacity and performance. The second value is based on the ongoing operation. As time passes, VM grows at different rates in terms of both capacity and IOPS. Storage DRS monitors this and makes recommendations for you. The major difference here is the amount of data to be migrated. In vMotion, we normally migrate somewhere between 1 GB to 10 GB of RAM, as the kernel only copies the used RAM (and not the configured RAM). In storage vMotion, we potentially copy 100 GB of data. This takes a lot longer and hence has a greater performance impact. As such, Storage DRS should be performed a lot less frequently, perhaps once a month.
Datastore cluster helps in capacity management, as you basically treat all the datastores as one. You can easily check key information about the datastore cluster, such as the number of VMs, total storage, capacity used, and largest free space you have.
As usual, vRealize Operations provides information about what applications are running in the datastore cluster. This is handy information in a large environment, where you have specific datastores for specific applications.
We covered all the three elements—compute, storage, and network. How are they related? The next screenshot shows the relationship of the key objects managed by vCenter.
It's handy information in a small environment. If you have a large environment, maps such as the one shown in the next screenshot really become much more complex! In this map, I only have 3 ESXi hosts and 7 datastores, and I have to hide some relationships already. Notice that I did not select the Host to VM and VM to datastore relationship options, because it got way too complicated when I did.
The point of sharing the screenshot is to share that you indeed have your data center in software with the following characteristics:
- You have your VM as the consumer. You can show both powered-on and powered-off VMs.
- You have your compute (ESXi), network (port group), and storage (datastore) as the provider. You can show the relationship between your compute to your network and storage.
- You have the information about the network, storage, and compute your VM is connected to.
Think about it. How difficult will it be to have this type of relationship mapped in the physical data center? I've personally heard comments from customers that they do not know exactly how many servers they have, which network they are connected to, and what applications run on that box. The powered-off server is even harder to find! Even if you can implement a data center management system, which can give you the map, one or two years later you cannot be sure that the map is up-to-date. The management system has to be embedded into the platform. In fact, it's the only point of entry to the virtual platform. It cannot be a separate, detached system.
The last point I'd like to bring up is that SDDC is a world in itself. It's not simply your data center virtualized. Look at the following table. It lists some of the objects in vSphere. I have not included NSX, Virtual SAN, or vRealize Suite objects here. These objects do not have their physical equivalent. If they do, they have different properties, generate different events, and are measured by different counters. Plus, all these objects have some relationship with one another. You need to look at vCloud Suite in its entirety to understand it well.
The downside of this SDDC is that the upgrade of this "giant machine" is a new project for IT. It has to be planned and implemented carefully because it is as good as upgrading the data center while servers, storage, and network are all still running. Using a physical world analogy, it's like renovating your home while living in it.