What is a virtual machine?
A virtual machine is a software construct that acts as a container for installing and running a conventional operating system on server hardware managed by a hypervisor. It is an isolation boundary between the operating systems running on the shared hardware.
An operating system running on a virtual machine is unaware that it is running on a virtual machine, or that the resources assigned to it are shared with other virtual machines. It assumes ownership of every resource assigned to it. Managing how those resources are shared among virtual machines is the duty of the hypervisor, so the performance of a virtual machine depends on how well the hypervisor manages the shared resources.
When a virtual machine is created, it is assigned resources such as CPU, memory, network interfaces, and storage. These resources are slices of a larger pool of resources that the server hardware provides.
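The idea of carving slices out of a host's pool can be sketched in a few lines of Python. This is a minimal model, assuming a hypothetical `HostPool` class that tracks only CPUs and memory and treats every slice as a hard reservation. ESXi itself can overcommit CPU and memory, so this is not how the VMkernel actually accounts for resources.

```python
from dataclasses import dataclass, field


@dataclass
class HostPool:
    """Hypothetical model of the capacity a host can hand out as VM slices."""
    cpus: int
    memory_mb: int
    vms: dict = field(default_factory=dict)

    def create_vm(self, name: str, vcpus: int, memory_mb: int) -> None:
        # Simplification: every slice is a hard reservation, so a VM is
        # rejected if the remaining pool cannot cover it (no overcommitment).
        if vcpus > self.cpus or memory_mb > self.memory_mb:
            raise ValueError(f"not enough capacity left for {name}")
        self.cpus -= vcpus
        self.memory_mb -= memory_mb
        self.vms[name] = (vcpus, memory_mb)


pool = HostPool(cpus=16, memory_mb=65536)
pool.create_vm("web01", vcpus=2, memory_mb=4096)
pool.create_vm("db01", vcpus=4, memory_mb=16384)
print(pool.cpus, pool.memory_mb)  # 10 45056
```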
What makes up a virtual machine?
Now that we know the purpose of virtual machines, it is important to understand what components make up a virtual machine. Much like a physical machine, a virtual machine has the components it needs to host a conventional operating system. The difference is that these components and devices sit behind an abstraction layer and therefore do not have direct access to the hardware. Instead, each component, such as the CPU, memory, and hard disks, is a slice of the physical server's available resources. The operating system running on the virtual machine has the impression that it is running on physical hardware. In a sense it is, but only the portion of the resources assigned to the virtual machine is exposed to the operating system.
Virtual Machine Monitor
From the previous sections, we have a brief idea of what components make up a virtual machine. We know that it is an isolation container that runs an operating system and its code without interfering with any of the other operating systems running on the same server hardware.
However, what enables this isolation? Who manages the resources for each of the virtual machines? You might already have an answer in mind: the VMkernel. It is indeed the VMkernel, but the VMkernel has several subfunctions. The kernel component that enables the concept of a virtual machine is called the Virtual Machine Monitor (VMM). Every virtual machine has an associated VMM, which provides a virtual BIOS, virtual memory management, and other virtual devices.
The VMM has the following functions:
- Processor virtualization
- Memory virtualization
- I/O virtualization
Processor virtualization
Every x86 operating system is coded to run directly on hardware (bare metal), which means that its kernel runs in the ring with the highest privilege, Ring 0.
Anything that runs at Ring 0 has direct access to the x86 processor hardware. The challenge, then, is where to place the VMM. Much like an x86 operating system kernel, the VMM also needs to run at a privilege level that has direct access to the processor hardware. VMware achieved full virtualization either by using Binary Translation (BT) and Direct Execution (DE) techniques or by using Hardware-assisted Virtualization.
Binary Translation (BT) and Direct Execution (DE)
Binary Translation (BT) translates the privileged instructions from the guest operating system and then executes them on the processor.
Every operating system has two types of instructions: normal instructions, such as arithmetic instructions, and privileged instructions, such as those that initiate I/O. A system call is the method by which a user program asks the kernel to perform a privileged operation on its behalf, because privileged instructions are hidden from user mode.
When executing a user's program or application code, the processor goes about its job by executing normal instructions in user mode, which on x86 is Ring 3 (mainstream operating systems leave Ring 1 and Ring 2 unused).
During execution, if the processor encounters a privileged operation, such as initiating I/O or making a system call, it generates a trap and switches to kernel mode. Switching to kernel mode simply means handing execution over to the operating system's kernel running at Ring 0. A kernel that runs at Ring 0 can execute every machine instruction and reference every memory location.
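The handoff from user mode to the kernel can be sketched as a toy Python model. The instruction names, the `Trap` exception, and the `execute`/`run` helpers are hypothetical; real x86 hardware raises a general-protection exception (#GP) rather than a Python exception, and the kernel's handler then decides what to do.

```python
# Illustrative subset of x86 instructions that may only run at Ring 0.
PRIVILEGED = {"HLT", "LGDT", "LIDT", "MOV_CR3"}


class Trap(Exception):
    """Raised when code below Ring 0 attempts a privileged instruction."""


def execute(instr: str, cpl: int) -> str:
    # cpl is the current privilege level; 0 is the most privileged ring.
    if instr in PRIVILEGED and cpl != 0:
        raise Trap(instr)
    return f"{instr} executed at ring {cpl}"


def run(instr: str, cpl: int) -> str:
    try:
        return execute(instr, cpl)
    except Trap as trap:
        # Control passes to the kernel's handler at Ring 0, which decides
        # whether to perform the operation or stop the offending program.
        return f"trap: ring 0 kernel handles {trap}"


print(run("ADD", 3))  # ADD executed at ring 3
print(run("HLT", 3))  # trap: ring 0 kernel handles HLT
print(run("HLT", 0))  # HLT executed at ring 0
```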
Note
What is a trap?
A trap is generated by the CPU to indicate that it has encountered a condition that it cannot handle and that requires assistance from the operating system. Traps are also the mechanism used to invoke system calls.
Since x86 wasn't designed with virtualization in mind, not every sensitive instruction generates a trap when it runs outside Ring 0; some simply behave differently, without raising any exception. When a trap does occur, the CPU passes control to the operating system kernel, which handles it at Ring 0.
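A well-known example is the x86 `POPF` instruction. In protected mode it updates the interrupt flag (IF) only when the current privilege level (CPL) is at most the I/O privilege level (IOPL); otherwise it leaves IF unchanged without raising an exception. The Python function below is a simplified model of just that behavior; the function name and the default IOPL of 0 are assumptions for illustration.

```python
def popf(flags: dict, new_if: int, cpl: int, iopl: int = 0) -> dict:
    """Simplified model of how x86 POPF treats the interrupt flag (IF)
    in protected mode."""
    result = dict(flags)
    if cpl <= iopl:
        result["IF"] = new_if
    # When cpl > iopl, IF is silently left unchanged and no exception is
    # raised, so a VMM that relies on traps never sees the attempt.
    return result


# A guest kernel moved down to Ring 1 tries to disable interrupts (IF = 0).
print(popf({"IF": 1}, new_if=0, cpl=1))  # prints {'IF': 1}: request lost
# The same instruction at Ring 0 on bare metal takes effect.
print(popf({"IF": 1}, new_if=0, cpl=0))  # prints {'IF': 0}
```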
Full virtualization using BT and DE requires the VMM to run at Ring 0 and the guest operating system at Ring 1.
Since x86 operating systems are not written to run at Ring 1, every privileged instruction they issue now has to be translated and executed by the VMM, which runs at Ring 0.
The dilemma is that not every sensitive x86 instruction generates a trap. This is where binary translation does its job. It doesn't wait for the processor to encounter an exception and generate a trap. Instead, it captures and inspects the guest kernel's instructions before they run. On encountering a privileged or sensitive instruction, it emulates a trap and takes control of that instruction's execution.
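Here is a deliberately simplified Python sketch of the rewrite-then-emulate idea behind binary translation, assuming a hypothetical `translate` function and `ToyVMM` class. Real translators operate on machine code, cache translated blocks, and handle many more instructions; this model uses only `POPF` and `PUSHF` as the sensitive instructions and keeps a virtual copy of the interrupt flag for the guest.

```python
# Illustrative sensitive instructions that do not trap when run at Ring 1.
SENSITIVE = {"POPF", "PUSHF"}


def translate(block):
    """Rewrite each sensitive instruction into an explicit call into the VMM."""
    return [("CALL_VMM", instr) if instr[0] in SENSITIVE else instr for instr in block]


class ToyVMM:
    def __init__(self):
        self.virtual_if = 1  # the interrupt flag as the guest believes it to be
        self.stack = []      # stand-in for the guest's stack

    def emulate(self, instr):
        op, arg = instr
        if op == "POPF":
            self.virtual_if = arg  # arg: the IF value the guest wants to load
        elif op == "PUSHF":
            self.stack.append(self.virtual_if)  # guest sees its virtual IF

    def run(self, translated):
        for instr in translated:
            if instr[0] == "CALL_VMM":
                self.emulate(instr[1])
            # Other instructions would execute natively; this model ignores them.


guest_kernel_code = [("MOV", 1), ("POPF", 0), ("PUSHF", None), ("ADD", 2)]
vmm = ToyVMM()
vmm.run(translate(guest_kernel_code))
print(vmm.virtual_if, vmm.stack)  # 0 [0]
```

Unlike the untranslated `POPF` at Ring 1, the guest's attempt to clear the interrupt flag now takes effect, because the VMM emulates it against the guest's virtual state.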
Direct Execution (DE) is used to send user-mode instructions directly to the processor. Although the guest OS now runs at Ring 1, it still has a much higher privilege level than its user-mode code at Ring 3. Hence there is no need to translate user-mode instructions; they can be sent directly to the processor.
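The choice between the two techniques comes down to the privilege level of the guest code being run. A minimal Python sketch, assuming a hypothetical `choose_execution` helper and ignoring details such as how the VMM tracks which ring the guest is currently in:

```python
def choose_execution(guest_cpl: int) -> str:
    """Decide how a block of guest code runs under BT/DE (simplified model)."""
    # Guest user code (Ring 3) keeps the privilege it would have on bare metal,
    # so it can run directly; guest kernel code, moved to Ring 1, is translated.
    return "direct execution" if guest_cpl == 3 else "binary translation"


for name, cpl in [("application loop", 3), ("guest kernel scheduler", 1)]:
    print(f"{name}: {choose_execution(cpl)}")
# application loop: direct execution
# guest kernel scheduler: binary translation
```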
Hardware-assisted Virtualization
Both Intel and AMD have added enhancements to their processor families to assist virtualization:
- Intel VT-x
- AMD-V
These enhancements allow the VMM to run in a new mode with higher privilege than Ring 0, so the guest operating system can run at Ring 0 as it was designed to.
With Hardware-assisted Virtualization, privileged and sensitive instructions now trap directly to the VMM, without the need for binary translation. Intel VT-x or AMD-V must be enabled in the BIOS of an ESXi host for it to run 64-bit virtual machines.
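In Intel's terminology, the VMM runs in VMX root operation and the guest in VMX non-root operation; a sensitive instruction in the guest causes a VM exit that transfers control to the VMM. The Python sketch below models only that control flow. The `EXITING` set and `run_guest` function are hypothetical, and on real hardware which instructions cause exits depends partly on controls the VMM configures.

```python
# Hypothetical set of instructions the VMM has configured to cause VM exits.
EXITING = {"CPUID", "HLT", "MOV_CR3"}


def run_guest(code: list[str]) -> list[str]:
    """Run guest kernel code in non-root mode; return the VM exits it caused."""
    exits = []
    for instr in code:
        if instr in EXITING:
            # The CPU leaves non-root mode and hands control to the VMM in
            # root mode, which emulates the instruction and resumes the guest.
            exits.append(instr)
        # Everything else runs natively with the guest kernel still at Ring 0,
        # so no binary translation is required.
    return exits


print(run_guest(["MOV", "CPUID", "ADD", "HLT"]))  # ['CPUID', 'HLT']
```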
Memory virtualization
As with processor resources, the server's memory must also be shared among the virtual machines.
The processor can access every location on a memory module by addressing it with a physical address. The operating system maintains a separate, contiguous virtual address space for the processes that run on it. Every time a process accesses memory, it uses the virtual address of that memory location, which then has to be translated to a physical address. The operating system maintains the page tables that describe this mapping, and the processor's memory management unit (MMU) uses them to perform the translation.
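The translation can be modeled in Python with a single-level page table. This is a sketch under stated assumptions: 4 KiB pages and a plain dictionary from virtual page number to physical frame number, whereas x86-64 actually uses multi-level tables walked by the MMU and cached in the TLB. The `translate` function is hypothetical.

```python
PAGE_SIZE = 4096  # assume 4 KiB pages


def translate(va: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address using a single-level toy page table."""
    vpn, offset = divmod(va, PAGE_SIZE)  # virtual page number and offset
    if vpn not in page_table:
        raise LookupError(f"page fault at {va:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}  # virtual page number -> physical frame number
print(hex(translate(0x1010, page_table)))  # 0x3010
```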
Now, when we throw a virtual machine into the mix, things take a different turn. All conventional operating systems installed on a virtual machine use a memory management technique similar to the one described in the previous paragraph. But since the whole idea behind virtualization is to let multiple such virtual machines run on the same host, there has to be a mechanism to manage access to, and allocation of, physical memory for these virtual machines. On an ESXi host, the VMkernel does all the resource management, so it has to find a way to manage physical memory. It does so by adding another memory management layer called the machine address space.
Now, when a process running inside a guest operating system accesses a memory location, it uses a virtual address. That virtual address is first translated to a physical address as seen by the guest operating system, and the VMkernel then translates that guest physical address to a machine address. The machine address is what finally reaches physical memory. If this two-step procedure were followed for every memory access, it would add considerable overhead. Memory virtualization addresses this problem with shadow page tables, maintained by the VMM, which map the guest operating system's virtual address space directly to the machine address space.
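The effect of a shadow page table can be shown with a toy Python model: a guest page table (guest virtual page to guest physical frame) and a hypervisor mapping (guest physical frame to machine frame) are composed into a single table, so one lookup replaces two. The dictionaries, the 4 KiB page size, and the helper names are assumptions for illustration; real page tables are multi-level structures walked by the MMU.

```python
PAGE_SIZE = 4096  # assume 4 KiB pages


def lookup(addr: int, table: dict[int, int]) -> int:
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def build_shadow(guest_pt: dict[int, int], pmap: dict[int, int]) -> dict[int, int]:
    """Compose guest VPN -> guest PFN with guest PFN -> machine frame."""
    return {vpn: pmap[gpfn] for vpn, gpfn in guest_pt.items() if gpfn in pmap}


guest_pt = {0: 5, 1: 2}  # maintained by the guest OS
pmap = {2: 40, 5: 41}    # maintained by the hypervisor
shadow = build_shadow(guest_pt, pmap)

va = 0x1010
two_step = lookup(lookup(va, guest_pt), pmap)  # guest virtual -> physical -> machine
one_step = lookup(va, shadow)                  # guest virtual -> machine directly
print(hex(two_step), hex(one_step))  # 0x28010 0x28010
```

The cost of this approach is keeping the composed table in sync: whenever the guest edits its own page table, the VMM has to update the shadow copy, which is why shadow paging has to trace guest page-table writes.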
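Hardware-assisted memory virtualization eliminates the need for shadow page tables. The guest operating system keeps its own page tables, and the processor uses a second set of page tables that maps the guest operating system's physical address space to the VMkernel's machine address space, walking both in hardware.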
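With hardware assistance, the two lookups happen at access time in the processor instead of being precomputed into a shadow table. The following Python sketch models that under toy assumptions (single-level dictionaries, 4 KiB pages, hypothetical helper names); on real hardware the MMU walks both multi-level tables on a TLB miss.

```python
PAGE_SIZE = 4096  # assume 4 KiB pages


def lookup(addr: int, table: dict[int, int]) -> int:
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def nested_translate(va: int, guest_pt: dict[int, int], nested_pt: dict[int, int]) -> int:
    # Step 1: the guest's own page table maps guest virtual -> guest physical.
    gpa = lookup(va, guest_pt)
    # Step 2: the second-level table (EPT or RVI) maps guest physical -> machine.
    return lookup(gpa, nested_pt)


guest_pt = {0: 5, 1: 2}     # owned and edited freely by the guest OS
nested_pt = {2: 40, 5: 41}  # owned by the hypervisor
print(hex(nested_translate(0x1010, guest_pt, nested_pt)))  # 0x28010
guest_pt[1] = 5  # the guest remaps a page; there is no shadow table to resync
print(hex(nested_translate(0x1010, guest_pt, nested_pt)))  # 0x29010
```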
Hardware-assisted memory virtualization technologies
The following are examples of Hardware-assisted memory virtualization technologies:
- Intel's Extended Page Tables (EPT).
- AMD's Rapid Virtualization Indexing (RVI), also known as Nested Page Tables (NPT). RVI and NPT are two names for the same AMD MMU virtualization technology.
Note
For more information on how hardware-assisted memory virtualization works, refer to the Performance Best Practices for vSphere 5.5: http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf
I/O virtualization
I/O devices such as physical network interface cards and SCSI controllers also have to be made available to the virtual machines. However, it wouldn't make sense to let a single virtual machine own or control a device, because no other virtual machine could then use it. So, there is a compelling reason to virtualize I/O resources as well.
I/O virtualization is achieved by presenting emulated virtual devices or paravirtualized devices to the virtual machines. For emulated devices, such as the e1000 virtual network interface card, the guest operating system needs a driver for the emulated hardware. For paravirtualized devices, such as the VMXNET series of network interface cards, you need the drivers supplied with VMware Tools. The driver corresponding to a device interacts with the VMkernel's I/O virtualization stack.
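The difference between emulated and paravirtualized devices is largely about how often the guest has to hand control to the hypervisor. A rough Python model, assuming a hypothetical fixed number of trapping register writes per packet for the emulated card and one notification per batch for the paravirtualized card; the constant and function names are illustrative, not measurements of e1000 or VMXNET:

```python
# Illustrative assumption: trapping register writes an emulated NIC driver
# performs per packet. Real drivers and devices differ.
REGISTER_WRITES_PER_PACKET = 3


def emulated_exits(packets: int) -> int:
    # Every access to an emulated device register traps into the hypervisor,
    # which emulates the hardware's response.
    return packets * REGISTER_WRITES_PER_PACKET


def paravirtual_exits(packets: int, batch_size: int) -> int:
    # Descriptors are written to a ring in shared memory (no trap); the driver
    # then notifies the hypervisor once per batch.
    return -(-packets // batch_size)  # ceiling division


print(emulated_exits(64), paravirtual_exits(64, batch_size=32))  # 192 2
```

Under these assumptions, sending 64 packets costs 192 traps through the emulated device versus 2 notifications through the paravirtualized one, which is the intuition behind paravirtualized devices generally performing better.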