In this chapter, you will learn how to leverage shared storage in modern big data platforms. We will evaluate current in-memory big data applications on the vSphere virtualization platform. Because these platforms process data in memory, they depend less on I/O and storage protocols. We will look at how administrators can stay productive and keep control when creating a Hadoop cluster, and we will show how a Hadoop management tool installs software onto virtual machines. We will also examine the ability to scale in and out. Resources are pooled and shared by multiple virtual Hadoop clusters, so any workload on the platform can expand to use all available cluster resources, which results in higher average resource utilization.
We will cover the following topics in detail:
- Big data infrastructure
- Open source software