Installing Hadoop plus Spark clusters
Before installing Hadoop and Spark, let's look at which versions are available. Spark is offered as a service in all three popular Hadoop distributions: Cloudera, Hortonworks, and MapR. At the time of writing, the current versions are Hadoop 2.7.2 and Spark 2.0. However, a Hadoop distribution may ship an older version of Spark, because the Hadoop and Spark release cycles do not coincide.
For the practical exercises in the upcoming chapters, let's use one of the free virtual machines (VMs) from Cloudera, Hortonworks, or MapR, or an open source version of Apache Spark. These VMs make it easy to get started with Spark and Hadoop. The same exercises can also be run on bigger clusters.
The prerequisites to use virtual machines on your laptop are as follows:
- At least 8 GB of RAM
- At least two virtual CPUs
- The latest VMware Player or Oracle VirtualBox, installed on Windows or Linux
- The latest Oracle VirtualBox or VMware...
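
Before downloading a VM image, it can help to confirm that the laptop meets the memory and CPU figures listed above. The following is a minimal sketch, not part of any distribution's tooling: the names `total_ram_gb` and `meets_prerequisites` are hypothetical, and the memory lookup assumes a platform where Python exposes `os.sysconf` (typically Linux). On other platforms the function returns `None` and the values should be checked by hand.

```python
import os

# Minimums taken from the prerequisite list above.
MIN_RAM_GB = 8
MIN_CPUS = 2


def total_ram_gb():
    """Return physical RAM in GiB, or None if the platform does not expose it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (for example on Windows) or the name is unknown.
        return None
    return page_size * page_count / 1024 ** 3


def meets_prerequisites(ram_gb, cpus):
    """Check a RAM figure (in GB) and a CPU count against the minimums."""
    return ram_gb >= MIN_RAM_GB and cpus >= MIN_CPUS


if __name__ == "__main__":
    ram = total_ram_gb()
    cpus = os.cpu_count() or 0
    if ram is None:
        print(f"CPUs: {cpus}; RAM could not be detected, check it manually")
    else:
        print(f"CPUs: {cpus}, RAM: {ram:.1f} GiB")
        print("Meets prerequisites:", meets_prerequisites(ram, cpus))
```

Note that this reports the host's resources, not what is allocated to the VM. A machine sold with 8 GB often reports slightly less than 8 GiB to the operating system, so a near miss on the memory check is worth treating as borderline rather than a hard failure.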