HDInsight - a multi-node Hadoop cluster on Azure
In the Online Chapter, Pushing R Further (https://www.packtpub.com/sites/default/files/downloads/5396_6457OS_PushingRFurther.pdf), we briefly introduced you to HDInsight, a fully managed Apache Hadoop service that is part of the Microsoft Azure platform and is specifically designed for heavy data crunching. In this section, we will deploy a multi-node HDInsight cluster with R and RStudio Server installed, and we will run a number of MapReduce jobs on smart electricity meter readings (~414,000,000 cases, four variables, ~12 GB in size) from the Energy Demand Research Project, available for download from the UK Data Service's online Discover catalog at https://discover.ukdataservice.ac.uk/catalogue/?sn=7591. Before we can start the actual data crunching, however, we need to set up and prepare an HDInsight cluster to process our data. Configuring HDInsight is not the simplest of tasks, especially if we need...