Apache Spark is an open-source parallel processing framework that supports in-memory processing to boost the performance of big data analytics applications. Apache Spark clusters on HDInsight are compatible with Azure Storage (WASB) as well as Azure Data Lake Storage.
When a developer creates a Spark cluster on HDInsight, Azure provisions the compute resources with Spark already installed and configured. Creating a Spark cluster in HDInsight takes only about 10 minutes. The data to be processed is stored in Azure Storage or Azure Data Lake Storage.
Apache Spark provides primitives for in-memory cluster computing, which makes it a good fit for HDInsight. An Apache Spark job can load and cache data into memory and query it repeatedly, so it typically produces results much more quickly than disk-based systems that reread the data for every query. In addition to this, Apache...
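
The load-once, query-many pattern described above can be illustrated with a minimal sketch. This is plain Python, not the Spark API: in Spark, the framework handles caching across the cluster's memory (for example, when `cache()` is called on a DataFrame or RDD). The `CachedDataset` class, the `RAW_CSV` sample data, and the `mean_temp` helper are all hypothetical names invented for illustration; the inline string stands in for a file that would normally live in Azure Storage.

```python
import csv
import io

# Hypothetical sample data standing in for a file in Azure Storage.
RAW_CSV = """city,temp_c
Seattle,12
Austin,28
Seattle,15
Austin,31
Boston,9
"""


class CachedDataset:
    """Parses the source once, then serves every query from memory."""

    def __init__(self, source):
        self._source = source
        self._rows = None     # in-memory cache, filled on first access
        self.load_count = 0   # how many times the source was parsed

    def rows(self):
        # Parse the source only if nothing is cached yet.
        if self._rows is None:
            self.load_count += 1
            reader = csv.DictReader(io.StringIO(self._source))
            self._rows = [
                {"city": r["city"], "temp_c": float(r["temp_c"])}
                for r in reader
            ]
        return self._rows

    def mean_temp(self, city):
        # Each query scans the cached rows instead of re-reading the source.
        temps = [r["temp_c"] for r in self.rows() if r["city"] == city]
        return sum(temps) / len(temps) if temps else None


dataset = CachedDataset(RAW_CSV)
seattle = dataset.mean_temp("Seattle")
austin = dataset.mean_temp("Austin")
boston = dataset.mean_temp("Boston")
print(seattle, austin, boston, dataset.load_count)
```

Three queries run against the data, but the source is parsed only once. A disk-based system would pay the read and parse cost on each query, which is the difference the paragraph above refers to.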