Spark development status
By the end of 2015, Apache Spark had become the most active project in the Hadoop ecosystem in terms of number of contributors. Having started as a research project at UC Berkeley's AMPLab in 2009, Spark is still relatively young compared to projects such as Apache Hadoop and remains in active development. There were three releases in 2015, versions 1.3, 1.4, and 1.5, which introduced the DataFrames API, SparkR, and Project Tungsten respectively. Version 1.6 was released in early 2016 and included the new Dataset API and expanded data science functionality. Spark 2.0 was released in July 2016; as a major release, it has many new features and enhancements that deserve a section of their own.
Spark 2.0's features and enhancements
Apache Spark 2.0 included three major new features and several other performance improvements and under-the-hood changes. This section attempts to give a high-level overview yet step into...