Apache Spark - evolution
It is interesting to trace the evolution of Apache Spark from an abstract perspective. Spark started out as a fast engine for big data processing: fast to run code and fast to write it as well. The original value proposition was faster, in-memory computation graphs, compatibility with the Hadoop ecosystem, and interesting, very usable APIs in Scala, Java, and Python. RDDs ruled the world. The focus was on iterative and interactive applications that operated on the same data multiple times, a workload that Hadoop MapReduce handled poorly.
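To ground this, here is a minimal Scala sketch of the iterative RDD workload that early Spark targeted. The input path and the toy gradient-style update are hypothetical, purely for illustration; the point is the cache() call, which keeps the parsed dataset in executor memory so that every pass after the first reads from memory rather than from HDFS:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("IterativeRddSketch").setMaster("local[*]"))

    // Hypothetical input: one number per line at an illustrative HDFS path.
    val points = sc.textFile("hdfs:///data/points.txt")
      .map(_.trim.toDouble)
      .cache() // keep the parsed dataset in memory across passes

    val n = points.count() // first action materializes and caches the RDD

    // A toy gradient-style loop: each pass re-scans the cached RDD in
    // memory instead of re-reading the file, which is exactly the kind of
    // repeated access that MapReduce handled poorly.
    var estimate = 0.0
    for (_ <- 1 to 10) {
      val gradient = points.map(p => p - estimate).sum() / n
      estimate += 0.5 * gradient
    }
    println(s"Estimate after 10 passes: $estimate")

    sc.stop()
  }
}
```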
The evolution didn't stop there. As Matei Zaharia pointed out in his talk at MIT, users wanted more, and the Spark programming model evolved to include the following functionalities:
More complex, multi-pass analytics (for example, ML pipelines and graph processing)
More interactive, ad-hoc queries (see the sketch after this list)
More real-time stream processing
More parallel machine learning algorithms beyond what basic RDDs offered
More types of data sources as input and output
More integration...
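As one concrete illustration of where that evolution led, the following is a minimal Scala sketch of an interactive, ad-hoc query expressed through the later SparkSession and Spark SQL APIs rather than raw RDDs. The input path, table name, and userId column are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object AdHocQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AdHocQuerySketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical JSON source; Spark also reads Parquet, CSV, JDBC, etc.
    val events = spark.read.json("hdfs:///data/events.json")
    events.createOrReplaceTempView("events")

    // Interactive SQL over the same engine that runs batch jobs.
    spark.sql(
      """SELECT userId, COUNT(*) AS hits
        |FROM events
        |GROUP BY userId
        |ORDER BY hits DESC
        |LIMIT 10""".stripMargin
    ).show()

    spark.stop()
  }
}
```

The design choice worth noting is that one engine serves the batch, interactive, and streaming cases alike, which is what the list above describes.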