Spark
Spark (https://spark.apache.org) is a unified analytics engine for large-scale data processing. Spark started in 2009 as a research project at the University of California, Berkeley, and moved to the Apache Software Foundation in 2013.
Spark was designed to address shortcomings of the Hadoop architecture for analytical workloads such as data streaming, SQL queries over files stored on HDFS, and machine learning. It distributes data across all computing nodes in a cluster and keeps intermediate results in memory where possible, which decreases the latency of each computing step. Another distinguishing feature is flexibility: Spark offers interfaces for Java, Scala, SQL, R, and Python, as well as libraries for different problem domains, such as MLlib for machine learning, GraphX for graph computation, and Spark Streaming for streaming workloads.
Spark uses a driver/worker abstraction: a driver process receives user code and initiates parallel execution, while worker processes residing on the cluster nodes execute the resulting tasks. It has a built-in cluster management tool...
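The driver/worker pattern described above can be sketched without Spark at all. The toy code below is not Spark's API; it is a hypothetical stand-in in which a "driver" function splits data into partitions and a thread pool plays the role of workers executing one task per partition.

```python
from concurrent.futures import ThreadPoolExecutor

def task(partition):
    # A "worker" task: process one partition of the data
    # (here, sum the squares of its elements).
    return sum(x * x for x in partition)

def driver(data, n_partitions=4):
    # The "driver": split the data into partitions, schedule one task
    # per partition on the pool of workers, and combine the results.
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        return sum(pool.map(task, partitions))

print(driver(list(range(10))))  # sum of squares 0..9 = 285
```

In real Spark the workers are separate JVM processes spread over cluster nodes and the driver ships serialized tasks to them, but the division of roles is the same: the driver plans and coordinates, the workers compute.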