Overview
Apache Spark is a fast, general-purpose cluster computing system, initially developed at the AMPLab at UC Berkeley (http://en.wikipedia.org/wiki/UC_Berkeley) as part of the Berkeley Data Analytics Stack (BDAS). It provides high-level APIs in several programming languages that make large parallel jobs easy to write and deploy [17:01].
Note
Link to latest information
The URLs, like any other reference to Apache Spark, may change in future versions.
The core element of Spark is the resilient distributed dataset (RDD), a collection of elements partitioned across the nodes of a cluster and/or the CPU cores of its servers. An RDD can be created from a local data structure such as a list, array, or hash table, or from a file in the local filesystem or the Hadoop Distributed File System (HDFS) [17:02].
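To make the idea of a partitioned collection concrete, the following is a minimal sketch in plain Python. It is not Spark code and does not reflect how Spark actually assigns elements to partitions. The helper name `partition` and the contiguous, near-equal split are assumptions chosen for illustration. In PySpark itself, an RDD is typically created from a local collection with `SparkContext.parallelize` and from a file with `SparkContext.textFile`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(data: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Split a local collection into contiguous, near-equal partitions.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, extra = divmod(len(data), num_partitions)
    partitions = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(list(data[start:end]))
        start = end
    return partitions


# A local list stands in for the source data; each inner list stands in
# for the slice that a single node or CPU core would hold.
numbers = list(range(10))
parts = partition(numbers, 3)
print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each inner list can be processed independently of the others, which is the property that lets a cluster work on all partitions in parallel.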
The operations...