Spark is a powerful, open source, general-purpose, unified cluster-computing analytics framework for large-scale data processing. It is known for high-performance, in-memory processing backed by an efficient execution engine and query optimizer. Spark provides APIs in four widely used languages, Scala, Java, Python, and R, and ships interactive shells for Scala, Python, and R. Spark is built on the foundation of the Resilient Distributed Dataset (RDD), a collection of data partitioned across the nodes of a cluster. This removes the resource ceiling of any single machine, making the system horizontally scalable, in principle without limit. With all this, it is no surprise that Spark is one of the largest open source projects in the data-processing community. Refer to the Apache Spark documentation for further information: http://spark.apache.org/docs/2.3.1/.




















































