Working with Apache Spark
The Apache Spark website (spark.apache.org) describes Spark as "a unified analytics engine for large-scale data processing" that "provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs." Spark is used for data engineering, machine learning (ML), and data science; our focus will be on how it can be used for data engineering in Scala.
Spark is designed to process vast amounts of data, which it accomplishes by distributing computation across many machines so that compute can be scaled out easily. A Spark application is written against one of the Spark APIs, which we will cover later in the chapter. For now, let's take a look at how Spark applications work.
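To make this concrete, here is a minimal sketch of what a self-contained Spark application in Scala can look like. It assumes the Spark libraries are on the classpath and uses a local master for demonstration; the object name MinimalSparkApp and the sample data are illustrative choices, not taken from the Spark documentation:

import org.apache.spark.sql.SparkSession

object MinimalSparkApp {
  def main(args: Array[String]): Unit = {
    // SparkSession is the entry point to Spark's high-level APIs
    val spark = SparkSession.builder()
      .appName("MinimalSparkApp")
      .master("local[*]") // run locally on all cores; illustrative only
      .getOrCreate()

    import spark.implicits._

    // Create a small Dataset from an in-memory collection
    val numbers = Seq(1, 2, 3, 4, 5).toDS()

    // A simple distributed computation: sum the values
    val total = numbers.reduce(_ + _)
    println(s"Total: $total")

    spark.stop()
  }
}

In a real deployment, the master would typically be supplied externally (for example, via spark-submit) rather than hardcoded, so the same application can run on a cluster without modification.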