In this chapter, we discussed the internals of Apache Spark: what RDDs are, the DAGs and lineages of RDDs, and transformations and actions. We also looked at the various deployment modes of Apache Spark: standalone, YARN, and Mesos. We then performed a local installation and explored the Spark shell and how it can be used to interact with Spark.
In addition, we looked at loading data into RDDs and saving RDDs to external systems, as well as the secret sauce of Spark's phenomenal performance: the caching functionality and how we can use memory and/or disk to optimize performance.
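As a quick reminder of how caching fits together, the following is a minimal sketch of persisting an RDD with a memory-and-disk storage level; the application name and input path are placeholders, and any data set of your own would work just as well.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CachingSketch {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for illustration; any supported master URL works.
    val spark = SparkSession.builder()
      .appName("CachingSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path -- substitute your own data set.
    val lines = sc.textFile("data/sample.txt")

    // Keep the derived RDD in memory and spill any partitions that do
    // not fit to disk (plain cache() would use MEMORY_ONLY).
    val words = lines.flatMap(_.split(" "))
    words.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action materializes and caches the RDD; the second
    // reuses the cached partitions instead of recomputing the lineage.
    println(words.count())
    println(words.distinct().count())

    spark.stop()
  }
}
```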
In the next chapter, Chapter 7, Special RDD Operations, we will dig deeper into the RDD API and how it all works.