Summary
In this chapter, we learned how distributed data processing works on Spark. You now understand how Spark takes Scala code encapsulated in a Spark application and partitions datasets into pieces that are processed in parallel by executors on a Spark cluster. You have created a simple Spark application that uses a SparkSession to interact with the Spark APIs and manipulate data. You now have the basics needed to move on to more challenging topics such as ingesting data, transforming it, and loading it into target systems.
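As a reminder of the pattern covered in this chapter, the skeleton below shows the typical entry point of such an application: building a SparkSession and using it to manipulate a small dataset. The application name and the `local[*]` master setting are illustrative choices, not requirements.

```scala
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a SparkSession - the entry point to the Spark APIs.
    // "local[*]" runs Spark locally using all available cores; on a real
    // cluster the master is usually supplied via spark-submit instead.
    val spark = SparkSession.builder()
      .appName("SimpleApp")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // Create a small Dataset and apply a transformation; Spark splits the
    // work into tasks that run on the executors.
    val numbers = Seq(1, 2, 3, 4, 5).toDS()
    val doubled = numbers.map(_ * 2)
    doubled.show()

    // Release cluster resources when the application is done.
    spark.stop()
  }
}
```

In a standalone project this would be compiled and submitted with `spark-submit`; the structure mirrors the simple application built earlier in the chapter.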
In the next chapter, we are going to look at various database operations, starting with the Spark JDBC API and working our way up to building a small database API of our own.