Summary
In this chapter, we demonstrated how to use Spark with a variety of data sources and data formats. We used Spark to work with a relational database (MySQL), a NoSQL database (MongoDB), semi-structured data (JSON), and data storage formats commonly used in the Hadoop ecosystem (Avro and Parquet). This sets you up nicely for the more advanced, application-oriented Spark chapters to follow.
In the next chapter, we will shift our focus from the mechanics of working with Spark to how Spark SQL can be used to explore data, perform data quality checks, and visualize data.