Chapter 3. ETL with Spark
So far, we have gone through the architecture of Spark and discussed RDDs in some detail. By the end of Chapter 2, Transformations and Actions with Spark RDDs, we had focused on PairRDDs and some of the transformations.
This chapter focuses on doing ETL (extract, transform, load) with Apache Spark. We'll cover the following topics, which should help you take the next step with Apache Spark:
- Understanding the ETL process
- Commonly supported file formats
- Commonly supported filesystems
- Working with NoSQL databases
Let's get started!