Understanding Data Transformation
One of the main jobs of any data engineer is to transform raw data into a form that Business Intelligence (BI) applications, data scientists, and analysts can use. In Chapter 3, you learned the basics of building a Spark application and ingesting data.
In this chapter, we will go deeper into several advanced topics that every data engineer needs to understand when building data pipelines with Spark. These topics are as follows:
- Understanding the difference between transformations and actions
- Learning how to aggregate, group, and join data
- Leveraging advanced window functions
- Working with complex data types