Data Transformation and Data Manipulation with Apache Spark
Apache Spark is a powerful distributed computing framework that can handle large-scale data processing tasks. Once data has been loaded, one of the most common tasks is transforming and manipulating it so that it is ready for analysis. In this hands-on chapter, you will gain a comprehensive understanding of how to transform and manipulate data using Apache Spark.
In this chapter, we’re going to cover the following main recipes:
- Applying basic transformations to data with Apache Spark
- Filtering data with Apache Spark
- Performing joins with Apache Spark
- Performing aggregations with Apache Spark
- Using window functions with Apache Spark
- Writing custom UDFs in Apache Spark
- Handling null values with Apache Spark
By the end of this chapter, you will have learned how to use Apache Spark to perform various data manipulation tasks, such as applying basic transformations, filtering your data, performing joins and aggregations, using window functions, writing custom UDFs, and handling null values.
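As a preview of the recipes ahead, the following is a minimal sketch of these operations, assuming the PySpark DataFrame API, a local SparkSession, and hypothetical sample data used only for illustration; each recipe in this chapter covers its operation in much more detail:

```python
from pyspark.sql import SparkSession, functions as F, Window
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("transformations-preview").getOrCreate()

# Hypothetical sample data for illustration only.
orders = spark.createDataFrame(
    [(1, "alice", "books", 20.0), (2, "bob", "games", None), (3, "alice", "books", 35.0)],
    ["order_id", "customer", "category", "amount"],
)
customers = spark.createDataFrame(
    [("alice", "US"), ("bob", "DE")], ["customer", "country"]
)

# Basic transformation: derive a new column.
with_tax = orders.withColumn("amount_with_tax", F.col("amount") * 1.1)

# Filtering rows.
books_only = with_tax.filter(F.col("category") == "books")

# Joining two DataFrames.
joined = books_only.join(customers, on="customer", how="inner")

# Aggregation.
totals = joined.groupBy("country").agg(F.sum("amount").alias("total_amount"))

# Window function: rank each customer's orders by amount.
w = Window.partitionBy("customer").orderBy(F.col("amount").desc())
ranked = orders.withColumn("rank", F.row_number().over(w))

# Custom UDF (built-in functions are usually preferable for performance).
to_upper = F.udf(lambda s: s.upper() if s is not None else None, StringType())
shouted = orders.withColumn("customer_upper", to_upper("customer"))

# Handling null values.
cleaned = orders.fillna({"amount": 0.0})

totals.show()
```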