Chapter 11. Spark's Three Data Musketeers for Machine Learning - Perfect Together
In this chapter, we will cover the following recipes:
- Creating RDDs with Spark 2.0 using internal data sources
- Creating RDDs with Spark 2.0 using external data sources
- Transforming RDDs with Spark 2.0 using the filter() API
- Transforming RDDs with the super useful flatMap() API
- Transforming RDDs with set operation APIs
- RDD transformation/aggregation with groupBy() and reduceByKey()
- Transforming RDDs with the zip() API
- Join transformation with paired key-value RDDs
- Reduce and grouping transformation with paired key-value RDDs
- Creating DataFrames from Scala data structures
- Operating on DataFrames programmatically without SQL
- Loading DataFrames and setup from an external source
- Using DataFrames with standard SQL language - SparkSQL
- Working with the Dataset API using a Scala sequence
- Creating and using Datasets from RDDs and back again
- Working with JSON using the Dataset API and SQL together
- Functional programming with the Dataset...