Transforming RDDs with set operation APIs
In this recipe, we explore set operations on RDDs, such as intersection(), union(), subtract(), distinct(), and cartesian(). Let's implement the usual set operations in a distributed manner.
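Before walking through the steps, here is a minimal sketch of how these five operations behave on two small RDDs. It is not the recipe's own code: the object name SetOperationsSketch, the helper run, the Results case class, and the sample values are all hypothetical, and the session is assumed to run locally with master local[*]. Results are collected and sorted because RDD operations do not guarantee element order.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical illustration; names and sample data are assumptions, not from the recipe.
object SetOperationsSketch {

  // Collected results of each set operation, sorted for a stable order.
  case class Results(
      intersection: List[Int],
      union: List[Int],
      subtract: List[Int],
      distinct: List[Int],
      cartesianCount: Long)

  def run(sc: SparkContext): Results = {
    val a = sc.parallelize(Seq(1, 2, 3, 4, 4))
    val b = sc.parallelize(Seq(3, 4, 5))

    Results(
      // Elements present in both RDDs; the output contains no duplicates.
      intersection = a.intersection(b).collect().sorted.toList,
      // All elements of both RDDs; duplicates are kept.
      union = a.union(b).collect().sorted.toList,
      // Elements of a that do not appear in b.
      subtract = a.subtract(b).collect().sorted.toList,
      // Elements of a with duplicates removed.
      distinct = a.distinct().collect().sorted.toList,
      // Every (x, y) pair with x from a and y from b: 5 * 3 pairs.
      cartesianCount = a.cartesian(b).count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]") // assumption: local run for experimentation
      .appName("SetOperationsSketch")
      .getOrCreate()
    try println(run(spark.sparkContext))
    finally spark.stop()
  }
}
```

With this sample data, intersection yields 3 and 4, union yields all eight elements (duplicates included), subtract yields 1 and 2, distinct yields 1 through 4, and cartesian produces 15 pairs. Note that union() differs from a mathematical set union in that it keeps duplicates; chain distinct() after it if set semantics are needed.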
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter3
- Import the necessary packages:
import breeze.numerics.pow
import org.apache.spark.sql.SparkSession
import Array._
- Import the packages for setting up the logging level for log4j. This step is optional, but we highly recommend it (change the level appropriately as you move through the development cycle).
import org.apache.log4j.Logger
import org.apache.log4j.Level
- Set up the logging level to warning and error to cut down on output. See the previous step for package requirements.
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel...