Join transformation with paired key-value RDDs
In this recipe, we introduce the KeyValueRDD pair RDD and the supporting join operations such as join(), leftOuterJoin(), rightOuterJoin(), and fullOuterJoin() as an alternative to the more traditional and more expensive set operations available via the set operation API, such as intersection(), union(), subtract(), distinct(), cartesian(), and so on.
We'll demonstrate join(), leftOuterJoin(), rightOuterJoin(), and fullOuterJoin() to explain the power and flexibility of key-value pair RDDs.
println("Full Joined RDD = ")
val fullJoinedRDD = keyValueRDD.fullOuterJoin(keyValueCity2RDD)
fullJoinedRDD.collect().foreach(println(_))
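To see how the four joins differ, here is a minimal, self-contained sketch that runs each of them on the recipe's sample data. It assumes Spark is on the classpath and runs in local mode. The object name JoinSketch, the application name, and the use of sortByKey() to give a stable print order are illustrative choices, not part of the recipe itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object; the name is an assumption made for this sketch.
object JoinSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("JoinSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    try {
      // Same sample data as the recipe: four numbered regions, two of them with a city.
      val keyValueRDD = sc.parallelize(
        List(("north", 1), ("south", 2), ("east", 3), ("west", 4)))
      val keyValueCity2RDD = sc.parallelize(
        List(("north", "Madison"), ("west", "SanJose")))

      // join(): keeps only keys present on both sides ("north" and "west").
      // Element type: (String, (Int, String))
      println("Joined RDD = ")
      keyValueRDD.join(keyValueCity2RDD).sortByKey().collect().foreach(println)

      // leftOuterJoin(): keeps every key from the left RDD; the right value is an
      // Option, so "south" and "east" carry None.
      // Element type: (String, (Int, Option[String]))
      println("Left Joined RDD = ")
      keyValueRDD.leftOuterJoin(keyValueCity2RDD).sortByKey().collect().foreach(println)

      // rightOuterJoin(): keeps every key from the right RDD; the left value is an Option.
      // Element type: (String, (Option[Int], String))
      println("Right Joined RDD = ")
      keyValueRDD.rightOuterJoin(keyValueCity2RDD).sortByKey().collect().foreach(println)

      // fullOuterJoin(): keeps keys from either side; both values are Options.
      // Element type: (String, (Option[Int], Option[String]))
      println("Full Joined RDD = ")
      keyValueRDD.fullOuterJoin(keyValueCity2RDD).sortByKey().collect().foreach(println)
    } finally {
      sc.stop() // release the local Spark context even if a job fails
    }
  }
}
```

The outer joins wrap the possibly missing side in Option, so downstream code can pattern match on Some and None instead of checking for nulls.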
How to do it...
- Set up the data structures and RDD for the example:
val keyValuePairs = List(("north",1),("south",2),("east",3),("west",4))
val keyValueCity1 = List(("north","Madison"),("south","Miami"),("east","NYC"),("west","SanJose"))
val keyValueCity2 = List(("north","Madison"),("west","SanJose"))
- Turn the List...