Transformations
We've used a few transformation functions in the examples in this chapter, but I would like to share a list of the most commonly used transformation functions in Apache Spark. You can find the complete list of functions in the official documentation: http://bit.ly/RDDTransformations.
| Most Common Transformations |
|---|
| coalesce(numPartitions) |
| repartition(numPartitions) |
| repartitionAndSortWithinPartitions(partitioner) |
| join(otherDataset, [numTasks]) |
| cogroup(otherDataset, [numTasks]) |
| cartesian(otherDataset) |
map(func)
The map transformation is the simplest and most commonly used transformation on an RDD. It applies the function passed as its argument to each element of the source RDD. In the previous examples, we have seen the map() transformation used with the split() function...
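To make the element-by-element behavior of map() concrete, here is a minimal sketch in plain Python that needs no Spark installation. The helper `rdd_map` and the sample `lines` are hypothetical names introduced only for illustration. The sketch imitates the semantics on a local list and is not Spark's implementation, which evaluates lazily and runs across partitions.

```python
# Plain-Python stand-in for an RDD's map(): no Spark installation is needed.
# Assuming an existing SparkContext named `sc`, the PySpark equivalent would be:
#     sc.parallelize(lines).map(lambda line: line.split(" ")).collect()


def rdd_map(func, elements):
    """Hypothetical helper: apply func to every element, one output per input."""
    return [func(element) for element in elements]


# Hypothetical sample data standing in for the lines of a text file.
lines = ["spark makes rdds", "map is simple"]

# Passing split() to map yields one list of words per input line.
words_per_line = rdd_map(lambda line: line.split(" "), lines)

# Any one-argument function works; here each line is mapped to its length.
lengths = rdd_map(len, lines)

print(words_per_line)  # [['spark', 'makes', 'rdds'], ['map', 'is', 'simple']]
print(lengths)         # [16, 13]
```

Note that the result always has exactly as many elements as the input: map() with split() produces a list of words for each line, not a flat collection of words.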