Operations in Spark
RDDs support two types of operations:
Transformations
Actions
Transformations
A transformation applies a function to an existing dataset and creates a new dataset from it. Transformations are evaluated lazily: Spark only records them, and computes them when an action needs their result. A transformation whose output is not required by any action is never executed, which improves efficiency.
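The lazy behaviour described above can be illustrated with a small toy model in plain Python. This is not Spark code: `LazyDataset`, `from_list`, `flat_map`, and `traced_double` are hypothetical names invented for this sketch, and the class only imitates how a transformation returns a new dataset without computing anything until an action such as `collect` is called.

```python
from typing import Callable, Iterable, Iterator, List


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, source: Callable[[], Iterator]):
        # `source` is a zero-argument function that yields elements on demand.
        self._source = source

    @classmethod
    def from_list(cls, items: List) -> "LazyDataset":
        return cls(lambda: iter(items))

    # Transformations: each returns a new LazyDataset and computes nothing.
    def map(self, func: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (func(x) for x in self._source()))

    def filter(self, func: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if func(x)))

    def flat_map(self, func: Callable[..., Iterable]) -> "LazyDataset":
        return LazyDataset(lambda: (y for x in self._source() for y in func(x)))

    # Action: forces evaluation of the whole chain of transformations.
    def collect(self) -> List:
        return list(self._source())


calls = []  # records every element the mapping function actually processes


def traced_double(x):
    calls.append(x)
    return 2 * x


numbers = LazyDataset.from_list([1, 2, 3, 4])
doubled = numbers.map(traced_double)     # nothing runs yet
unused = numbers.map(lambda x: x / 0)    # no action needs this, so it never runs
large = doubled.filter(lambda x: x > 4)  # still nothing runs
calls_before_action = list(calls)        # snapshot taken before any action

result = large.collect()                 # the action triggers the computation
words = LazyDataset.from_list(["a b", "c"]).flat_map(lambda s: s.split(" ")).collect()
```

In this sketch `calls_before_action` is empty, because building `doubled` and `large` only records the work. The `unused` dataset would raise a division error if it were ever evaluated, but it never is, since no action depends on it.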
The transformations listed in the Apache Spark documentation at https://spark.apache.org/docs/latest/programming-guide.html#transformations are as follows:
Transformation | Meaning |
---|---|
`map(func)` | Return a new distributed dataset formed by passing each element of the source through a function `func`. |
`filter(func)` | Return a new dataset formed by selecting those elements of the source on which `func` returns true. |
`flatMap(func)` | Similar to map, but each input item can be mapped to 0 or more output items (so `func` should return a Seq rather than a single item). |
|