Understanding the core capabilities of Scalding
Scalding provides a rich set of core operations for transforming data. Map-like operations apply a function to each tuple in a pipe. Join operations combine data from multiple pipes. Pipe operations let us concatenate or debug pipes. Grouping/reducing operations bring related data together, and once data has been grouped, a further rich set of group operations becomes available.
Map-like operations
These operations are internally translated into map phases of MapReduce and apply a function to every row of data. The syntax of the map operation is:
pipe.map(existingFields -> additionalFields) { function }
The map operation takes some of the existing fields of a pipe as input, applies a function to their values, and produces a pipe that keeps the existing fields and adds the new ones. In the following example, a new field 'priceWithVAT is introduced:
pipe.map('price -> 'priceWithVAT) { price: Double => price*1.20 }
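To make the behaviour of map concrete without a Hadoop or Scalding setup, the following is a minimal plain-Scala model. It is not the Scalding API: it assumes each row is an immutable Map from a field Symbol to its value, and the helper mapField is a hypothetical stand-in for pipe.map. It shows the two properties described above: the function is applied to every row independently, and the new field is appended while the existing fields are kept.

```scala
// Plain-Scala model of a fields-based map (NOT the Scalding API).
// Assumption: a row is a Map from field name (Symbol) to value.
object MapLikeSketch {
  type Row = Map[Symbol, Any]

  // Hypothetical helper mirroring pipe.map('in -> 'out) { fn }:
  // reads field `in` from each row, applies `fn`, and adds the result
  // as field `out`. Existing fields are left untouched.
  def mapField[A, B](rows: Seq[Row], in: Symbol, out: Symbol)(fn: A => B): Seq[Row] =
    rows.map(row => row + (out -> fn(row(in).asInstanceOf[A])))

  // Illustrative input "pipe" with two fields per row.
  val pipe: Seq[Row] = Seq(
    Map(Symbol("item") -> "book", Symbol("price") -> 10.0),
    Map(Symbol("item") -> "pen", Symbol("price") -> 2.5)
  )

  // Equivalent in spirit to: pipe.map('price -> 'priceWithVAT) { ... }
  val withVat: Seq[Row] =
    mapField(pipe, Symbol("price"), Symbol("priceWithVAT")) { (price: Double) => price * 1.20 }

  def main(args: Array[String]): Unit =
    withVat.foreach(println) // each row now has item, price and priceWithVAT
}
```

The sketch writes Symbol("price") rather than the 'price literal used in the book, because symbol literals are deprecated in Scala 2.13 and not supported in Scala 3; inside a real Scalding job the original 'price syntax applies to whichever Scala version that Scalding release targets.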
Operations can be executed...