A simple example
Scalding operations can be chained together to implement a complete pipeline, as the following example shows:
    Tsv(args("input"), ('kid, 'age, 'fruits))
      .read
      .flatMap('fruits -> 'fruit) { text: String => text.split(",") }
      .project('kid, 'fruit)
      .write(Tsv("results.tsv"))
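To see what this chain computes, here is a minimal sketch that mimics the flatMap and project steps with plain Scala collections, without Scalding. The sample rows and the names FlatMapProjectSketch and kidFruitPairs are assumptions made for illustration only. They are not part of Scalding or of the original example.

```scala
// Plain-Scala sketch (no Scalding dependency) of what the pipeline computes.
object FlatMapProjectSketch {
  // Hypothetical sample rows standing in for the TSV input: (kid, age, fruits).
  val rows: Seq[(String, Int, String)] = Seq(
    ("anna", 8, "apple,banana"),
    ("ben", 9, "cherry")
  )

  // Mirrors flatMap('fruits -> 'fruit): one output row per comma-separated fruit.
  // Mirrors project('kid, 'fruit): only those two columns are kept.
  def kidFruitPairs(input: Seq[(String, Int, String)]): Seq[(String, String)] =
    for {
      (kid, _, fruits) <- input
      fruit <- fruits.split(",").toSeq
    } yield (kid, fruit)

  def main(args: Array[String]): Unit =
    kidFruitPairs(rows).foreach { case (kid, fruit) => println(s"$kid\t$fruit") }
}
```

Each input row yields one output row per fruit, so a kid with two fruits appears twice in the result and the age column is dropped.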
A pipeline can also be expressed as a number of separate pipes, where we assemble and control how each pipe connects to another:
    val logs = Tsv(args("logfiles"), LogsOperations.schema)
      .read
      .extractSomeUserInfo

    val customers = Tsv(args("cust_log"), COperations.schema)
      .read
      .extractSomeCustomerInfo

    val joined = logs.joinWithSmaller('user -> 'user, customers)

    val result = joined.filter( ... )
      .write(Tsv(args("output")))
The preceding code allows us to whiteboard the design of a data processing flow before implementing it. Refer to the pipeline definition in the first chapter to see how...
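One way to picture how domain-specific steps such as extractSomeUserInfo can read like built-in pipe operations is Scala's implicit class mechanism. The sketch below is a toy model in plain Scala, under the assumption that a pipe can be approximated by a sequence of rows. The names Rows, RowOps, keepFields, and joinOn, and the sample data, are hypothetical and are not Scalding's API.

```scala
object PipeCompositionSketch {
  // Toy stand-in for a pipe: a sequence of rows keyed by field name.
  type Rows = Seq[Map[String, String]]

  // Hypothetical extension method, added to Rows through an implicit class.
  implicit class RowOps(val self: Rows) {
    // Keeps only the named fields of every row.
    def keepFields(fields: String*): Rows =
      self.map(row => row.filter { case (name, _) => fields.contains(name) })
  }

  // Inner join of two toy pipes on a shared field.
  def joinOn(left: Rows, right: Rows, field: String): Rows =
    for {
      l <- left
      r <- right
      if l.contains(field) && l.get(field) == r.get(field)
    } yield l ++ r

  // Hypothetical sample data for the two inputs.
  val logs: Rows = Seq(
    Map("user" -> "u1", "url" -> "/home", "agent" -> "x"),
    Map("user" -> "u2", "url" -> "/cart", "agent" -> "y")
  )
  val customers: Rows = Seq(
    Map("user" -> "u1", "country" -> "GR", "email" -> "e")
  )

  // Each named value is one box on the whiteboard; the join connects them.
  val joined: Rows =
    joinOn(logs.keepFields("user", "url"), customers.keepFields("user", "country"), "user")

  def main(args: Array[String]): Unit = joined.foreach(println)
}
```

Naming each intermediate value keeps every stage independently readable, which is the property the multi-pipe style relies on.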