Interacting with external systems
Scalding allows us to build rich pipelines that read data from one or more sources, perform data transformations, and store results into one or more sinks. The sources and the sinks are called taps.
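As an illustration of the source, transformation, and sink structure described above, here is a minimal sketch of a Scalding job written with the fields-based API. The job name WordCountJob and the argument names input and output are hypothetical, chosen only for this example; TextLine and Tsv are the standard Scalding source and sink for text lines and tab-separated values.

```scala
import com.twitter.scalding._

// Hypothetical job: counts the words in a text file.
// The "input" and "output" argument names are assumptions for this sketch.
class WordCountJob(args: Args) extends Job(args) {
  // Source tap: each line of the input becomes a tuple with a 'line field.
  TextLine(args("input"))
    // Transformation: split every line into lowercase words, dropping empty tokens.
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+").filter(_.nonEmpty).toList
    }
    // Transformation: count the occurrences of each word.
    .groupBy('word) { _.size }
    // Sink tap: write (word, count) pairs as tab-separated values.
    .write(Tsv(args("output")))
}
```

Such a job would typically be launched through Scalding's Tool runner with --input and --output pointing at local or HDFS paths; replacing TextLine or Tsv with another source or sink changes where the data comes from or goes to without touching the transformations in between.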
With Scalding, we can tap into the HDFS filesystem. HDFS follows a write-once model: once a file is closed, its existing contents cannot be modified in place, and changing them means writing a new copy of the file. This style of file access fits nicely with MapReduce and batch processing jobs.
There are, however, use cases where data changes very frequently or where real-time applications require fast response times. These use cases fit nicely with in-memory systems. Fortunately, Scalding can tap into multiple external data stores, so elaborate pipelines can be built.
Scalding supports interaction with SQL, NoSQL, and in-memory systems either through external libraries or by using wrappers over Cascading libraries. Moreover, with Scalding...