Scratching the surface of Big Data
Among the recent achievements and trends towards better analysis of data and services lies the Big Data movement. In particular, the Hadoop framework has established itself as a de facto standard "for the distributed processing of large datasets across clusters of computers using simple programming models". In addition to HDFS, a distributed file system optimized for high-throughput data access, Hadoop offers MapReduce facilities for processing large datasets in parallel. Because setting up and running Hadoop is not always a simple task, other frameworks have been developed on top of it to simplify the definition of Hadoop jobs. In Java, the Cascading framework is a layer on top of Hadoop that provides a convenient API for creating MapReduce jobs. In Scala, the Scalding framework further enhances the Cascading API by leveraging Scala's concise and expressive syntax, as the example below illustrates.
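
To give a flavour of that conciseness, the following sketch shows a word-count job written against Scalding's typed API. The job name, the tokenisation rule, and the "input"/"output" argument keys are illustrative choices rather than anything mandated by the framework.

import com.twitter.scalding._

// A minimal Scalding job that counts word occurrences in a text file.
// Input and output paths are supplied as command-line arguments.
class WordCountJob(args: Args) extends Job(args) {
  TypedPipe.from(TextLine(args("input")))                      // read lines from the input source
    .flatMap { line => line.toLowerCase.split("\\s+").toList } // split each line into words
    .groupBy(identity)                                         // group identical words together
    .size                                                      // count occurrences in each group
    .write(TypedTsv[(String, Long)](args("output")))           // emit (word, count) pairs as TSV
}

When the job is executed, Cascading plans this flow into the corresponding MapReduce job(s) on Hadoop, so the few lines above stand in for the boilerplate that a hand-written MapReduce implementation would require.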