Putting theory into practice
In order to drill down into the stored data, Hazelcast unsurprisingly provides us with an implementation of MapReduce, allowing us to perform analytical and aggregation queries directly within the cluster. Thus, we avoid the impractical need to load all of our data onto a single node to gather the results.
As our jobs can be rather complex and may take time to run, we first need to obtain a JobTracker object to create job tasks, as well as to track and monitor their execution across the cluster. In order to create a Job, we will need to feed data to it. As MapReduce works with data pairs, the KeyValueSource wrapper class is used to provide a common input format. We can either use one of the convenient helper methods that it provides to transform a standard distributed collection, or build our own implementation if required. This allows us to wire up other custom data sources, perhaps even beyond what is stored within Hazelcast.
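The steps above can be sketched in code. This is a minimal illustration against the Hazelcast 3.x MapReduce API (the `com.hazelcast.mapreduce` package, which was deprecated in later releases); the tracker name `"default"` and the map name `"articles"` are arbitrary choices for the example, not names the text prescribes.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.mapreduce.Job;
import com.hazelcast.mapreduce.JobTracker;
import com.hazelcast.mapreduce.KeyValueSource;

public class MapReduceSetup {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Obtain a JobTracker, which creates jobs and lets us track
        // and monitor their execution across the cluster
        JobTracker tracker = hz.getJobTracker("default");

        // Wrap a standard distributed map as a KeyValueSource using
        // one of the provided helper methods
        IMap<String, String> map = hz.getMap("articles");
        KeyValueSource<String, String> source = KeyValueSource.fromMap(map);

        // Create a Job fed by the wrapped data source; mapper and
        // reducer phases would be chained onto this job next
        Job<String, String> job = tracker.newJob(source);
    }
}
```

For input beyond the standard collections, implementing our own KeyValueSource subclass would let us feed the job from a custom data source instead of the `fromMap()` helper shown here.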
Next, we will need...