MapReduce testing challenges
MapReduce applications process large amounts of data to infer and extract information. The sheer volume of data causes two problems:
A long feedback cycle from the execution to the validation of results
Difficulty in finding what data to use as mock data to validate results
Another set of problems is related to the logical complexity of operations. When used for business intelligence, for example, a MapReduce application is responsible for applying a possibly complex mathematical model to vast amounts of data.
Often, the complexity of the computation lies in the logical and mathematical concepts behind each step, much as it does in the development of cryptographic applications. Thus, we need to approach design and testing at a higher level.
Because of this complexity, it is difficult to specify the expected outcome of the whole computation in a test. This is why we need to focus on the testability of the individual components that participate in the computation.
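As a rough illustration of what component-level testability can look like, the following sketch writes the map and reduce steps as plain functions and runs them through a tiny in-memory driver. The word-count job, the function names (map_words, reduce_counts, run_in_memory), and the sample data are all assumptions made for this example; they do not come from any particular MapReduce framework. Each step can be checked on a few hand-written records, and the whole job can be checked against an invariant instead of a fully specified expected output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical word-count job, used only to illustrate the idea.


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_in_memory(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Stand-in for the framework: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# A small, hand-built data set keeps the feedback cycle short.
sample = ["to be or not to be", "To test is to know"]
result = run_in_memory(sample, map_words, reduce_counts)

# Each component is validated in isolation on trivial input...
assert list(map_words("To be")) == [("to", 1), ("be", 1)]
assert reduce_counts("to", [1, 1, 1]) == ("to", 3)

# ...and the whole job is validated through an invariant:
# the counts must add up to the number of words in the input.
assert sum(result.values()) == sum(len(line.split()) for line in sample)
```

The invariant check is the part that scales: for a more complex model it may be impractical to write down the exact expected result, but properties such as conservation of a total can still be asserted on small inputs.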