How is testing done differently for big data/Hadoop applications?
How has testing changed with the evolution of big data (as compared to testing an enterprise data warehouse)?
One angle to this is the increasing use of big data over the cloud, resulting in a growing convergence of analytics and cloud.
I would like to offer a viewpoint based on three criteria: software and data; platform; and infrastructure and validation tools.
- Software and data: Big data applications work with unstructured/semi-structured data (dynamic schema), as compared to the static schema that EDW applications rely on. Hence, while EDW applications can get by with sample-based testing, big data applications need to test the entire data population (see the sketch after this list). Testing the data for volume, variety, and velocity here means testing for semantics, visualization, and real-time availability of data, respectively.
- Platform: Since big data applications are hosted on the cloud (platform as a service), they also need to be tested for the ability of distributed...
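
To illustrate the sampling-versus-population point above, here is a minimal sketch, assuming a PySpark environment; the HDFS path and the field names `event_id` and `event_time` are hypothetical. It checks every landed record for schema conformance and basic semantics rather than a sampled subset:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bigdata-validation").getOrCreate()

# Load the entire landed data set in permissive mode so malformed records
# are retained in a corrupt-record column instead of being dropped.
raw = (spark.read
       .option("mode", "PERMISSIVE")
       .option("columnNameOfCorruptRecord", "_corrupt_record")
       .json("hdfs:///landing/events/"))          # hypothetical path
raw.cache()  # required before querying the corrupt-record column alone

total = raw.count()
corrupt = raw.filter(F.col("_corrupt_record").isNotNull()).count()

# Semantic checks across the whole population, not a sample:
# mandatory identifier present and event timestamps not stale.
missing_id = raw.filter(F.col("event_id").isNull()).count()          # hypothetical field
stale = raw.filter(
    F.to_date(F.col("event_time")) < F.date_sub(F.current_date(), 1) # hypothetical field
).count()

print(f"records={total} corrupt={corrupt} missing_id={missing_id} stale={stale}")
```

Because the job runs as a distributed Spark computation, scanning the full population stays feasible where a row-by-row EDW-style comparison would not.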