Black box testing
During test-driven development, we retain an internal perspective of the system. We identify all possible paths and exercise them with test case inputs, validating the actual output against the expected output. However, using only valid input is not sufficient, especially when implementing MapReduce applications that execute against possibly billions of lines of data. As we cannot generate all possible cases of invalid input, we look at techniques that increase the data coverage of tests.
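To make the difference between valid and invalid input concrete, here is a minimal sketch in Python. It is not taken from any particular MapReduce job: the "key,amount" record format and the names map_line and run_mapper are assumptions made purely for illustration. The valid lines correspond to the paths a white box test would cover, while the invalid lines are only a small sample of what real data may contain.

```python
from typing import Iterator, List, Tuple


def map_line(line: str) -> Iterator[Tuple[str, float]]:
    """Hypothetical mapper: parse 'key,amount' and emit (key, amount).

    Malformed records are skipped instead of raising, so that a single
    bad line cannot abort a job running over billions of records.
    """
    fields = line.rstrip("\n").split(",")
    if len(fields) != 2:
        return  # wrong number of columns
    key, raw_amount = fields[0].strip(), fields[1].strip()
    if not key:
        return  # missing key
    try:
        amount = float(raw_amount)
    except ValueError:
        return  # amount is not numeric
    yield key, amount


# Valid input: the cases a white box test would normally exercise.
VALID_LINES = ["alice,10.5", "bob,3\n"]

# Invalid input: an illustrative sample only, far from exhaustive.
INVALID_LINES = ["", "alice", "alice,ten", ",10.5", "a,b,c"]


def run_mapper(lines: List[str]) -> List[Tuple[str, float]]:
    """Apply the mapper to every line and collect the emitted pairs."""
    return [pair for line in lines for pair in map_line(line)]


if __name__ == "__main__":
    print(run_mapper(VALID_LINES))
    print(run_mapper(INVALID_LINES))
```

The hand-written list of invalid lines shows the limitation: it covers only the malformed cases we happened to think of, which is why techniques that widen data coverage are needed.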
Taking a step back, the development lifecycle begins with data exploration, followed by algorithm design. Black box testing builds on the fact that a data scientist performs these tasks in a non-scalable development language such as R or Python. Data scientists use multiple tools to extract meaning, insights, and ultimately value from data. These tools provide powerful capabilities and rich visualizations that let them arrive at mathematical models quickly. The drawback is that the resulting implementation...