Development lifecycle with testing strategy
The testing strategy described here is deeply intertwined with the software development lifecycle we follow. For data processing applications, everything starts with a data science phase, where we perform two tasks:
Data exploration: Analysis of the format, frequency of arrival, and contents of the data
Whiteboard design: Definition of the processing algorithm and the mathematical models to be used to generate features
This phase is followed by two development tasks:
TDD (test-driven development) implementation: Conversion of the algorithm into a scalable MapReduce application using Scalding
Production deployment and monitoring: Execution, performance enhancement, and monitoring of the MapReduce job