Chapter 3
Project 1.1: Data Acquisition Base Application
The first stage of a data pipeline is acquiring the raw data from its various sources. This chapter has a single project: a command-line interface (CLI) application that extracts relevant data from files in CSV format. This initial application will restructure the raw data into a more useful form. Later projects (starting in Chapter 9, Project 3.1: Data Cleaning Base Application) will add features for cleaning and validating the data.
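The extract-and-restructure step described above can be sketched with the standard library's `csv` module. This is a minimal sketch under stated assumptions, not the project's actual design: the column names `x` and `y`, the `XYPair` dataclass, and the `extract_pairs()` function are hypothetical, chosen only to show raw text rows becoming typed objects.

```python
import csv
import io
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class XYPair:
    """Hypothetical target structure: one restructured row."""
    x: float
    y: float


def extract_pairs(lines: Iterable[str]) -> Iterator[XYPair]:
    """Read CSV text that has a header row and yield one XYPair per data row.

    Assumes hypothetical columns named "x" and "y"; any other columns are ignored.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        # csv yields every field as a string; convert to float to restructure the raw text
        yield XYPair(x=float(row["x"]), y=float(row["y"]))


if __name__ == "__main__":
    # Illustrative in-memory input standing in for a real CSV file
    raw = io.StringIO("x,y,note\n1,2.5,a\n3,4.0,b\n")
    for pair in extract_pairs(raw):
        print(pair)
```

Because `extract_pairs()` accepts any iterable of lines, the same function works on an open file object or an in-memory `io.StringIO`, which keeps it easy to test without touching the file system.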
This chapter’s project covers the following essential skills:
Application design in general. This includes object-oriented design and the SOLID design principles, as well as functional design.
A few CSV file processing techniques. This is a large subject area, and the project focuses on restructuring source data into a more usable form.
CLI application construction.
Creating acceptance tests using the Gherkin language and behave step definitions.
Creating unit tests with mock objects...
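The CLI and unit-testing skills listed above can be combined in one small sketch: an `argparse`-based `main()` that delegates the real work to a separate worker function, and a `unittest` test that substitutes a `unittest.mock.Mock` for that worker so the CLI logic is tested in isolation. The names `main()`, `process()`, `worker`, and the `--output` option are hypothetical illustrations, not the project's actual interface.

```python
from __future__ import annotations

import argparse
import unittest
from collections.abc import Callable
from pathlib import Path
from unittest import mock


def process(source: Path, target: Path) -> int:
    """Hypothetical worker: extract rows from source, write them to target, return the count."""
    raise NotImplementedError("the real extraction would be implemented here")


def main(
    argv: list[str] | None = None,
    worker: Callable[[Path, Path], int] = process,
) -> int:
    """Parse command-line arguments, then delegate all real work to `worker`."""
    parser = argparse.ArgumentParser(description="Acquire data from a CSV file.")
    parser.add_argument("source", type=Path, help="input CSV file")
    parser.add_argument("-o", "--output", type=Path, required=True,
                        help="file for the restructured data")
    args = parser.parse_args(argv)
    count = worker(args.source, args.output)
    print(f"{count} rows written to {args.output}")
    return 0


class TestMain(unittest.TestCase):
    def test_main_delegates_to_worker(self) -> None:
        # A Mock stands in for the worker, so no files are read or written
        fake_worker = mock.Mock(return_value=3)
        status = main(["data.csv", "--output", "result.json"], worker=fake_worker)
        self.assertEqual(status, 0)
        fake_worker.assert_called_once_with(Path("data.csv"), Path("result.json"))
```

Passing `worker` in as a parameter is one way to apply the dependency-inversion idea from SOLID: the CLI depends on any callable with the right signature rather than on a concrete module, which is what lets the test hand it a mock. `mock.patch` is the usual alternative when the dependency is a module-level name. Saved as a file, the test can be run with `python -m unittest path/to/file.py`.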