17.1 Overall data wrangling
The applications and notebooks are designed around the following multi-stage architecture:
Data acquisition
Inspection of data
Cleaning data; this includes validating, converting, standardizing, and saving intermediate results
Summarizing data and starting to model it
Creating deeper analysis and more sophisticated statistical models
The stages fit together as shown in Figure 17.1.
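As a rough illustration of how these stages can hand results to one another, here is a minimal sketch with each stage as a small function. The function names (`acquire`, `inspect_rows`, `clean`, `summarize`, `model_slope`), the inline `RAW` sample, and the choice of a least-squares slope for the final stage are all assumptions made for this example; they are not taken from the applications or notebooks themselves.

```python
from __future__ import annotations

import csv
import io
import statistics

# Hypothetical raw data standing in for an acquired source; one row is invalid.
RAW = "x,y\n1.0,2.1\n2.0,3.9\nbad,5.0\n3.0,6.2\n"


def is_number(text: str | None) -> bool:
    """Return True when the text converts cleanly to a float."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def acquire(source: str) -> list[dict[str, str]]:
    """Stage 1: read raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(source)))


def inspect_rows(rows: list[dict[str, str]]) -> dict[str, int]:
    """Stage 2: count the values in each column that are not numeric."""
    problems: dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            if not is_number(value):
                problems[name] = problems.get(name, 0) + 1
    return problems


def clean(rows: list[dict[str, str]]) -> list[dict[str, float]]:
    """Stage 3: validate and convert, keeping only fully numeric rows."""
    return [
        {name: float(value) for name, value in row.items()}
        for row in rows
        if all(is_number(value) for value in row.values())
    ]


def summarize(rows: list[dict[str, float]]) -> dict[str, float]:
    """Stage 4: compute the mean of each column."""
    return {name: statistics.mean(row[name] for row in rows) for name in rows[0]}


def model_slope(rows: list[dict[str, float]], means: dict[str, float]) -> float:
    """Stage 5: a simple least-squares slope of y on x."""
    numerator = sum((r["x"] - means["x"]) * (r["y"] - means["y"]) for r in rows)
    denominator = sum((r["x"] - means["x"]) ** 2 for r in rows)
    return numerator / denominator


raw_rows = acquire(RAW)
problems = inspect_rows(raw_rows)
clean_rows = clean(raw_rows)
means = summarize(clean_rows)
slope = model_slope(clean_rows, means)
print(problems, len(clean_rows), means, slope)
```

In a real project each stage would more likely save its intermediate results to a file, so that a later stage can be rerun without repeating the earlier ones; here the results are simply passed along in memory to keep the sketch short.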
The last step in this pipeline isn’t, of course, final. In many cases, the project evolves from exploration to monitoring and maintenance. There will be a long tail during which the model continues to be confirmed. Some enterprise management oversight is an essential part of this ongoing confirmation.
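One way to picture this ongoing confirmation is a periodic check that compares a recent batch of values against a baseline. The sketch below is only an illustration under stated assumptions: the function name `confirm`, the sample values, and the limit of three standard errors are all hypothetical, and a real project would choose tests suited to its own model.

```python
import statistics


def confirm(baseline: list[float], recent: list[float], limit: float = 3.0) -> bool:
    """Return True when the recent mean lies within `limit` standard errors
    of the baseline mean (a simple z-score style check)."""
    baseline_mean = statistics.mean(baseline)
    # Standard error of a mean of len(recent) values, using the baseline spread.
    std_error = statistics.stdev(baseline) / len(recent) ** 0.5
    z = abs(statistics.mean(recent) - baseline_mean) / std_error
    return z <= limit


# Hypothetical measurements, invented for this example.
baseline = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2]
stable = [10.0, 10.1, 9.9, 10.2]
shifted = [11.0, 11.2, 10.9, 11.1]

stable_ok = confirm(baseline, stable)
shifted_ok = confirm(baseline, shifted)
print(stable_ok, shifted_ok)
```

A check like this only reports that something has moved; it says nothing about why, which is where further investigation begins.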
In some cases, the long tail is interrupted by a change. The change may show up as inaccuracy in the model or as a failure to pass basic statistical tests. Uncovering the change and the reasons for change is...