9.5 Extras
Here are some ideas for you to add to this project.
9.5.1 Create an output file with rejected samples
In Error reports we suggested there are times when it’s appropriate to create a file of rejected samples. For the examples in this book — many of which are drawn from well-curated, carefully managed data sets — it can feel a bit odd to design an application that will reject data.
For enterprise applications, data rejection is a common need.
It can help to look at a data set like this: https://datahub.io/core/co2-ppm. This contains data samples with measurements of CO2 levels, in units of ppm, parts per million.
The data set has some samples with an invalid number of days in the month, and some samples where a monthly CO2 level wasn’t recorded.
It can be insightful to use a rejection file to divide this data set into clearly usable records, and records that are not as clearly usable.
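One way to make this division concrete is sketched below. It is a minimal illustration, not the project's actual design. The column names (`Average`, `Number of Days`), the validity rules, and the sample rows are all assumptions made up for the example, so check them against the real file first. The functions `rejection_reason()` and `partition()` are hypothetical names.

```python
import csv
import io
from typing import Optional, TextIO

# Made-up rows for illustration only. The column names and the negative
# "missing" markers are assumptions, not a description of the real file.
SAMPLE_CSV = (
    "Date,Average,Number of Days\n"
    "1975-01-01,330.59,25\n"
    "1975-02-01,-99.99,-1\n"
    "1975-03-01,331.95,40\n"
)


def rejection_reason(row: dict) -> Optional[str]:
    """Return a reason to reject the row, or None if it looks usable."""
    try:
        days = int(row["Number of Days"])
        average = float(row["Average"])
    except (KeyError, TypeError, ValueError) as ex:
        return f"unparseable: {ex!r}"
    if not 1 <= days <= 31:
        return f"invalid number of days: {days}"
    if average <= 0:  # assumed rule: a non-positive level means "not recorded"
        return f"missing CO2 level: {average}"
    return None


def partition(source: TextIO, accepted: TextIO, rejected: TextIO) -> tuple:
    """Copy usable rows to accepted, the rest to rejected with a reason.

    Returns (accepted_count, rejected_count).
    """
    reader = csv.DictReader(source)
    fieldnames = list(reader.fieldnames or [])
    good = csv.DictWriter(accepted, fieldnames, extrasaction="ignore")
    bad = csv.DictWriter(rejected, fieldnames + ["reason"], extrasaction="ignore")
    good.writeheader()
    bad.writeheader()
    accepted_count = rejected_count = 0
    for row in reader:
        reason = rejection_reason(row)
        if reason is None:
            good.writerow(row)
            accepted_count += 1
        else:
            # Keep the source columns untouched and append the reason.
            bad.writerow({**row, "reason": reason})
            rejected_count += 1
    return accepted_count, rejected_count


if __name__ == "__main__":
    accepted_file, rejected_file = io.StringIO(), io.StringIO()
    print(partition(io.StringIO(SAMPLE_CSV), accepted_file, rejected_file))
    print(rejected_file.getvalue())
```

In this sketch the rejected rows keep the layout of the source file, with one extra `reason` column, so someone reviewing the rejects can compare them directly against the original data. In a real application the two `io.StringIO` objects would be replaced by files opened with `newline=""`, as the `csv` module documentation recommends.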
The output will not reflect the analysis model. These objects...