Summary
A modern data format or platform should not only be able to provide for the simple and obvious data flow paths, but also provide strategies for the not-so-straightforward but real-world scenarios that need to be tackled and tamed in production. In this chapter, we explored many scenarios around insufficient and inadequate data and looked at strategies to detect and overcome them. We reinforced the fact that the quality of insights can be controlled by fixing the quality of data instead of over-emphasizing the algorithms that are used to produce the insights. This is because, after a certain stage, it is a case of diminishing returns. However, understanding inherent data issues and fixing them produces a bigger return on the analytic investment. Delta's ACID transaction capabilities, together with its ability to make fine-grained updates, deletes, and merges, allow us to make fixes to the data easily. Everything changes, which means that data patterns, schemas, and drift...