Overview of other data engineering problems for consideration
In this chapter, we will dive into some key data analysis needs. First, we’ll look at the data engineer’s statement of value to remind the data scientist of this key role. Then, we’ll address the capabilities needed in a modern data science and analysis workbench, including support for both additive and non-additive measures computed in memory at petabyte scale. With this capability comes the need to support calculations within groups defined on the fly (also known as calculation groups), and to understand how this feature adds value beyond the well-understood relational GROUP BY operation. Next, we’ll get into notebooks, along with the open source and commercial tools that compete to be the best “pencil sharpener” (tool) for the citizen data scientist’s or analyst’s workbench. We’ll also define the process of change management for data as a product. This is the ability to manage data restatements...
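The difference between additive and non-additive measures can be illustrated with a minimal Python sketch. It assumes a small, hypothetical sales dataset; the rows, the field layout, and the `rollup_by_region` helper are illustrative only and do not come from any specific tool. Revenue is additive, so summing per-region subtotals reproduces the grand total. A distinct customer count is non-additive: summing per-region counts double-counts any customer who appears in more than one region, so the total has to be recomputed from the detail rows.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, customer_id, revenue)
sales = [
    ("east", "c1", 100.0),
    ("east", "c2", 50.0),
    ("west", "c1", 30.0),
    ("west", "c3", 20.0),
]

def rollup_by_region(rows):
    """Hypothetical helper: group rows by region, returning
    per-region revenue sums and per-region sets of distinct customers."""
    revenue = defaultdict(float)
    customers = defaultdict(set)
    for region, customer, amount in rows:
        revenue[region] += amount
        customers[region].add(customer)
    return revenue, customers

revenue, customers = rollup_by_region(sales)

# Additive measure: summing the regional subtotals matches the grand total.
total_from_subtotals = sum(revenue.values())
total_direct = sum(amount for _, _, amount in sales)

# Non-additive measure: summing regional distinct counts counts c1 twice,
# so the correct total must be recomputed from the underlying rows.
distinct_from_subtotals = sum(len(s) for s in customers.values())
distinct_direct = len({customer for _, customer, _ in sales})

print(total_from_subtotals, total_direct)        # 200.0 200.0
print(distinct_from_subtotals, distinct_direct)  # 4 3
```

This is why a workbench cannot simply pre-aggregate and re-sum every measure: non-additive measures need access to the detail data (or a mergeable structure such as the sets used here) at query time.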