Best-Practice Data Science
This chapter discusses a normative framework for creating value with data. Inspired by the Roman architect Vitruvius, the products of data science need to be useful, sound and aesthetic.
Data science is useful when the data is converted into information. This information increases the knowledge of the professionals who use it. This knowledge improves the reality from which the data was extracted. The relationship between reality and data is critical.
Data science is sound when it delivers valid and reliable results and can be reviewed by other experts.
Validity is the extent to which the data describes the aspect of reality it is presumed to represent. In physical measurements, this aspect is governed by physics and chemistry. In measures of the social world, validity is complicated, because we can only record the external states of a human being, and not their state of mind.
The reliability of data and its analysis relates to the accuracy of the information. In physical...