Collecting Data
The ultimate purpose of collecting data is to enable decisions that improve reality or reduce the risk of future adverse impacts. Chapter 2, Good Data Science, discussed the importance of understanding the relationship between data and the reality it describes. This relationship is the essence of domain knowledge. Professionals in their respective domains are trained and experienced in measuring the reality they manage. That chapter also described some of the differences in measurement between the two domains.
The literature often distinguishes between raw data and processed data as the main ingredients of analysis. The idea that data can be raw and natural is misleading. There is no such thing as raw data, because every time we collect information from a physical or social process, we must decide how to collect it. These decisions are always informed by assumptions about what that reality looks like before we see the data. We cannot measure anything...