The Purpose of Data Science
In summary, the promises of data science within organizations have gained a lot of popularity over the past six years. The downside of this popularity is that self-proclaimed futurists have exaggerated the benefits of a strategic and systematic approach to analyzing data. Obtaining value from this new way of using data requires a pragmatic approach that looks beyond the hype. For most organizations, data science will look very different from the digital utopia portrayed in popular publications.
This chapter defines data science as the strategic and systematic use of data to create value for organizations or society overall. The purpose of using data to improve how organizations perform is to reduce bias in decisions. The original objections that Frederick Taylor held against rules of thumb more than a century ago still stand. Computational analysis of data is a valuable tool for reducing this bias when deciding on future courses of action.
Data science is an interdisciplinary activity that combines domain knowledge with competencies in mathematics and computer science. The data revolution of the past decades has brought an exponential increase in available data, computing capabilities and open-source software. Data science is paradoxically not a science about data but a scientific way to use data to influence reality positively. Expertise about the reality under consideration, or domain knowledge, drives data science. Mathematics and computer science are the tools that enable a deeper understanding of our reality and help us to optimize our decisions.
Now that we have an idea of what data science is and what it consists of, we need to define what good data science looks like. The following chapter expands on this description of data science by presenting a normative model of data science. This model defines best practice as the useful, sound and aesthetic analysis of data.