Chapter 1. Introduction to Data Analysis
Data analysis is the process of organizing, cleaning, transforming, and modeling data to obtain useful information and ultimately, new knowledge. The terms data analytics, business analytics, data mining, artificial intelligence, machine learning, knowledge discovery, and big data are also used to describe similar processes. The distinctions of these fields probably lie more in their areas of application than in their fundamental nature. Some argue that these are all part of the new discipline of data science.
The central process of gaining useful information from organized data is managed by the application of computer science algorithms. Consequently, these will be a central focus of this book.
Data analysis is both an old field and a new one. Its origins lie among the mathematical fields of numerical methods and statistical analysis, which reach back into the eighteenth century. But many of the methods that we shall study gained prominence much more recently, with the ubiquitous force of the internet and the consequent availability of massive datasets.
In this first chapter, we look at a few famous historical examples of data analysis. These can help us appreciate the importance of the science and its promise for the future.