― Sir Arthur Conan Doyle, The Adventures of Sherlock Holmes
Data, like science, has been ubiquitous the world over since early history. The term data science is not generally taken to literally mean science with data, since without data there would be of science. Rather, it is a specialized field in which data scientists and other practitioners apply advanced computing techniques, usually along with algorithms or predictive analytics to uncover insights that may be challenging to obtain with traditional methods.
Data science as a distinct subject was proposed since the early 1960s by pioneers and thought leaders such as Peter Naur, Prof. Jeff Wu, and William Cleveland. Today, we have largely realized the vision that Prof. Wu and others had in mind when the concept first arose; data science as an amalgamation of computing, data mining, and predictive analytics, all leading up to deriving key insights that drive business and growth across the world today.
The driving force behind this has been the rapid but proportional growth of computing capabilities and algorithms. Computing languages have also played a key role in supporting the emergence of data science, primary among them being the statistical language R.
In this introductory chapter, we will cover the following topics:
- Introduction to data science and R
- Active domains of data science
- Solving problems with data science
- Using R for data science
- Setting up R and RStudio
- Our first R program