The first project in this book is an analysis of automobile fuel economy data. The primary tool we will use to analyze this dataset is the R statistical programming language. R is often called the lingua franca of data science because it is one of the most widely used languages for statistics and data analysis. As the examples in this book show, R is an excellent tool for data manipulation, analysis, modeling, and visualization, and for writing scripts that get analytical tasks done.
The recipes in this chapter will roughly follow these five steps in the data science pipeline:
- Acquisition
- Exploration and understanding
- Munging, wrangling, and manipulation
- Analysis and modeling
- Communication and operationalization
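
To make the five steps concrete, here is a minimal sketch of one pass through the pipeline in R. It is illustrative only: it assumes the built-in `mtcars` dataset as a stand-in for the chapter's fuel economy data, and the names `cars`, `transmission`, `weight_lb`, `fit`, and `mpg_by_transmission` are chosen for this example rather than taken from the recipes.

```r
# Illustrative only: mtcars ships with base R and stands in for the
# chapter's fuel economy dataset, which is not used here.

# 1. Acquisition: load the data into a data frame.
cars <- datasets::mtcars

# 2. Exploration and understanding: inspect structure and summary statistics.
str(cars)
summary(cars$mpg)

# 3. Munging, wrangling, and manipulation: recode a column and derive a new one.
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))
cars$weight_lb <- cars$wt * 1000  # wt is recorded in units of 1000 lb

# 4. Analysis and modeling: summarize by group and fit a simple linear model.
mpg_by_transmission <- aggregate(mpg ~ transmission, data = cars, FUN = mean)
fit <- lm(mpg ~ weight_lb, data = cars)

# 5. Communication and operationalization: report the results.
print(mpg_by_transmission)
print(coef(fit))
```

In practice each step is far more involved than a line or two, and the steps are usually revisited several times rather than run once in order.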
Process-wise, the backbone of data science is the data science pipeline, and in order to get good at data science...