Data Processing with dplyr
In the previous chapter, we covered the basics of the R language itself. Grasping these fundamentals will help us tackle the most common task in data science projects: data processing. Data processing refers to a series of data wrangling and massaging steps that transform the data into the format required for downstream analysis and modeling. We can think of it as a function that accepts the raw data and returns the desired data. The function does not know how to do this on its own, however: we need to spell out the recipe, that is, each step it applies to the data and in what order.
By the end of this chapter, you will be able to perform common data wrangling steps such as filtering, selection, grouping, and aggregation using dplyr, one of the most widely used data processing libraries in R.
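To make the "function that accepts raw data and returns the desired data" idea concrete, here is a minimal preview sketch. It assumes the dplyr package is installed and uses R's built-in mtcars dataset. The function name process_cars, the hp > 100 threshold, and the output column names are hypothetical choices made purely for illustration.

```r
library(dplyr)

# Hypothetical helper: accepts a raw data frame and returns the processed one.
# Each line of the pipeline is one explicit step of the "recipe".
process_cars <- function(raw_data) {
  raw_data %>%
    filter(hp > 100) %>%          # filtering: keep cars with more than 100 horsepower
    select(cyl, mpg, hp) %>%      # selection: keep only the columns we need
    group_by(cyl) %>%             # grouping: one group per cylinder count
    summarise(                    # aggregation: one summary row per group
      mean_mpg = mean(mpg),
      n_cars = n()
    )
}

result <- process_cars(mtcars)
print(result)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain together with the pipe operator.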
In this chapter, we will cover the following topics:
- Introducing tidyverse and dplyr
- Data transformation with dplyr
- Data aggregation with dplyr
- Data merging with dplyr...