In Chapter 2, Data Visualization and Graphics, we noted that data visualization is a key part of exploratory data analysis (EDA). The data management techniques covered in this chapter make up the other important part of EDA, which you should always carry out before modeling and analysis. In this chapter, we will cover what a factor variable is and how to use one, how to summarize your data numerically, how to combine, merge, and split datasets, and how to split and combine strings.
By the end of this chapter, you will be able to:
- Create and reorder factor variables
- Generate pivot tables
- Aggregate data using base R and the dplyr package
- Use various methods to split, apply, and combine data in R
- Split character strings using the stringr package
- Merge and join different datasets using base R and dplyr methods