Intermediate Data Processing
The previous chapter covered a suite of commonly used functions offered by dplyr for data processing. For example, when characterizing and extracting the statistics of a dataset, we can follow the split-apply-combine procedure using group_by() and summarize(). This chapter continues from the previous one and focuses on intermediate data processing techniques, including transforming categorical and numeric variables and reshaping DataFrames. We will also introduce string manipulation techniques for working with textual data, whose format is fundamentally different from the neatly shaped tables we have been working with so far.
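As a quick refresher on the split-apply-combine procedure, the following minimal sketch uses R's built-in mtcars dataset, chosen here only for illustration. group_by() splits the rows by cylinder count (cyl), summarize() applies mean() and n() within each group, and the per-group results are combined into a single summary table.

```r
library(dplyr)

# Split: group the rows of mtcars by number of cylinders.
# Apply: compute the average fuel efficiency and the row count per group.
# Combine: summarize() returns one row per group in a single table.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg), n_cars = n())

print(mpg_by_cyl)
```

Because only one grouping variable is used, the result has one row per distinct value of cyl, and the n_cars column adds up to the total number of rows in mtcars.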
By the end of this chapter, you will be able to perform more advanced data manipulation and extend your data-wrangling skills to text data, which is fundamental to the field of natural language processing.
In this chapter, we will cover the following topics:
- Transforming categorical and numeric...