Summary
In this chapter, we touched upon several intermediate data processing techniques, ranging from structured tabular data to unstructured textual data. First, we covered how to transform categorical and numeric variables, including recoding categorical variables using recode(), creating new variables using case_when(), and binning numeric variables using cut(). Next, we looked at reshaping a DataFrame, including converting a long-format DataFrame into a wide format using spread() and back again using gather(). We also delved into working with strings, including how to create, convert, and format string data.
In addition, we covered some essential knowledge regarding the stringr package, which provides many helpful utility functions to ease string processing tasks. Common functions include str_c(), str_sub(), str_subset(), str_detect(), str_split(), str_count(), and str_replace(). These functions can be combined to create a powerful and easy-to-understand string processing pipeline...
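To illustrate how these functions can be chained, here is a minimal sketch, assuming the stringr and magrittr packages are installed. The notes vector and its contents are hypothetical, made up purely for the example.

```r
library(stringr)
library(magrittr)  # provides the %>% pipe

# Hypothetical input strings, invented for this sketch
notes <- c("apple pie: 3 slices", "banana bread", "apple tart: 12 pieces")

# Individual helpers
has_digit  <- str_detect(notes, "\\d")   # which elements contain a digit
first_five <- str_sub(notes, 1, 5)       # characters 1 through 5 of each element
a_counts   <- str_count(notes, "a")      # number of "a" characters per element
parts      <- str_split(notes, ": ")     # list of pieces split on ": "

# A small pipeline: keep the apple entries, replace the first "apple"
# in each with "pear", then prepend a label with str_c()
result <- notes %>%
  str_subset("apple") %>%
  str_replace("apple", "pear") %>%
  str_c("item: ", .)
```

Because each function takes the string vector as its first argument, the steps read top to bottom in the order they are applied.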