Using stack and melt to reshape data from wide to long format
One type of untidiness that Wickham identified is variable values embedded in column names. Although this rarely happens with enterprise or relational data, it is fairly common with analytical or survey data. Variable names might have suffixes that indicate a time period, such as a month or year. Another case is that similar variables on a survey might have similar names, such as familymember1age, familymember2age, and so on, because that is convenient and consistent with the survey designers' understanding of the variable.
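To make the problem concrete, here is a minimal sketch using a small, made-up survey DataFrame. The respondentid column and the age values are hypothetical. It shows two ways to move the family member number out of the column names and into rows. melt keeps respondentid as an identifier column and turns each age column into its own row. stack does the same after respondentid has been set as the index.

```python
import pandas as pd

# Hypothetical survey data: one row per respondent, with the family member
# number embedded in each column name
survey = pd.DataFrame({
    "respondentid": [1, 2],
    "familymember1age": [34, 51],
    "familymember2age": [36, 49],
})

# melt: respondentid stays as an identifier; each age column becomes a row
long_melt = survey.melt(
    id_vars="respondentid",
    value_vars=["familymember1age", "familymember2age"],
    var_name="variable",
    value_name="age",
)

# Recover the family member number that was embedded in the column name
long_melt["familymember"] = (
    long_melt["variable"]
    .str.extract(r"familymember(\d+)age", expand=False)
    .astype(int)
)
long_melt = (
    long_melt.drop(columns="variable")
    .sort_values(["respondentid", "familymember"])
    .reset_index(drop=True)
)
print(long_melt)

# stack: move respondentid into the index, then stack the remaining columns
# into a Series indexed by (respondentid, original column name)
long_stack = (
    survey.set_index("respondentid")
    .stack()
    .rename("age")
    .rename_axis(["respondentid", "variable"])
    .reset_index()
)
print(long_stack)
```

Either way, the result has one row per respondent per family member rather than one row per respondent. melt is usually the more direct choice when the identifier is an ordinary column. stack is natural when the identifier is already the index.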
One reason why this messiness happens relatively frequently with survey data is that there can be multiple units of analysis on one survey instrument. An example is the United States decennial census, which asks both household and person questions. Survey data is also sometimes made up of repeated measures or panel data, but nonetheless often has only one row per respondent. When this is the case...