It is generally easier to maintain data when each table contains information from a single observational unit. On the other hand, it can be easier to find insights when all the data is in a single table, and machine learning typically requires all the data to be in a single table. Tidy data is not about performing analysis directly. Rather, it is about structuring the data so that analysis is easier further down the line. When one table holds multiple observational units, they may need to be separated into their own tables.
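As a minimal sketch of the separation described above, the following uses a small made-up table, not the movie dataset, in which every row mixes facts about a movie with facts about its director. The column names, the values, and the `director_id` surrogate key are all assumptions chosen for illustration. The table is split into one table per observational unit and then joined back together to show that nothing is lost.

```python
import pandas as pd

# Hypothetical toy data: each row mixes movie facts with director facts.
combined = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [2009, 1997, 2015],
    "director_name": ["Director X", "Director X", "Director Y"],
    "director_country": ["Canada", "Canada", "UK"],
})

# Director table: one row per director, plus a surrogate integer key.
directors = (
    combined[["director_name", "director_country"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
directors.insert(0, "director_id", range(len(directors)))

# Movie table: movie facts plus a foreign key pointing at the director table.
movies = combined.merge(
    directors, on=["director_name", "director_country"]
)[["title", "year", "director_id"]]

# Joining the two tables on the key recovers the original single table.
rebuilt = movies.merge(directors, on="director_id")[list(combined.columns)]

print(directors)
print(movies)
print(rebuilt)
```

The director's country is now stored once per director instead of once per movie, which is what makes the separated tables easier to maintain. The final merge shows how to return to a single table when an analysis calls for one.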
Tidying when multiple observational units are stored in the same table
Getting ready
In this recipe, we use the movie dataset to identify the three observational units (movies, actors, and directors...