Fixing many-to-many relationships
We sometimes have to work with a data table that was created from a many-to-many merge. This is a merge where merge-by column values are duplicated on both the left and right sides. As we discussed in the previous chapter, many-to-many relationships in a data file often represent multiple one-to-many relationships where the one side has been removed. There is a one-to-many relationship between dataset A and dataset B, and also a one-to-many relationship between dataset A and dataset C. The problem we sometimes have is that we receive a data file with B and C merged but with A excluded.
The best way to work with data structured in this way is to recreate the implied one-to-many relationships, if possible. We do this by first creating a dataset structured like A; that is, how A is likely structured given the many-to-many relationship we see between B and C. The key to being able to do this is to identify a good merge-by column for the data on both...