Mapping additional information
From the data model that we prepared earlier, we know that there are three key files that we have to map: Household Information, Weather, and Bank Holidays.
The informations_households.csv file contains metadata about each household. These are static features that do not depend on time. For this, we just need to left merge informations_households.csv onto the compact form based on LCLid, which is the time series identifier.
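The following is a minimal sketch of that left merge, assuming a compact form with one row per LCLid and the readings stored as a list. Both DataFrames are small in-memory stand-ins: in practice the household table would come from pd.read_csv("informations_households.csv"), and the metadata column names (stdorToU, Acorn, Acorn_grouped) and their values are assumptions used only for illustration.

```python
import pandas as pd

# Hypothetical compact form: one row per household (LCLid), with the
# readings for that household stored as a list in a single column.
compact_df = pd.DataFrame(
    {
        "LCLid": ["MAC000002", "MAC000003", "MAC000004"],
        "energy_consumption": [[0.21, 0.25], [0.10, 0.12], [0.33, 0.30]],
    }
)

# Stand-in for informations_households.csv (assumed columns and values).
# In practice: household_info = pd.read_csv("informations_households.csv")
household_info = pd.DataFrame(
    {
        "LCLid": ["MAC000002", "MAC000003", "MAC000004"],
        "stdorToU": ["Std", "Std", "ToU"],
        "Acorn": ["ACORN-A", "ACORN-P", "ACORN-E"],
        "Acorn_grouped": ["Affluent", "Adversity", "Affluent"],
    }
)

# A left merge keeps every household in the compact form and attaches its
# static (time-invariant) metadata using the time series identifier.
compact_df = compact_df.merge(household_info, on="LCLid", how="left")
print(compact_df[["LCLid", "stdorToU", "Acorn_grouped"]])
```

Because the metadata does not vary over time, a single row per LCLid is enough; there is no need to repeat it for every timestamp in the compact form.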
Best practice
While doing a pandas merge, one of the most common and unexpected outcomes is that the number of rows before and after the operation differs (even when you are doing a left merge). This typically happens because there are duplicates in the keys on which you are merging. As a best practice, you can use the validate parameter of the pandas merge, which accepts values such as one_to_one and many_to_one, so that this check is done during the merge and an error is raised if the assumption is not met. For more...
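To make the failure mode concrete, here is a small sketch with a deliberately duplicated key in the right-hand (metadata) table; the data is made up for illustration. Without validation, the left merge silently gains a row. With validate="many_to_one", pandas checks that the keys are unique in the right table and raises pandas.errors.MergeError instead.

```python
import pandas as pd

left = pd.DataFrame({"LCLid": ["MAC000002", "MAC000003"], "value": [1, 2]})

# Illustrative metadata table that accidentally contains a duplicate key.
right = pd.DataFrame(
    {
        "LCLid": ["MAC000002", "MAC000002", "MAC000003"],
        "Acorn": ["ACORN-A", "ACORN-B", "ACORN-P"],
    }
)

# Without validation, the left merge silently grows from 2 rows to 3.
unchecked = left.merge(right, on="LCLid", how="left")
print(len(left), len(unchecked))  # 2 3

# With validate="many_to_one", the duplicate key in `right` is caught.
try:
    left.merge(right, on="LCLid", how="left", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError as err:
    validation_failed = True
    print("Merge rejected:", err)

# After removing the duplicates, the validated merge keeps the row count.
clean_right = right.drop_duplicates(subset="LCLid", keep="first")
checked = left.merge(clean_right, on="LCLid", how="left", validate="many_to_one")
print(len(checked))  # 2
```

many_to_one only asserts uniqueness on the right-hand side, which is usually the assumption that matters when attaching metadata; if the left table is also expected to have one row per key (as a compact form with one row per LCLid does), one_to_one checks both sides.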