Data transformations in data lake tiers
As we saw in Chapter 3, we dump raw data into the bronze layer, which serves as the primary source of raw, unrefined data for the warehouse. This layer holds all the original data as it is received from various sources, including transactional systems, log files, and external data feeds. Its purpose is to provide a centralized location where raw data is stored and made available for further processing in the higher layers of the warehouse. From there, the data is transformed as it moves into the silver and gold layers.
Bronze-to-silver transformations
When data moves from the bronze layer to the silver layer, a series of transformations is applied to make it more usable for analysis. Transformations typically performed in the silver layer include the following:
- Data cleansing: Removing any duplicates and correcting errors and inconsistencies in the data.
- Data integration: Combining data from multiple...
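The data cleansing step described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production pipeline: the record layout (`order_id`, `email`, `order_date`), the two accepted date formats, and the helper names `parse_date` and `to_silver` are all assumptions made for the example, and a real lake would more likely run this logic in a distributed engine.

```python
from datetime import datetime

# Hypothetical bronze-layer records: raw, with a duplicate and inconsistent formatting.
bronze_rows = [
    {"order_id": "1001", "email": " Alice@Example.com ", "order_date": "2023-01-05"},
    {"order_id": "1001", "email": "alice@example.com", "order_date": "2023-01-05"},
    {"order_id": "1002", "email": "BOB@example.com", "order_date": "05/01/2023"},
    {"order_id": "1003", "email": "carol@example.com", "order_date": "not a date"},
]

# Assumed input formats; anything else is treated as an error and set to None.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_silver(rows):
    """Normalize each field, then keep the first record seen for each order_id."""
    seen = set()
    silver = []
    for row in rows:
        cleaned = {
            "order_id": int(row["order_id"]),
            "email": row["email"].strip().lower(),
            "order_date": parse_date(row["order_date"]),
        }
        if cleaned["order_id"] in seen:
            continue  # duplicate of an earlier record
        seen.add(cleaned["order_id"])
        silver.append(cleaned)
    return silver


silver_rows = to_silver(bronze_rows)
```

Normalizing before deduplicating matters here: the two `1001` records differ only in whitespace and letter case, so they would not look identical if compared in their raw form. The bronze records themselves are left untouched, which keeps the raw layer available for reprocessing if the cleansing rules change.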