Revisiting the Medallion architecture pattern
We introduced the Medallion architecture in Chapter 2. As a reminder, it is a data design pattern for organizing data logically into three layers: Bronze, Silver, and Gold. The Bronze layer contains raw data, the Silver layer contains cleaned and transformed data, and the Gold layer contains aggregated and curated data. Curated data refers to datasets that have been selected, cleaned, and organized for a specific business or modeling purpose. Where additional levels of refinement are required, the architecture can be extended with further layers, such as Diamond and Platinum. This architecture is a good fit for data science projects: maintaining the original data as a source of truth is important, while curated data is valuable for research, analytics, and machine learning applications. By selecting, cleaning, and organizing data for a specific purpose, curated data can help improve its accuracy...
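The following minimal, in-memory Python sketch makes the responsibilities of each layer concrete. It assumes a hypothetical set of sales records. The field names (`order_id`, `region`, `amount`) and the helper functions `to_silver` and `to_gold` are illustrative only, and plain Python lists stand in for the tables that each layer would normally be stored as.

```python
from collections import defaultdict

# Hypothetical raw sales records as they might land in the Bronze layer:
# kept exactly as received, including a duplicate row, a missing value,
# and inconsistent text formatting.
bronze = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80"},
    {"order_id": "2", "region": "South", "amount": "80"},  # duplicate row
    {"order_id": "3", "region": "north", "amount": None},  # missing amount
    {"order_id": "4", "region": "SOUTH", "amount": "40.25"},
]


def to_silver(records):
    """Clean and transform raw records: skip incomplete rows and duplicates,
    normalize text, and cast types. The input list is not modified."""
    seen = set()
    silver = []
    for rec in records:
        # Skip rows with no amount, and any order_id already kept.
        if rec["amount"] is None or rec["order_id"] in seen:
            continue
        seen.add(rec["order_id"])
        silver.append({
            "order_id": int(rec["order_id"]),
            "region": rec["region"].strip().lower(),
            "amount": float(rec["amount"]),
        })
    return silver


def to_gold(records):
    """Aggregate cleaned records into a curated, business-ready view:
    total sales per region."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'north': 120.5, 'south': 120.25}
```

The sketch never modifies the Bronze list. Each layer is derived from the one before it, so the raw data remains available as the source of truth if the cleaning rules later change.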