Discussing data preparation
Generally, the first step in any data science project is to explore and prepare the data. We refer to this process as moving data from the “Bronze” layer to the “Silver” layer, using the terminology of the Medallion architecture. You might think of this type of data transformation as exclusively a data engineering task, but it’s also essential for data science and machine learning.
If you aren’t familiar with this terminology, the Medallion architecture is a data design pattern used to organize data logically in a lakehouse. It is also commonly called a “multi-hop” architecture. It aims to incrementally and progressively improve the structure and quality of data as it flows through each layer. The Medallion architecture has three layers, Bronze, Silver, and Gold, described as follows:
- The Bronze layer is the raw data layer. It contains all the raw, unprocessed data ingested...