Building and sizing a data platform
As with every service that we deploy in the cloud, a data platform needs a foundation to build on. Hence, building a landing zone that can hold raw data is the first step. This landing zone should be an environment that serves only one purpose: to capture raw data. It’s recommended to build this landing zone separate from core IT systems. It should be scalable, but at low cost, since it will hold a lot of data. The issue with keeping data is that it can drive up the cloud bill quickly. Data storage comes at a very low price per unit of data, but the catch is that we need a lot of these units, and the volume keeps growing for as long as data is retained.
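To make the sizing concern concrete, the following is a minimal sketch of how the monthly storage bill develops when raw data is never deleted. The price per GB and the daily ingest volume are assumptions chosen for illustration only; they are not taken from any provider's price list, and the function name is hypothetical.

```python
# Hypothetical figures for illustration only; not a real provider's price list.
PRICE_PER_GB_MONTH = 0.02  # assumed price in USD per GB per month
DAILY_INGEST_GB = 500      # assumed volume of raw data landing per day


def monthly_storage_cost(months_retained: int,
                         daily_ingest_gb: float = DAILY_INGEST_GB,
                         price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Return the bill for one month of storing everything ingested so far.

    Assumes 30-day months, no deletion, no compression, and a single storage tier.
    """
    stored_gb = daily_ingest_gb * 30 * months_retained
    return stored_gb * price_per_gb_month


if __name__ == "__main__":
    for months in (1, 12, 36):
        cost = monthly_storage_cost(months)
        print(f"after {months:>2} months: ${cost:,.2f} per month")
```

Under these assumptions the unit price never changes, yet the monthly bill after three years is 36 times the bill after the first month. Retention rules and cheaper storage tiers are the usual levers for flattening that curve.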
It is important to implement governance from the start. This includes defining and implementing guardrails for the classification and tagging of data.
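A guardrail of this kind can be as simple as a check that runs before a dataset is accepted into the landing zone. The sketch below is one possible shape for such a check. The required tag keys, the classification levels, and the names `Dataset` and `guardrail_violations` are all assumptions made for the example; a real platform would normally enforce this through the cloud provider's own policy tooling.

```python
from dataclasses import dataclass, field

# Hypothetical policy: these tag keys and classification levels are assumptions.
REQUIRED_TAGS = {"owner", "source_system", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


@dataclass
class Dataset:
    """A raw dataset arriving in the landing zone, with its metadata tags."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def guardrail_violations(dataset: Dataset) -> list[str]:
    """Return human-readable policy violations; an empty list means compliant."""
    missing = sorted(REQUIRED_TAGS - set(dataset.tags))
    problems = [f"missing tag: {key}" for key in missing]

    # Only validate the classification value if the tag is present at all.
    level = dataset.tags.get("classification")
    if level is not None and level not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification: {level}")
    return problems


if __name__ == "__main__":
    incoming = Dataset("crm_export", {"owner": "sales-ops"})
    for problem in guardrail_violations(incoming):
        print(f"{incoming.name}: {problem}")
```

Rejecting or quarantining untagged data at ingestion is far cheaper than classifying it retroactively once the lake has grown.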
Once the landing zone has been established, data analysts can start using the data lake as a sandbox environment. This is the second stage. Analysts can start building prototypes of data models and...