Data governance
Data democratization and self-service capabilities are among the main advantages of data lakes. A data governance layer is essential for putting the right guardrails in place while still allowing stakeholders to get the most business value from the data and insights that the lake generates and curates. A good data catalog is central to producing actionable insights in any data-driven organization. Cloud vendors have their own offerings, such as the AWS Glue Data Catalog, Azure Purview (since renamed Microsoft Purview), and Azure Data Catalog. Apache Atlas is probably the most popular open source offering, and vendors such as Alation and Collibra specialize in this area.
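To make the catalog idea concrete, here is a minimal sketch. It assumes a catalog only needs to record per-dataset metadata (location, owner, description, tags) and support keyword search. `DatasetEntry`, `DataCatalog`, and the example datasets and `s3://example-lake/...` paths are hypothetical. This is not the API of AWS Glue, Purview, Atlas, or any commercial product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record: the metadata kept about one dataset."""
    name: str
    location: str      # illustrative path in the lake, not a real bucket
    owner: str         # team accountable for the dataset
    description: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Toy in-memory catalog (hypothetical; not any vendor's API)."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Re-registering an existing name overwrites its metadata.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        # Case-insensitive match: substring of name or description,
        # or exact match on one of the tags.
        term = term.lower()
        hits = [
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in {t.lower() for t in e.tags}
        ]
        return sorted(hits)


catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="sales_daily",
    location="s3://example-lake/curated/sales_daily/",
    owner="sales-analytics",
    description="Daily aggregated sales per store",
    tags={"sales", "curated"},
))
catalog.register(DatasetEntry(
    name="web_clicks_raw",
    location="s3://example-lake/raw/web_clicks/",
    owner="web-platform",
    description="Raw clickstream events",
    tags={"raw", "clickstream"},
))

print(catalog.search("sales"))  # ['sales_daily']
print(catalog.search("raw"))    # ['web_clicks_raw']
```

Production catalogs typically also track schemas, lineage, and access policies. The core idea is the same, though: consumers find data by searching its metadata instead of browsing storage paths.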
The three primary goals of governance are the following:
- Keeping data secure, so that access to it is governed by the right privileges and roles
- Ensuring that stored data is of high quality, so that it is meaningful to its consumers, who can then trust both the data and the insights built on top of it
- Discovering data so that...