Storing data in the data lake layer
Once data is ingested into the data lake layer, it needs to be stored and managed correctly. A resilient storage strategy reduces unnecessary duplication of data. It also ensures that stakeholders are given need-based access and that proper security controls protect the data. So, let's first investigate the various datastores of a data lake.
Data lake layer
Data in the data lake layer is segregated into multiple datastores. Each datastore has its own purpose and guidelines for use. As depicted in the following figure, there are four types of datastores in the data lake layer:
The data in the data lake is stored in a hierarchical file structure. In a hierarchical file structure, folders behave much like those in a traditional operating system's filesystem when it comes to moving and renaming files. In addition...
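To make the folder behavior concrete, here is a minimal sketch that uses Python's standard library on a local temporary directory as a stand-in for a hierarchical store. The folder names (raw, sales, sales_archive) and the helper functions are hypothetical and chosen only for illustration; a real data lake would be accessed through its own storage client rather than the local filesystem.

```python
import tempfile
from pathlib import Path


def build_demo_lake(root: Path) -> Path:
    """Create a tiny, hypothetical folder layout and return the folder to rename."""
    folder = root / "raw" / "sales" / "2024"
    folder.mkdir(parents=True)
    (folder / "orders.csv").write_text("order_id,amount\n1,9.99\n")
    (folder / "refunds.csv").write_text("order_id,amount\n1,-9.99\n")
    return root / "raw" / "sales"


def rename_folder(folder: Path, new_name: str) -> Path:
    """Rename a folder in place; everything beneath it moves with it."""
    return folder.rename(folder.with_name(new_name))


def list_files(root: Path) -> list[str]:
    """Return every file under root as a sorted list of relative paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sales = build_demo_lake(root)
    before = list_files(root)
    renamed = rename_folder(sales, "sales_archive")
    after = list_files(root)

print(before)
print(after)
```

A single rename call on the folder is enough here: the files under it are not touched individually, yet every path beneath the folder reflects the new name afterwards. This is the filesystem-like behavior described above.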