Components of a data lake
The term data lake means different things to different people. As mentioned earlier, a data lake comprises the following:
- Unstructured and structured data
- Raw data and transformed data
- Different data types and data sources
For that reason, there is no single correct way to build a data lake. In general, though, many data lakes are implemented using the following logical components.
Landing or transient data zone
This zone is a buffer that temporarily hosts data while you prepare to move it permanently into the raw data zone, described next.
It holds temporary data, such as a streaming spool, an interim copy, and other non-permanent data, before that data is ingested.
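The buffer-then-promote flow of a landing zone can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: each zone is modeled as a local directory (real lakes usually use object-store prefixes), and the helpers `make_zones`, `land`, `passes_quality_check`, and `promote_to_raw` are hypothetical names. The quality check is a placeholder that only rejects empty files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Optional


def make_zones(root: Path) -> Dict[str, Path]:
    """Create one directory per zone (assumed layout: root/landing, root/raw)."""
    zones = {name: root / name for name in ("landing", "raw")}
    for path in zones.values():
        path.mkdir(parents=True, exist_ok=True)
    return zones


def land(zones: Dict[str, Path], name: str, payload: bytes) -> Path:
    """Write an incoming file into the transient landing zone."""
    target = zones["landing"] / name
    target.write_bytes(payload)
    return target


def passes_quality_check(path: Path) -> bool:
    """Placeholder check (assumption): reject empty files only."""
    return path.stat().st_size > 0


def promote_to_raw(zones: Dict[str, Path], path: Path) -> Optional[Path]:
    """Move a landed file, unchanged, into the raw zone if it passes the check.

    Rejected files are left in the landing zone for inspection.
    """
    if not passes_quality_check(path):
        return None
    destination = zones["raw"] / path.name
    shutil.move(str(path), str(destination))
    return destination


with tempfile.TemporaryDirectory() as tmp:
    zones = make_zones(Path(tmp))
    accepted = promote_to_raw(zones, land(zones, "orders.csv", b"id,amount\n1,9.99\n"))
    rejected = promote_to_raw(zones, land(zones, "empty.csv", b""))
    raw_files = sorted(p.name for p in zones["raw"].iterdir())
    landing_files = sorted(p.name for p in zones["landing"].iterdir())
    print(raw_files)      # ['orders.csv']
    print(landing_files)  # ['empty.csv']
```

Moving rather than copying keeps the landing zone transient: once a file is promoted, only rejected or in-flight data remains in the buffer.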
Raw data zone
After quality checks and security transformations have been performed in the transient data zone, the data can be loaded into the raw data zone for permanent storage:
- In the raw zone, files are transferred in their native format...