Components of a data lake
A data lake is one of those terms that means different things to different people. We mentioned earlier that a data lake comprises the following:
- Unstructured and structured data
- Raw data and transformed data
- Different data types and data sources
Therefore, there is no single correct way to create a data lake. Building a clean, secure data lake can take months and involves several steps:
- Set up storage – Data lakes hold a massive amount of data, so before doing anything else, you must set up storage to hold it. In AWS, you can configure S3 buckets and organize the data in them into partitions. If you are doing this on-premises, you must acquire hardware and set up large disk arrays to hold the data.
- Move data – You must connect to different data sources on-premises, in the cloud, and on IoT devices. Then you need to collect and organize the relevant data sets from those sources, crawl the data to extract the schemas, and add metadata...
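To make the storage and data-movement steps more concrete, here is a minimal sketch in Python. It assumes a date-partitioned key layout of the kind often used in S3-based data lakes and a very simple notion of "crawling" records to extract a schema. The `raw/` prefix, the function names `partition_key` and `infer_schema`, and the sample records are all hypothetical, chosen only for illustration. The sketch does not upload anything; in practice the generated key would be passed to whichever storage client you use.

```python
from datetime import date


def partition_key(source: str, dataset: str, day: date, filename: str) -> str:
    """Build a date-partitioned object key (hypothetical layout).

    Keys of the form year=YYYY/month=MM/day=DD let query engines
    skip partitions that a query does not need.
    """
    return (
        f"raw/{source}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


def infer_schema(records: list[dict]) -> dict[str, list[str]]:
    """Scan records and map each field to the sorted type names observed."""
    seen: dict[str, set[str]] = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


if __name__ == "__main__":
    # Hypothetical IoT readings collected from one source.
    readings = [
        {"id": 1, "temp": 21.5},
        {"id": 2, "temp": None},
    ]
    print(partition_key("iot", "sensor_readings", date(2024, 3, 5), "part-0001.json"))
    print(infer_schema(readings))
```

Recording the set of types seen per field, rather than a single type, makes it easy to spot fields that are sometimes missing or inconsistent across sources, which is useful metadata to attach to a data set once it lands in the lake.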