Characteristics of a data lake
Another essential consideration when setting up a data lake is its characteristics. As we will see in a later section, these characteristics can be measured, and they help us gauge the success or failure of a data lake:
- Size: This is the "volume" in the often-mentioned three Vs of big data (volume, variety, velocity). How big is the lake?
- Governability: How easy is it to verify and certify the data in your lake?
- Quality: What is the quality of the data contained in the lake? Are some records and files invalid? Are there duplicates? Can you determine the source and lineage of the data in the lake?
- Usage: How many visitors, sources, and downstream systems does the lake have? How easy is it to populate and access the data in the lake?
- Variety: Does the data that the lake holds have many types? Are there many types of data sources that feed the lake? Can the data in the lake be extracted in different ways and formats, such as files...
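
To make the idea of measuring these characteristics more concrete, here is a minimal sketch of how a few of the quality questions above (invalid records, duplicates, and whether a source is recorded) might be turned into numbers. The sample records, the field names, and the `quality_metrics` helper are all hypothetical and exist only for illustration; a real lake would compute such figures with its own catalog and processing tools.

```python
from collections import Counter

# Hypothetical sample of records pulled from a lake; the field names
# ("id", "source", "email") are assumptions made for this illustration.
records = [
    {"id": 1, "source": "crm", "email": "a@example.com"},
    {"id": 2, "source": "crm", "email": None},             # missing email
    {"id": 1, "source": "crm", "email": "a@example.com"},  # exact duplicate
    {"id": 3, "source": None, "email": "c@example.com"},   # unknown source
]


def quality_metrics(rows, required=("id", "source", "email")):
    """Return simple quality ratios for a list of dict records."""
    total = len(rows)
    if total == 0:
        return {"invalid_ratio": 0.0, "duplicate_ratio": 0.0, "source_known_ratio": 0.0}

    # A record counts as invalid if any required field is missing or None.
    invalid = sum(1 for r in rows if any(r.get(f) is None for f in required))

    # Records are duplicates if all required fields match an earlier record.
    counts = Counter(tuple(r.get(f) for f in required) for r in rows)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)

    # Share of records whose source is recorded (a rough proxy for lineage).
    source_known = sum(1 for r in rows if r.get("source") is not None)

    return {
        "invalid_ratio": invalid / total,
        "duplicate_ratio": duplicates / total,
        "source_known_ratio": source_known / total,
    }


metrics = quality_metrics(records)
print(metrics)
```

Tracking ratios like these over time, rather than as one-off counts, is one way to tell whether the quality of the lake is improving or degrading as it grows.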