Characteristics of a data lake
When setting up a data lake, it is also important to analyze its characteristics. As we will see in a later section, these characteristics can be measured, and they help us gauge the success or failure of a data lake:
- Size: This is the "volume" in the often-mentioned three Vs of big data (volume, variety, velocity) – how big is the lake?
- Governability: How easy is it to verify and certify the data in your lake?
- Quality: What is the quality of the data contained in the lake? Are some records and files invalid? Are there duplicates? Can you determine the source and lineage of the data in the lake?
- Usage: How many visitors, sources, and downstream systems does the lake have? How easy is it to populate and access the data in the lake?
- Variety: Does the lake hold many types of data? Are there many types of data sources that feed the lake? Can the data in the lake be extracted in different...
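To make the idea of measuring these characteristics concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed method: the function name lake_metrics, the (key, content) input shape, and the specific proxies (total bytes for size, empty objects and exact content duplicates for quality, distinct file extensions for variety) are all hypothetical choices. A real lake would typically gather these numbers from its storage and catalog services rather than from in-memory bytes.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath


def lake_metrics(objects):
    """Compute rough size, quality, and variety indicators.

    `objects` is an iterable of (key, content_bytes) pairs. This input
    shape is an assumption made for illustration only.
    """
    total_bytes = 0
    empty_objects = 0
    duplicate_objects = 0
    seen_digests = set()
    formats = Counter()

    for key, content in objects:
        # Size: accumulate the raw byte count of every object.
        total_bytes += len(content)

        # Quality proxy 1: an empty object is treated as invalid.
        if not content:
            empty_objects += 1

        # Quality proxy 2: identical content seen earlier counts as a duplicate.
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen_digests:
            duplicate_objects += 1
        else:
            seen_digests.add(digest)

        # Variety proxy: group objects by file extension.
        suffix = PurePosixPath(key).suffix.lower() or "(none)"
        formats[suffix] += 1

    return {
        "total_bytes": total_bytes,
        "object_count": sum(formats.values()),
        "empty_objects": empty_objects,
        "duplicate_objects": duplicate_objects,
        "distinct_formats": len(formats),
        "objects_per_format": dict(formats),
    }


if __name__ == "__main__":
    # Hypothetical sample objects; the keys and contents are made up.
    sample = [
        ("raw/orders_1.csv", b"id,name\n1,x\n"),
        ("raw/orders_2.csv", b"id,name\n1,x\n"),
        ("raw/events.json", b"{}"),
        ("raw/empty.parquet", b""),
    ]
    for name, value in lake_metrics(sample).items():
        print(f"{name}: {value}")
```

Tracking indicators like these over time, rather than reading them once, is what makes them useful for judging whether the lake is getting healthier or degrading. Governability and usage are harder to reduce to a single number and usually depend on catalog metadata and access logs, so they are left out of this sketch.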