Understanding data catalog and access management options
In data lake use cases, where the storage layer is a filesystem such as HDFS or an object store such as Amazon S3, the data is not represented as databases or tables by default. A data lake may receive data as structured, semi-structured, or unstructured datasets or files.
If the data is unstructured, such as media files (images, videos), then machine learning or artificial intelligence tools are often integrated to extract data and metadata about the files and save the output to the data lake for further analytics.
If the data is semi-structured, then it often goes through Extract, Transform, and Load (ETL) jobs that flatten it so that data analysts or data scientists can consume it.
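As a rough illustration of the flattening step, the following minimal Python sketch turns a nested JSON record into a single flat row. The `flatten` helper, the sample record, and the underscore-joined column names are hypothetical and chosen only for this example; a real ETL job would typically run on a distributed engine and handle arrays and schema changes as well.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dictionaries into one level of column-like keys.

    Hypothetical helper for illustration only. Non-dict values,
    including lists, are copied through unchanged.
    """
    flat = {}
    for key, value in record.items():
        # Prefix the key with its parent path, for example customer_name.
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Made-up semi-structured input record.
raw = '{"order_id": 101, "customer": {"name": "Ana", "address": {"city": "Lisbon"}}}'
row = flatten(json.loads(raw))
print(row)
```

Each key of the resulting dictionary corresponds to one column, which is the tabular shape that analysts generally expect.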
Structured data that is available as files or objects in a data lake is not accessible to business users or data analysts in a form that they can query using standard SQL...