Modeling a data lake
A data lake is, in essence, nothing more than a limitless hard drive that we use to store files. Everyone knows that if you do not carefully consider a folder structure to use on your personal computer, you will end up with a mess and it will become almost impossible to find your files. It is no different for your data lake. Although we cannot speak of data modeling when designing a data lake, creating a structure is really important for a successful implementation. That is even more true when the data in the data lake is made available to end users, such as data analysts working with Power BI or data scientists working with Python notebooks.
When creating a folder structure, you need to take the following things into account:
- The source of the data
- The functional meaning of the data
- Security: Who may or may not need to have access to the data?
- Who and which applications will use the data?
- Is the data stored permanently or temporarily...