Discovering additional knowledge
The following is some advice that you might find useful.
Do:
- Plan for security from day one, and decide early where you will trade security off against usability.
- Enforce as much discipline as needed, but not more than is really necessary. Your data lake needs to serve your Data Scientists, as well as other communities in your company. Your modern data warehouse needs some agility.
- Structure your zones clearly and stick to the plan. If you need to redesign, build the new layout separately rather than reworking the structure you have already started.
- Implement a Data Catalog (we will talk about this in Chapter 14, Establishing Data Governance) to enable easy data discovery.
- Integrate with DevOps for a controlled and repeatable system.
Don't:
- Don't mix different file formats in one folder. Stick to a single file format per folder, because you will often want to read all the files in a folder in one go.
- Don't forget naming conventions!
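The one-format-per-folder rule is easy to check automatically. The following is a minimal sketch only, assuming the lake (or a sample of it) is reachable as a local or mounted file system path. The function name find_mixed_format_folders is hypothetical, and skipping marker files that start with an underscore or a dot (such as _SUCCESS) is an assumption you may need to adjust to your own naming conventions:

```python
from collections import defaultdict
from pathlib import Path


def find_mixed_format_folders(root):
    """Return {folder: sorted extensions} for folders holding more than one file format."""
    extensions_by_folder = defaultdict(set)
    for path in Path(root).rglob("*"):
        # Assumption: files starting with "_" or "." are job markers or
        # hidden files, not data, so they are ignored.
        if path.is_file() and not path.name.startswith(("_", ".")):
            extensions_by_folder[path.parent].add(path.suffix.lower())
    # Keep only the folders in which more than one extension was found.
    return {
        str(folder): sorted(extensions)
        for folder, extensions in extensions_by_folder.items()
        if len(extensions) > 1
    }
```

A check like this could run as a step in a deployment pipeline and fail the build when the returned dictionary is not empty. Against a cloud storage account, you would replace the pathlib traversal with the listing call of your storage SDK.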