Summary
In this chapter, we reviewed important concepts around data security and governance, including how a data catalog can be used to help prevent your data lake from becoming a data swamp.
Encrypting data at rest and in transit, and tokenizing PII, are important concepts for a data engineer to understand in order to protect data in the data lake. A service such as AWS Lake Formation is a useful tool for easily managing authorization for datasets.
In the next chapter, we will take a step back and look at the bigger picture of how a data engineer can architect a data pipeline. We will begin to explore how to understand the needs of our data consumers, learn more about our data sources, and decide on the transformations required to turn raw data into data that is useful for analytics.