Chapter 4: Data Cataloging, Security, and Governance
There is probably no more important topic to cover in a book about data than data security and governance (along with the related topic of data cataloging). The most efficient data pipelines, the fastest data transformations, and the best data consumption tools are not worth much if the data is not kept secure. Data must also be stored and handled in compliance with local laws, and it needs to be cataloged so that it is discoverable and useful to the organization.
Sadly, it is not uncommon to read about data breaches and poor data handling by organizations. The consequences can include reputational damage to the organization, as well as potentially massive penalties imposed by government regulators.
In this chapter, we will take a deeper dive into best practices for handling data responsibly. We will cover the following topics:
- Getting data security...