Securing data content
In the context of a data lake, security is a “job zero” priority. We will dive deep into security in Chapter 8, Data Security. In this section, we cover basic ETL operations for securing data. The following common techniques can be used to hide confidential values in data:
- Masking values
- Hashing values
In the following subsections, you will learn how to mask and hash values in your data.
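The two techniques can be illustrated with a minimal, library-agnostic sketch in plain Python. It assumes each record is a dictionary; in an actual ETL job you would apply the same per-value logic to DataFrame columns. The helper names `mask_value`, `hash_value`, and `secure_record`, the column names, and the secret key are all hypothetical. For hashing, the sketch swaps in a keyed hash (HMAC-SHA-256) instead of a plain hash, because short values such as phone numbers can be recovered from an unkeyed hash by brute force. This is a choice made for illustration, not necessarily the approach used elsewhere in this book.

```python
import hashlib
import hmac


def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` characters with `mask_char`."""
    if visible <= 0:
        # Mask everything (value[-0:] would otherwise return the whole string).
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


def hash_value(value: str, secret_key: bytes) -> str:
    """Return a hex-encoded keyed SHA-256 hash (HMAC) of the value.

    The same input and key always produce the same token, so hashed
    columns can still be joined or counted without exposing the original.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def secure_record(record: dict, mask_cols: set, hash_cols: set,
                  drop_cols: set, secret_key: bytes) -> dict:
    """Drop, mask, or hash the configured columns; pass others through."""
    out = {}
    for col, val in record.items():
        if col in drop_cols:
            continue  # not needed for analytics, so never stored
        if col in mask_cols:
            out[col] = mask_value(str(val))
        elif col in hash_cols:
            out[col] = hash_value(str(val), secret_key)
        else:
            out[col] = val
    return out


if __name__ == "__main__":
    # Hypothetical key; in practice, load it from a secrets store.
    key = b"example-secret-key"
    row = {"name": "Jane Doe", "phone": "555-0100",
           "card": "4111111111111111", "amount": 42}
    print(secure_record(row, mask_cols={"card"}, hash_cols={"phone"},
                        drop_cols={"name"}, secret_key=key))
```

Masking keeps part of a value readable (for example, the last four card digits for customer support), whereas hashing replaces the value with a consistent token that still supports joins and aggregations.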
Masking values
In business data lakes, the data can contain sensitive information such as people’s names, phone numbers, and credit card numbers. Data security is an important aspect of data lakes, and there are different approaches to handling such data securely. If you won’t use the sensitive data in analytics, it is a good idea to simply drop it when you collect the data from the data sources. It is also common to manage access permissions on specific columns or records of the data. Another approach is to mask the...