Loading a DataFrame with sensitive information
Earlier in this chapter, we learned about techniques such as data masking and row- and column-level security for Azure Synapse SQL. At the time of writing, Spark does not offer equivalent built-in features for handling sensitive information. In this section, we will look at an example of how to emulate them for sensitive data such as Personally Identifiable Information (PII) by using encryption and decryption:
- Let's create a simple table that contains PII, such as social security numbers (SSNs), using PySpark:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
cols = StructType([
    StructField("Name", StringType(), True),
    StructField("SSN", StringType(), True),
    StructField("email", StringType(), True)
])
data = [("Adam Smith","111-11-1111","james...
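The encrypt-and-decrypt idea that this section builds toward can be sketched independently of Spark. The following is a minimal illustration, not necessarily the approach used in the rest of this example: it assumes the third-party `cryptography` package and its Fernet symmetric scheme, and the sample rows and the helper names `encrypt_value` and `decrypt_value` are hypothetical.

```python
from cryptography.fernet import Fernet

# Hypothetical sample rows (name, SSN, email); not the chapter's data.
rows = [
    ("Jane Doe", "000-00-0000", "jane@example.com"),
    ("John Roe", "000-00-0001", "john@example.com"),
]

# Assumption: the key is generated inline for the demo. In practice it would
# be read from a secret store rather than kept in the notebook.
key = Fernet.generate_key()


def encrypt_value(plaintext: str, key: bytes) -> str:
    """Encrypt one string and return the Fernet token as text."""
    return Fernet(key).encrypt(plaintext.encode("utf-8")).decode("utf-8")


def decrypt_value(token: str, key: bytes) -> str:
    """Reverse encrypt_value; raises InvalidToken if the key is wrong."""
    return Fernet(key).decrypt(token.encode("utf-8")).decode("utf-8")


# Encrypt only the sensitive column (SSN) and leave the others readable.
encrypted_rows = [
    (name, encrypt_value(ssn, key), email) for name, ssn, email in rows
]

# Only a holder of the key can recover the original values.
decrypted_rows = [
    (name, decrypt_value(token, key), email)
    for name, token, email in encrypted_rows
]
```

In a Spark notebook, helpers like these could be wrapped with `pyspark.sql.functions.udf` and applied to the SSN column with `withColumn`, so that the stored DataFrame holds only the encrypted tokens.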