Summary
In this chapter, we have explored data security and some of the issues that surround it. We have discovered that technical knowledge alone is not enough: a data security mindset is just as important. Data security is often overlooked, so taking a systematic approach and educating others are key responsibilities for anyone mastering data science.
We have explained the data security life cycle and outlined the most important areas of responsibility, including authorization, authentication and access, along with related examples and use cases. We have also explored the Hadoop security ecosystem and described the important open source solutions currently available.
A significant part of this chapter was dedicated to building a Hadoop InputFormat compressor that doubles as a data encryption utility for use with Spark. Appropriate configuration allows the codec to be used in a variety of key areas, crucially when spilling shuffled records...