Key security considerations
As discussed previously, meeting enterprise data security needs for a Big Data ecosystem requires a comprehensive, holistic approach that secures the entire ecosystem. Some of the key security considerations when securing a Hadoop-based Big Data ecosystem are:
Authentication: There is a need to provide a single point of authentication that is aligned and integrated with the existing enterprise identity and access management system.
Authorization: We need to enforce role-based authorization with fine-grained access control for access to sensitive data.
Access control: There is a need to control who can do what on a dataset, and who can use how much of the processing capacity available in the cluster.
Data masking and encryption: We need to apply proper encryption and masking techniques to the data so that sensitive data is accessible only to authorized personnel.
Network perimeter security: We need to deploy perimeter security for the overall Hadoop ecosystem that controls how data moves into and out of the ecosystem from other infrastructure. Design and implement the network topology to properly isolate the Big Data ecosystem from the rest of the enterprise, and provide network-level security by configuring appropriate firewall rules to block unauthorized traffic.
System security: There is a need to provide system-level security by hardening the OS and the applications that are installed as part of the ecosystem. Address all known vulnerabilities in the OS and applications.
Infrastructure security: We need to enforce strict infrastructure and physical access security in the data center.
Audits and event monitoring: A proper audit trail is required for any changes to the data ecosystem, along with audit reports for the various activities (data access and data processing) that occur within the ecosystem.
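To make the authorization, data masking, and audit considerations above more concrete, the following is a minimal, framework-agnostic sketch in Python. It is not the API of Hadoop or of any Hadoop security tool; the role names, the dataset name, the ROLE_PERMISSIONS and SENSITIVE_COLUMNS tables, and the AuditLog, mask, and read_rows helpers are all hypothetical. It shows a role-based permission check per dataset, column-level masking of a sensitive field for roles that are not cleared to see it, and an audit record written for every access attempt, whether allowed or denied.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-based policy: role -> dataset -> permitted actions.
ROLE_PERMISSIONS = {
    "analyst": {"customers": {"read"}},
    "data_engineer": {"customers": {"read", "write"}},
}

# Hypothetical fine-grained rule: dataset -> sensitive column -> roles
# allowed to see the unmasked value. All other roles get a masked value.
SENSITIVE_COLUMNS = {
    "customers": {"ssn": {"data_engineer"}},
}


@dataclass
class AuditLog:
    """In-memory audit trail; a real deployment would use durable storage."""
    entries: list = field(default_factory=list)

    def record(self, user, role, action, dataset, allowed):
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "dataset": dataset,
            "allowed": allowed,
        })


def mask(value):
    """Hide all but the last four characters; fully hide short values."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def read_rows(user, role, dataset, rows, audit):
    """Return rows from dataset if role may read it, masking sensitive columns."""
    allowed = "read" in ROLE_PERMISSIONS.get(role, {}).get(dataset, set())
    # Log every attempt, including denied ones, before acting on the decision.
    audit.record(user, role, "read", dataset, allowed)
    if not allowed:
        raise PermissionError(f"{user} ({role}) may not read {dataset}")

    rules = SENSITIVE_COLUMNS.get(dataset, {})
    result = []
    for row in rows:
        visible = {}
        for column, value in row.items():
            if column in rules and role not in rules[column]:
                visible[column] = mask(str(value))
            else:
                visible[column] = value
        result.append(visible)
    return result


if __name__ == "__main__":
    audit = AuditLog()
    rows = [{"name": "Alice", "ssn": "123-45-6789"}]
    print(read_rows("alice", "analyst", "customers", rows, audit))
    # [{'name': 'Alice', 'ssn': '*******6789'}]
```

Here an analyst can read the customers dataset but sees only the last four characters of the SSN, a data engineer sees it unmasked, and a role with no grant is refused; all three attempts land in the audit trail. In a real cluster these decisions would typically be delegated to the platform's own authorization and audit services integrated with the enterprise identity system, rather than reimplemented in application code as in this sketch.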
Reference architecture for Big Data security
Implementing all the preceding security considerations is vital to building a trusted Big Data ecosystem within the enterprise. The following figure shows a typical Big Data ecosystem and how its various components and stakeholders interact with each other. Implementing security controls for each of these interactions requires elaborate planning and careful execution.
The reference architecture depicted in the following diagram summarizes the key security pillars that need to be considered for securing a Big Data ecosystem. In the following chapters, we will explore how to leverage the Hadoop security model and existing enterprise tools to secure the Big Data ecosystem.
In Chapter 4, Securing the Hadoop Ecosystem, we will look at the implementation details for securing the OS and the applications that are deployed along with Hadoop in the ecosystem. In Chapter 5, Integrating Hadoop with Enterprise Security Systems, we look at corporate network perimeter security requirements and how to secure the cluster, as well as how authorization defined within the enterprise identity management system can be integrated with the Hadoop ecosystem. In Chapter 6, Securing Sensitive Data in Hadoop, we look at implementing encryption to secure sensitive data in Hadoop. In Chapter 7, Security Event and Audit Logging in Hadoop, we look at security incident and event monitoring, along with the security policies required to address audit and reporting requirements.