Mapping of security technologies with the reference architecture
We looked at the various commercial and open source tools that enable securing the Big Data platform. This section provides the mapping of these various technologies and how they fit into the overall reference architecture.
Infrastructure security
Physical security needs to be enforced manually. Unauthorized access to a distributed cluster, however, is prevented by deploying Kerberos in the cluster. Kerberos ensures that services and users confirm their identity with the Key Distribution Center (KDC) before they are granted access to the infrastructure services. Project Rhino aims to extend this further by providing a token-based authentication framework.
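As a rough illustration of the cluster-side setting involved, the following Python sketch checks whether a core-site.xml document switches Hadoop from its default simple authentication to Kerberos. The property names hadoop.security.authentication and hadoop.security.authorization are standard Hadoop settings; the function names and the sample XML are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a core-site.xml; a real file would be read from the
# cluster's configuration directory.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return the name/value pairs of a Hadoop configuration document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def kerberos_enabled(xml_text):
    """True if authentication is 'kerberos' and service authorization is on."""
    props = read_properties(xml_text)
    # Hadoop falls back to "simple" authentication and no service-level
    # authorization when these properties are absent.
    auth = props.get("hadoop.security.authentication", "simple").lower()
    authz = props.get("hadoop.security.authorization", "false").lower()
    return auth == "kerberos" and authz == "true"
```

A check like this only confirms the configuration switch; the KDC, principals, and keytabs still have to be set up separately.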
OS and filesystem security
Filesystem security is enforced by providing a secured virtualization layer on top of the existing OS filesystem using file encryption. Files are encrypted as they are written to disk and decrypted on the fly as they are read. These features are provided by tools such as eCryptfs and zNcrypt. SELinux also provides significant protection by hardening the OS.
Application security
Tools such as Sentry and HUE provide a platform for secured access to Hadoop. They integrate with LDAP to provide seamless enterprise integration.
Network perimeter security
A common technique for ensuring perimeter security in Hadoop is to isolate the Hadoop cluster from the rest of the enterprise. However, users still need to access the cluster. Tools such as Knox and HttpFS provide a proxy layer through which end users can remotely connect to the Hadoop cluster, submit jobs, and access the filesystem.
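To make the proxy layer more concrete, the following Python sketch builds the WebHDFS-style REST URLs that a client would call through HttpFS or through a Knox gateway, instead of contacting the NameNode directly. The host names are placeholders, the helper functions are hypothetical, and the ports (14000 for HttpFS, 8443 for Knox) and the "default" topology name are the usual defaults, which a given deployment may change. The user.name query parameter applies only to pseudo authentication; a Kerberos-secured cluster would typically use SPNEGO instead.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, hdfs_path, op, user, port=14000):
    """WebHDFS-compatible URL served by an HttpFS server (pseudo authentication)."""
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def knox_url(host, hdfs_path, op, topology="default", port=8443):
    """WebHDFS URL routed through a Knox gateway topology."""
    query = urlencode({"op": op})
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1{quote(hdfs_path)}?{query}"


if __name__ == "__main__":
    # Placeholder hosts; no network call is made here.
    print(httpfs_url("httpfs.example.com", "/user/alice", "LISTSTATUS", "alice"))
    print(knox_url("knox.example.com", "/user/alice", "LISTSTATUS"))
```

In both cases only the proxy host needs to be reachable from outside the isolated cluster network.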
Data masking and encryption
To protect data in motion and at rest, encryption and masking techniques are deployed. Tools such as IBM Optim and Dataguise provide large-scale data masking for enterprise data. To protect data at rest in Hadoop, we deploy block-level encryption. Intel's distribution supports the encryption and compression of files. Project Rhino enables block-level encryption similar to that provided by Dataguise and Gazzang.
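The idea behind data masking can be sketched in a few lines of Python. This is a generic illustration of deterministic, format-preserving masking with a keyed hash, not a description of how IBM Optim or Dataguise work internally; the function name, the key, and the choice to keep the last four digits are assumptions made for the example.

```python
import hashlib
import hmac


def mask_digits(value, secret_key, keep_last=4):
    """Replace all but the last `keep_last` digits with keyed pseudo-random digits.

    Non-digit characters (dashes, spaces) are kept, so the format is preserved.
    The same input and key always give the same masked output.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    masked_so_far = 0
    for ch in value:
        if ch.isdigit() and masked_so_far < to_mask:
            # Map one digest byte to a digit; slightly biased, which is
            # acceptable for an illustration but not for production use.
            out.append(str(digest[masked_so_far % len(digest)] % 10))
            masked_so_far += 1
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(mask_digits("4111-1111-1111-1234", b"demo-key"))
```

Because the masking is deterministic for a given key, the same source value masks to the same output in every dataset, which keeps joins across masked tables possible.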
Authentication and authorization
While authentication and authorization have matured significantly, tools such as Zettaset Orchestrator and Project Rhino enable integration with enterprise systems for authentication and authorization.
Audit logging, security policies, and procedures
Common security audit logging of user access to the Hadoop cluster is enabled by tools such as Cloudera Manager. Cloudera Manager can also generate alerts and events based on the configured organizational policies. Similarly, Intel's manager and Zettaset Orchestrator enforce security policies in the cluster as per organizational policies.
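As a minimal sketch of what policy-driven alerting on audit data involves, the following Python code scans HDFS audit log lines and reports denied access attempts. It does not reflect how Cloudera Manager or the other tools implement this. The tab-separated key=value layout is assumed from the common HDFS audit log format and may differ between versions, and the sample lines, users, and paths are invented.

```python
# Invented sample entries in the assumed HDFS audit log layout.
SAMPLE_LINES = [
    "2014-01-15 10:12:01,123 INFO FSNamesystem.audit: allowed=true\t"
    "ugi=alice (auth:KERBEROS)\tip=/10.0.0.5\tcmd=open\t"
    "src=/data/sales.csv\tdst=null\tperm=null",
    "2014-01-15 10:12:07,456 INFO FSNamesystem.audit: allowed=false\t"
    "ugi=bob (auth:KERBEROS)\tip=/10.0.0.7\tcmd=open\t"
    "src=/secure/payroll.csv\tdst=null\tperm=null",
]


def parse_audit_line(line):
    """Split one HDFS audit log line into a dict of its key=value fields."""
    _, _, payload = line.partition("FSNamesystem.audit: ")
    fields = {}
    for part in payload.strip().split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def denied_events(lines):
    """Return (user, command, path) for every entry with allowed=false."""
    events = []
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") == "false":
            # The ugi field looks like "bob (auth:KERBEROS)"; keep the user part.
            user = fields.get("ugi", "").split(" ")[0]
            events.append((user, fields.get("cmd"), fields.get("src")))
    return events


if __name__ == "__main__":
    for user, cmd, path in denied_events(SAMPLE_LINES):
        print(f"ALERT: {user} was denied {cmd} on {path}")
```

A real deployment would feed such events into the alerting and policy engine of the management tool rather than printing them.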
Security Incident and Event Monitoring
Detecting security incidents and monitoring events in a Big Data platform is essential. Tools such as the open source OSSEC and IBM Guardium enable a secured Hadoop cluster to detect security incidents and integrate easily with enterprise SIEM tools.