Different Hadoop data encryption options
Let us have a look at the various options available.
Dataguise for Hadoop
Dataguise (DG) for Hadoop provides symmetric-key-based encryption of data. A key feature of Dataguise is its ability to identify sensitive data and encrypt it. It supports both encryption and masking to protect sensitive data. It can encrypt data through the Hadoop API, Sqoop, and Flume, so it can be used to encrypt data moving into and out of the Hadoop ecosystem. Administrators can schedule regular scans of the data within the Hadoop ecosystem to detect sensitive data and encrypt or mask it. More details on Dataguise are available at http://dataguise.com/products/dghadoop.html.
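To make the scan-and-mask idea concrete, here is a minimal Python sketch that scans a record for sensitive values and masks them. It is not Dataguise's API or detection logic: the rule names, the regular expressions in `SENSITIVE_PATTERNS`, and the keep-the-last-four-characters masking policy are hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical detection rules (illustrative assumptions, not Dataguise's detectors).
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask_value(match: re.Match) -> str:
    """Replace all but the last four characters of a matched value with '*'."""
    text = match.group(0)
    return "*" * (len(text) - 4) + text[-4:]


def scan_and_mask(record: str) -> tuple[str, list[str]]:
    """Return the masked record and the names of the rules that matched."""
    found = []
    for name, pattern in SENSITIVE_PATTERNS.items():
        # subn returns the rewritten string and the number of replacements made.
        record, count = pattern.subn(mask_value, record)
        if count:
            found.append(name)
    return record, found
```

For example, `scan_and_mask("id=1,ssn=123-45-6789,email=alice@example.com")` masks both values, keeping only their last four characters, and reports `["ssn", "email"]`. Masking as sketched here is irreversible; encryption is the alternative when authorized users need to recover the original value.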
Gazzang zNcrypt
Gazzang zNcrypt provides transparent block-level encryption along with management of the keys used for encryption. zNcrypt acts as a virtual filesystem that intercepts application-layer requests to access files, and it encrypts each block as it is written to disk. zNcrypt leverages Intel AES-NI hardware encryption acceleration for maximum cryptographic performance. It also provides role-based access control and policy-based management of the encryption keys. Together, these features can be used to implement multiple levels of classification security in a secured Hadoop cluster.
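The role-based, policy-driven key management described above can be illustrated with a small Python sketch in which each classification level has its own data-encryption key, and a role may only obtain keys up to its clearance. This is not zNcrypt's interface: the level names, the `ROLE_CLEARANCE` policy, and the `KeyManager` class are hypothetical, and a real system would keep keys in a hardened key store rather than in process memory.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "secret"]

# Hypothetical policy: the highest level each role is cleared to read.
ROLE_CLEARANCE = {
    "analyst": "internal",
    "data_steward": "confidential",
    "security_admin": "secret",
}


@dataclass
class KeyManager:
    """Toy key store holding one random 256-bit key per classification level."""

    keys: dict = field(
        default_factory=lambda: {lvl: secrets.token_bytes(32) for lvl in LEVELS}
    )

    def get_key(self, role: str, level: str) -> bytes:
        """Release the key for `level` only if the role's clearance is high enough."""
        if level not in LEVELS:
            raise ValueError(f"unknown classification level: {level}")
        clearance = ROLE_CLEARANCE.get(role)
        # Unknown roles, or roles cleared below the requested level, are denied.
        if clearance is None or LEVELS.index(clearance) < LEVELS.index(level):
            raise PermissionError(f"role {role!r} may not access {level!r} data")
        return self.keys[level]
```

Because data at each level is encrypted with a different key, a role that cannot obtain the `secret` key cannot read `secret` data even if it can read the underlying files.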
eCryptfs for Hadoop
eCryptfs is a cryptographic stacked Linux filesystem. eCryptfs stores cryptographic metadata in the header of each file it writes, so encrypted files can be copied between hosts and decrypted on any host that has the proper key in its Linux kernel keyring. We can set up a secured Hadoop cluster with eCryptfs on each node. This allows data to be shared transparently between nodes while ensuring that all data is encrypted before it is written to disk.
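The idea of keeping per-file cryptographic metadata in a header, so that any host holding the right key can decrypt a copied file, can be sketched in Python as follows. This is a toy under explicit assumptions: the header layout and the `MAGIC` marker are hypothetical rather than the real eCryptfs on-disk format, a plain dictionary stands in for the kernel keyring, and the SHA-256-based keystream is for illustration only and must not be used as real encryption (eCryptfs itself relies on kernel ciphers such as AES).

```python
import hashlib
import secrets

MAGIC = b"TOY1"  # hypothetical file marker, not the real eCryptfs header format
HEADER_LEN = len(MAGIC) + 8 + 16 + 32  # marker + key signature + nonce + wrapped key


def key_signature(key: bytes) -> bytes:
    """Short identifier used to look a key up in the keyring."""
    return hashlib.sha256(key).digest()[:8]


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || nonce || counter). NOT real cryptography."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_file(plaintext: bytes, master_key: bytes) -> bytes:
    """Encrypt with a fresh per-file key; the header carries what is needed to decrypt."""
    file_key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    # The per-file key is stored only in wrapped (encrypted) form under the master key.
    wrapped_key = xor(file_key, keystream(master_key, nonce, 32))
    header = MAGIC + key_signature(master_key) + nonce + wrapped_key
    return header + xor(plaintext, keystream(file_key, nonce, len(plaintext)))


def decrypt_file(blob: bytes, keyring: dict[bytes, bytes]) -> bytes:
    """Decrypt on any host whose keyring holds the master key named in the header."""
    if blob[:4] != MAGIC:
        raise ValueError("not a toy-encrypted file")
    sig, nonce, wrapped_key = blob[4:12], blob[12:28], blob[28:HEADER_LEN]
    master_key = keyring.get(sig)
    if master_key is None:
        raise KeyError("required key is not in this host's keyring")
    file_key = xor(wrapped_key, keystream(master_key, nonce, 32))
    body = blob[HEADER_LEN:]
    return xor(body, keystream(file_key, nonce, len(body)))
```

The design loosely mirrors the approach of wrapping a random per-file key under a mount-level key and naming that key by signature in the header: the encrypted file is self-describing, so copying it to another node works as long as that node's keyring holds the matching key.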
More information on eCryptfs is available at the following link: https://launchpad.net/ecryptfs.