Hadoop security
When it was first created, Hadoop was not designed to work as the repository of an enterprise's entire store of data, as the data lake concept proposes. It was assumed that Hadoop would be operated in a trusted environment by trusted users. Moreover, the early versions of Hadoop were used to store data from public web logs, whose confidentiality was not a concern. As Hadoop started being positioned as an enterprise platform, its security shortcomings came to the forefront. To address these concerns, open source and proprietary solutions appeared on the market. However, these solutions each focused on a single security aspect, such as data encryption or perimeter security, and did not offer fine-grained authorization over the data stored in Hadoop. A detailed discussion of Hadoop security is available at http://www.infoq.com/articles/HadoopSecurityModel.
HDFS permissions model
In our data lake use case, HDFS is the storage system for raw data. HDFS organizes...