Hadoop and Kerberos
As you saw in the previous sections, Hadoop provides all the components needed to restrict access to various resources and services. There is still one piece of the puzzle missing, though. Since Hadoop doesn't maintain an internal user database, it has to completely trust the user identities provided by the operating system. While Linux-based operating systems authenticate users with passwords or public/private key pairs, once a user is logged in, Hadoop has no way to independently verify that user's identity. In early versions of Hadoop, HDFS and MapReduce clients executed the equivalent of the whoami shell command to get the identity of the user.
This was a very insecure way of doing things, because it allowed a rogue user to simply replace the whoami command with a custom script that would return any username it liked.
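To make the weakness concrete, here is a minimal sketch of the idea, not Hadoop's actual client code. It assumes a POSIX system with /bin/sh available, and the function names client_side_username and spoofed_username are hypothetical, introduced only for illustration. The first function trusts whatever the whoami command prints; the second shows how a user who controls their own PATH can make that command print anything.

```python
import os
import shlex
import stat
import subprocess
import tempfile


def client_side_username(env=None):
    """Hypothetical helper: run `whoami` and trust whatever it prints.

    The command is located through PATH, which the calling user controls.
    """
    result = subprocess.run(
        ["whoami"], capture_output=True, text=True, check=True, env=env
    )
    return result.stdout.strip()


def spoofed_username(fake_name):
    """Hypothetical helper: shadow `whoami` with a script that lies."""
    with tempfile.TemporaryDirectory() as fake_bin:
        script = os.path.join(fake_bin, "whoami")
        # The fake command simply echoes the name the attacker chose.
        with open(script, "w") as f:
            f.write("#!/bin/sh\necho {}\n".format(shlex.quote(fake_name)))
        os.chmod(script, stat.S_IRWXU)

        # Put the directory with the fake script first on PATH.
        env = dict(os.environ)
        env["PATH"] = fake_bin + os.pathsep + env.get("PATH", "")
        return client_side_username(env)
```

Calling spoofed_username("hdfs") returns the string hdfs regardless of who is actually logged in, because the lookup never leaves the user's own environment. Any service that accepts this value as proof of identity is trusting the client to tell the truth about itself.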
In the latest version of Hadoop, the code that retrieves the user identity was changed to use the Java Authentication and Authorization Service (JAAS) API, but the approach is still open...