From the documentation + code, "when kerberos is enabled, all tasks are run as the end user (e..g as user "joe" and not as hadoop user "mapred") using the task-controller (which is setuid root and when it runs, it does a setuid/setgid etc. to Joe and his groups ). For this to work, user "joe" linux account has to be present on all nodes of the cluster."
In a environment with large and dynamic user population; it is not practical to add every end user to every node of the cluster (and drop user when end user is deactivated etc.)
What are other options get this working ? I am assuming that if the users are in a LDAP, can using the PAM for LDAP solve the issue. Any other suggestions?