Ways to manage user accounts on hadoop cluster when using kerberos security

+3 votes
688 views

From the documentation + code, "when kerberos is enabled, all tasks are run as the end user (e.g. as user "joe" and not as the hadoop user "mapred") using the task-controller (which is setuid root and, when it runs, does a setuid/setgid etc. to joe and his groups). For this to work, a Linux account for user "joe" has to be present on all nodes of the cluster."

In an environment with a large and dynamic user population, it is not practical to add every end user to every node of the cluster (and to remove the account again when the end user is deactivated, etc.).

What other options are there to get this working? I am assuming that if the users are in LDAP, using PAM for LDAP could solve the issue. Any other suggestions?

posted Jan 7, 2014 by Ahmed Patel


1 Answer

+1 vote

LDAP/AD is pretty much it. You can also have Kerberos authenticate directly to AD, or set up a one-way trust between AD and MIT Kerberos. There are other identity management systems that basically implement the same thing. At the end of the day, you need to have (1) users in the KDC, (2) users on the nodes, and (3) a user-group mapping. And it makes sense for all three to come from the same system.
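One way to check points (2) and (3) on a given node is to ask the operating system whether it can resolve the account and its groups. The sketch below is only an illustration, not part of Hadoop: the function name `check_account` is made up here, and it assumes the node resolves users through NSS (so accounts served from LDAP/AD via something like SSSD would show up the same way as local ones).

```python
import grp
import os
import pwd


def check_account(username):
    """Report whether this node can resolve `username` and its groups.

    Hypothetical helper for illustration. Lookups go through the system's
    name service, so LDAP/AD-backed accounts resolve like local ones.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # The node does not know this user at all.
        return {"user": username, "resolves": False, "uid": None, "groups": []}

    groups = []
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            groups.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without a name entry; keep the number so it is visible.
            groups.append(str(gid))

    return {"user": username, "resolves": True, "uid": entry.pw_uid, "groups": groups}


if __name__ == "__main__":
    # "joe" is the example user from the question.
    print(check_account("joe"))
```

Running this on every node (for example through whatever remote-execution tool the cluster already uses) would show quickly whether a user is missing somewhere or has inconsistent groups.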

answer Jan 8, 2014 by Tarun Singhal
Similar Questions
+1 vote

On a Kerberos-based Hadoop cluster, a kinit is done and then an oozie command is executed. This works every time (thus no setup issues), except once it failed with the following error.

Error: AUTHENTICATION : Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Generic error (description in e-text) (60) - PROCESS_TGS).

Any thoughts on what could cause the transient failure? Would any updates on the node (e.g. Java etc.) cause such an issue? The cluster is working fine with Kerberos.

0 votes

I've been trying to secure block data transferred by HDFS. I added the properties below to hdfs-site.xml and core-site.xml on the data node and the name node, and restarted both.

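 <property>
   <name>dfs.encrypt.data.transfer</name>
   <value>true</value>
 </property>

 <property>
   <name>hadoop.rpc.protection</name>
   <value>privacy</value>
 </property>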

When I try to put a file from the hdfs command line shell, the operation fails with "connection is reset" and I see the following in the datanode log:

"org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected encryption handshake from client at /172.31.36.56:48271. Perhaps the client is running an older version of Hadoop which does not support encryption"

I am able to reproduce this on two different deployments. I was following https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Authentication, but didn't turn on Kerberos authentication. No authentication works in my environment. Can this be the reason the handshake fails?

+1 vote

Recently I set up Kerberos security for a Hadoop cluster and added a few data nodes to it. While running the hdfs balancer, I found that the Kerberos ticket expired and the balancer stopped.

The Kerberos ticket has a 1-day lifetime with a 7-day maximum renewable lifetime. Are there any options to automatically renew the ticket while running the balancer?

Or should I restart it every day?
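To make the numbers in this question concrete, here is a small sketch of when such a ticket would have to be renewed and when renewal stops helping. It is plain arithmetic, not a Hadoop or Kerberos API: the function name `renewal_schedule` and the one-hour safety margin are assumptions for illustration, and it assumes each renewal (for example with `kinit -R`) yields a ticket valid for one more lifetime, capped at the renewable limit. Once that limit is reached, a fresh ticket is needed, which for unattended jobs is commonly obtained from a keytab (`kinit -kt <keytab> <principal>`).

```python
from datetime import datetime, timedelta


def renewal_schedule(start, lifetime, renewable_for, margin=timedelta(hours=1)):
    """Return (renew_times, renew_till) for a renewable ticket.

    Hypothetical helper: renews `margin` before each expiry until the
    ticket's end time reaches the renewable limit `start + renewable_for`.
    """
    if margin >= lifetime:
        raise ValueError("margin must be shorter than the ticket lifetime")

    renew_till = start + renewable_for
    expires = start + lifetime
    renew_times = []
    while expires < renew_till:
        renew_at = expires - margin
        renew_times.append(renew_at)
        # A renewed ticket cannot outlive the renewable limit.
        expires = min(renew_at + lifetime, renew_till)
    return renew_times, renew_till


if __name__ == "__main__":
    start = datetime(2015, 1, 1, 0, 0)
    renewals, renew_till = renewal_schedule(
        start, lifetime=timedelta(days=1), renewable_for=timedelta(days=7)
    )
    for moment in renewals:
        print("renew at", moment)
    print("fresh ticket needed before", renew_till)
```

Under these assumptions, renewing alone keeps a process going for at most the 7-day renewable window; anything that runs longer needs a way to obtain a new ticket.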

+2 votes

Has anyone seen these errors before? Please help.

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: xxxxx.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /xxxxxxxx:39000 dst: /xxxxxx:50010

java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
at java.lang.Thread.run(Thread.java:745)
2015-01-11 04:13:21,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 657ms (threshold=300ms)
+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job takes too long and we want to stop it midway, which command should be used? Or is there any other way to do that?
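As a rough sketch of one way to do this: a running job can be stopped by its ID from the command line, with `mapred job -kill <job-id>` (older releases used `hadoop job -kill <job-id>`), or on YARN with `yarn application -kill <application-id>`; `mapred job -list` shows the IDs of running jobs. The Python wrapper below only picks the command from the ID prefix; the function names `kill_command` and `kill` are made up for illustration, and the ID in the usage example is a placeholder.

```python
import subprocess


def kill_command(identifier):
    """Build the CLI command that stops a job, chosen by ID prefix.

    Hypothetical helper: `identifier` is a MapReduce job ID ("job_...")
    or a YARN application ID ("application_...").
    """
    if identifier.startswith("application_"):
        return ["yarn", "application", "-kill", identifier]
    if identifier.startswith("job_"):
        return ["mapred", "job", "-kill", identifier]
    raise ValueError("expected an ID starting with 'job_' or 'application_'")


def kill(identifier):
    """Run the kill command; requires the Hadoop CLI on the PATH."""
    subprocess.run(kill_command(identifier), check=True)


if __name__ == "__main__":
    # Placeholder ID; take the real one from `mapred job -list`.
    print(" ".join(kill_command("job_1420950000000_0001")))
```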

...