Is it possible to turn on data node encryption without Kerberos?

0 votes
713 views

I've been trying to secure block data transferred by HDFS. I added the settings below to hdfs-site.xml and core-site.xml on the DataNode and NameNode and restarted both.

 dfs.encrypt.data.transfer
 true

 hadoop.rpc.protection
 privacy
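
For reference, a minimal sketch of how these two settings are usually written as property elements. It assumes the standard Hadoop layout, where dfs.encrypt.data.transfer goes in hdfs-site.xml and hadoop.rpc.protection goes in core-site.xml; the values are the ones listed above.

```xml
<!-- In hdfs-site.xml, inside the <configuration> element -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- In core-site.xml, inside the <configuration> element -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```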

When I try to put a file from the hdfs command line shell, the operation fails with "connection is reset" and I see the following in the DataNode log:

"org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to read expected encryption handshake from client at /172.31.36.56:48271. Perhaps the client is running an older version of Hadoop which does not support encryption"

I am able to reproduce this on two different deployments. I was following https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Authentication, but didn't turn on Kerberos authentication; the cluster works with no authentication configured. Can this be the reason the handshake fails?

posted Apr 6, 2016 by anonymous

Kerberos is used to authenticate a user or service principal in order to grant access to the cluster. It doesn't encrypt data blocks going in and out of the cluster.

1 Answer

0 votes

It is possible to turn on data transfer protocol encryption without enabling Kerberos authentication. We have a test suite in the Hadoop codebase named TestEncryptedTransfer that configures data transfer encryption, but not Kerberos, and those tests are passing.
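
As a starting point, here is a sketch of an hdfs-site.xml fragment for data transfer encryption without Kerberos. The first property is the one from the question. The second, dfs.block.access.token.enable, is an assumption that is not stated in this answer: data transfer encryption appears to rely on the block access token machinery for its keys, and the test setup is believed to enable it as well, so check the TestEncryptedTransfer source for your Hadoop version before relying on it.

```xml
<!-- In hdfs-site.xml on the NameNode and DataNodes, inside <configuration> -->

<!-- From the question: encrypt the block data transfer protocol -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- Assumption, not confirmed by the answer above: enable block access
     tokens so that the NameNode can hand out data encryption keys -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
```

If the handshake error persists after a restart, comparing the effective configuration of the client, NameNode, and DataNode is a reasonable next step.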

The hadoop.rpc.protection setting is unrelated to data transfer protocol. Instead, it controls the SASL quality of protection for the RPC connections used by many Hadoop client/server interactions. This won't really be active unless Kerberos authentication is enabled though.
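
To illustrate the point about RPC protection, here is a sketch of a core-site.xml fragment in which the setting would take effect. The hadoop.security.authentication property is not mentioned above and is included as an assumption about how Kerberos is switched on; a real Kerberos deployment also needs principals and keytabs, which are omitted here.

```xml
<!-- In core-site.xml, inside <configuration> -->

<!-- Assumption: Kerberos is enabled here; the default value is "simple" -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>

<!-- SASL quality of protection for RPC:
     authentication, integrity, or privacy -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```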

Please note that even though it's possible to enable data transfer protocol encryption without using Kerberos authentication in the cluster, the benefit of that is questionable in a production deployment. Without Kerberos authentication, it's very easy for an unauthenticated user to spoof another user and access their HDFS files. Whether or not the data is encrypted in transit becomes irrelevant at that point.

answer Apr 6, 2016 by Jai Prakash
Similar Questions
+3 votes

From the documentation + code, "when kerberos is enabled, all tasks are run as the end user (e.g. as user "joe" and not as hadoop user "mapred") using the task-controller (which is setuid root and when it runs, it does a setuid/setgid etc. to Joe and his groups). For this to work, user "joe" linux account has to be present on all nodes of the cluster."

In an environment with a large and dynamic user population, it is not practical to add every end user to every node of the cluster (and to drop the user when the end user is deactivated, etc.).

What other options are there to get this working? I am assuming that if the users are in LDAP, using PAM for LDAP could solve the issue. Any other suggestions?

+1 vote

On a Kerberos-based Hadoop cluster, a kinit is done and then an oozie command is executed. This works every time (so there are no setup issues), except that once it failed with the following error.

Error: AUTHENTICATION : Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Generic error (description in e-text) (60) - PROCESS_TGS).

Any thoughts on what could cause the transient failure? Would any updates on a node (e.g. Java) cause such an issue? The cluster is working fine with Kerberos.

+1 vote

Recently I set up Kerberos security for a Hadoop cluster and added a few data nodes to it. While running the hdfs balancer, I found that the Kerberos ticket had expired and the balancer stopped.

The Kerberos ticket has a 1-day lifetime with a 7-day maximum renewable lifetime. Are there any options to automatically renew the ticket while running the balancer?

Or should I restart it every day?

+2 votes

Does anyone know if the NFS HDFS gateway is currently supported on secure clusters using Kerberos for Hadoop 2.2.0? We are using HDP 2.0 and looking to use the NFS gateway.

+2 votes

When I try "mvn -T2 package -Pdist,native-win -DskipTests -Dtar", it fails with message:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.3:single (package-mapreduce) on project hadoop-mapreduce:
Failed to create assembly: Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.5.1 (included by module) does not have an artifact with a file.
Please ensure the package phase is run before the assembly is generated.

Please help?

...