Where can I find information on Hadoop Security Best Practices?

0 votes
430 views

Can someone share information on Hadoop security best practices, or a link where I can find it?

posted Jul 15, 2014 by Kiran Kumar

If you are serious, the following three papers can help you out:
1. Hadoop Security Design - Oct 2009, yahoo-inc.com
2. New Hadoop security Design by A. Becherer
3. Securing your enterprise Hadoop Ecosystem - white paper, Cloudera
It is a lot of reading, though.

Similar Questions
+3 votes

From the documentation and code: "When Kerberos is enabled, all tasks are run as the end user (e.g. as user "joe" and not as the Hadoop user "mapred") using the task-controller (which is setuid root and, when it runs, does a setuid/setgid etc. to joe and his groups). For this to work, the Linux account for user "joe" has to be present on all nodes of the cluster."

In an environment with a large and dynamic user population, it is not practical to add every end user to every node of the cluster (and to drop the account when the end user is deactivated, etc.).

What other options are there to get this working? I am assuming that if the users are in LDAP, using PAM for LDAP could solve the issue. Any other suggestions?
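One way to test the LDAP idea on a node is to check whether the operating system can resolve the account at all. The sketch below is only an illustration, under the assumption that the task-controller finds the user through the standard system account lookup. Python's pwd.getpwnam() uses the same lookup, so an account served by an LDAP-backed name service should resolve just like one in /etc/passwd. The function name user_resolves and the user "joe" are placeholders.

```python
import pwd


def user_resolves(username: str) -> bool:
    """Return True if this node can resolve `username` to a system account.

    pwd.getpwnam() uses the system account database, so accounts supplied
    by a configured directory service count as well as /etc/passwd entries.
    """
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        # getpwnam raises KeyError when no such account is known.
        return False


if __name__ == "__main__":
    # "joe" is the example end user from the question (hypothetical account).
    for name in ("root", "joe"):
        print(name, user_resolves(name))
```

Running this on every node of the cluster would show whether a directory-backed setup makes the end user visible everywhere without creating local accounts by hand.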

+2 votes

Suppose we change the default block size to 32 MB and the replication factor to 1. The Hadoop cluster consists of 4 DataNodes (DNs), and the input data size is 192 MB. I want to place the data on the DNs as follows: DN1 and DN2 hold 2 blocks each (32 + 32 = 64 MB), and DN3 and DN4 hold 1 block each (32 MB). Is this possible, and how can it be accomplished?
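The numbers in this question can be checked with a few lines of arithmetic. The sketch below only confirms that the desired layout adds up to the file size; it does not show how to make HDFS place blocks that way. The helper names block_count and layout_is_consistent are made up for illustration.

```python
MB = 1024 * 1024


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed for a file; the last block may be partial."""
    return (file_size + block_size - 1) // block_size


def layout_is_consistent(file_size: int, block_size: int, blocks_per_node: dict) -> bool:
    """Check that the desired per-DataNode block counts add up to the file.

    With a replication factor of 1, each block lives on exactly one DataNode,
    so the per-node counts must sum to the total number of blocks.
    """
    return sum(blocks_per_node.values()) == block_count(file_size, block_size)


desired = {"DN1": 2, "DN2": 2, "DN3": 1, "DN4": 1}
print(block_count(192 * MB, 32 * MB))                    # 6
print(layout_is_consistent(192 * MB, 32 * MB, desired))  # True
```

So 192 MB at 32 MB per block gives 6 blocks, and 2 + 2 + 1 + 1 = 6, which means the requested distribution is at least arithmetically possible.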

+2 votes

Has anyone seen these errors before? Please help.

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: xxxxx.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /xxxxxxxx:39000 dst: /xxxxxx:50010

java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
at java.lang.Thread.run(Thread.java:745)
2015-01-11 04:13:21,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 657ms (threshold=300ms)
+1 vote

I'm trying to implement security for my Hadoop data. I'm using Cloudera Hadoop and am looking for the following:

  1. ROLE BASED AUTHORIZATION AND AUTHENTICATION

  2. ENCRYPTION ON DATA RESIDING IN HDFS

I have looked into Kerberos, but it doesn't provide encryption for data already residing in HDFS. Are there any other security tools I can use? Has anyone implemented the above two security features in Cloudera Hadoop?

+1 vote

We are trying to compare performance between the HTTP and HTTPS modes of Hadoop DFS, MapReduce, and other related modules.

So far, we have tested several metrics in Hadoop's HTTP mode, and we are now trying to test the same metrics over HTTPS. Our test cluster consists of one master node and two slave nodes.

We have configured HTTPS and now need to verify whether the nodes are actually communicating over HTTPS. We tried checking the logs, the cluster's WebHDFS UI, the health check information, and the dfsadmin report, but none of these helped. Since only limited documentation is available on HTTPS, we are unable to verify whether the nodes are communicating over HTTPS.

Can any experts here shed some light on how to confirm the HTTPS communication status between nodes (for example, for MapReduce/DFS)?
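One basic check is to attempt a TLS handshake against each node's web port and see whether it completes. The sketch below is a rough probe and not a Hadoop tool: the function name speaks_tls is made up, and the host names and ports in the comment are placeholders to be replaced with the values set for dfs.namenode.https-address and dfs.datanode.https.address on your cluster. It also assumes self-signed certificates may be in use, so certificate verification is turned off. A successful handshake shows only that the endpoint accepts TLS, not that every transfer between nodes uses it.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with host:port completes."""
    context = ssl.create_default_context()
    # Assumption: test clusters often use self-signed certificates, so
    # certificate and hostname verification are disabled for this probe.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake; a plain-HTTP port fails here.
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and ssl.SSLError.
        return False


# Example call with placeholder values (hypothetical host and port):
#   speaks_tls("master.example.internal", 50470)
```

Probing the same host on its plain HTTP port should return False, which gives a quick way to tell the two modes apart from any machine that can reach the cluster.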

...