How to set up a Hadoop cluster?

+1 vote
345 views

I am trying to set up a Hadoop cluster and would like to know how many physical machines or VMs are needed for this. My main interest is to measure the network traffic of the shuffle phase.

What are the basic requirements, e.g. for the NameNode and DataNodes?

posted Dec 8, 2013 by Sumit Pokharna


1 Answer

+1 vote

Hadoop's default replication factor is 3; the default is defined in hdfs-default.xml and can be overridden in hdfs-site.xml. You should have one master (NameNode) and two slaves (DataNodes).

Please follow http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
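
For reference, a minimal sketch of the relevant settings, assuming a Hadoop 1.x layout as in that tutorial; the hostnames slave1 and slave2 are placeholders:

# On the master, list the DataNode hosts (one per line)
$ printf "slave1\nslave2\n" > conf/slaves

# On every node, set the replication factor in conf/hdfs-site.xml
# (it should not exceed the number of DataNodes you actually have)
$ cat > conf/hdfs-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

# Format HDFS once on the master, then start the daemons
$ bin/hadoop namenode -format
$ bin/start-dfs.sh
$ bin/start-mapred.sh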

answered Dec 10, 2013 by Anderson
Similar Questions
+1 vote

I want to set up a multi-node Hadoop cluster on VMware. Can anybody suggest some good material for doing so?

I have tried these instructions https://www.dropbox.com/s/05aurcp42asuktp/Chiu%20Hadoop%20Pig%20Install%20Instructions.docx to set up a single-node Hadoop cluster on VMware. Now can anybody help me create a multi-node Hadoop cluster (3 nodes) on VMware?
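
For what it's worth, a rough outline of the extra configuration for a multi-node setup, assuming a Hadoop 1.x layout and three VMs named master, slave1 and slave2 (placeholder hostnames) that can reach each other over passwordless SSH:

# On every node, point HDFS and the JobTracker at the master
$ cat > conf/core-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

$ cat > conf/mapred-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
EOF

# On the master only, list the worker hosts
$ printf "slave1\nslave2\n" > conf/slaves

# Format HDFS on the master and start all daemons
$ bin/hadoop namenode -format
$ bin/start-all.sh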

+3 votes

In my cluster, I want to have multiple users for different purposes. The usual method is to add a user through the OS on the Hadoop NameNode.

I notice that Hadoop also supports LDAP. Could I add users through LDAP instead of through the OS, so that a user authenticated by LDAP can also access the HDFS directories?
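
For reference, Hadoop 2.x can resolve users' groups from LDAP via org.apache.hadoop.security.LdapGroupsMapping. A minimal sketch, with the server URL, bind DN, password and search base as placeholders; these properties would be merged inside <configuration> in your existing core-site.xml. Note that this handles group resolution for authorization; authentication itself still comes from the OS user, or from Kerberos if security is enabled.

  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.LdapGroupsMapping</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping.ldap.url</name>
    <value>ldap://ldap.example.com:389</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping.ldap.bind.user</name>
    <value>cn=admin,dc=example,dc=com</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping.ldap.bind.password</name>
    <value>secret</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping.ldap.base</name>
    <value>dc=example,dc=com</value>
  </property>

# Check which groups Hadoop resolves for a given user
$ hdfs groups alice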

+2 votes

Would anyone here be willing to walk me through setting this stuff up on Amazon Web Services?

I need to run MongoDB on an EC2 instance and connect to an EMR Hadoop cluster for a project, but I have never used any of this stuff (Mongo/Hadoop/the connector/AWS) before, so it's a bit overwhelming. I have downloaded the connector from GitHub so far.

I believe I need to run "gradlew jar" to build the jars (not really sure what those do either), but after that I am a bit lost. I have been searching for about a week now, but I can't find a good step-by-step process for this.

Please help.
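
For reference, a very rough sketch of the build step mentioned above, assuming the mongo-hadoop connector has been cloned from GitHub (paths and jar names may differ between versions):

# Build the connector jars with the bundled Gradle wrapper
$ cd mongo-hadoop
$ ./gradlew jar

# The resulting jars land under the subprojects' build/libs directories;
# copy them to the cluster nodes (e.g. into $HADOOP_HOME/lib) or pass them
# to your job with the -libjars generic option.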

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it midway, which command is used? Or is there any other way to do that?
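
For reference, a running MapReduce job can usually be stopped from the command line using its job id (the exact command depends on the Hadoop version):

# List running jobs, note the job id, then kill it
$ hadoop job -list
$ hadoop job -kill <job_id>

# On YARN-based clusters the equivalent is
$ yarn application -kill <application_id>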

...