What configuration parameters cause a Hadoop 2.x job to run on the cluster? [CLOSED]

+1 vote
479 views

Assume I have a machine on the same network as a Hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the configuration (or local XML files) to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from the cluster's Hadoop machine.

Does this work? What parameters do I need to set? Where does the jar file need to be? What issues would I see if the machine is running Windows with Cygwin installed?

closed with the note: Problem Solved
posted Apr 25, 2014 by Luv Kumar

What version of Hadoop are you using? (YARN or no YARN)

To answer your question: yes, it is possible and simple. All you need to do is have the Hadoop JARs on the classpath, along with the relevant configuration files on that same classpath pointing to the Hadoop cluster. Most often people simply copy core-site.xml, yarn-site.xml, etc. from the actual cluster onto the application classpath, and then you can run the job straight from the IDE.
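
A minimal sketch of such a driver, assuming core-site.xml, yarn-site.xml and mapred-site.xml copied from the cluster are already on the classpath; the class name and the input/output arguments are placeholders, and the mapper/reducer setup is omitted (the identity mapper and reducer run by default, which is enough to test remote submission):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // new Configuration() loads core-site.xml / yarn-site.xml / mapred-site.xml
        // found on the classpath, so the job is submitted to the cluster those
        // files describe rather than run in the local JVM.
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "remote submit sketch");

        // The client locates the jar containing this class and stages it on the
        // cluster for you; no manual copy of the jar to HDFS is needed.
        job.setJarByClass(RemoteSubmit.class);

        // Mapper/reducer configuration omitted in this sketch.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

This also covers the "where is the jar file" part of the question: the jar stays on the local machine and is uploaded to the cluster's staging area at submission time via setJarByClass.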

Not a Windows user, so not sure about the second part of the question.
Thank you for your answer.
1) I am using YARN.
2) So presumably dropping core-site.xml and yarn-site.xml into user.dir works; do I need mapred-site.xml as well?

Yes, if you are running MR
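
For reference, here is a rough sketch of the kind of settings those three files supply, expressed as direct Configuration calls; the host names are placeholders, not values from the original question:

import org.apache.hadoop.conf.Configuration;

public class ClusterConf {
    // Rough equivalent of what core-site.xml, yarn-site.xml and mapred-site.xml
    // provide; "nn-host" and "rm-host" stand in for the cluster's NameNode and
    // ResourceManager hosts.
    static Configuration clusterConf() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://nn-host:8020");          // from core-site.xml
        conf.set("yarn.resourcemanager.hostname", "rm-host");     // from yarn-site.xml
        conf.set("mapreduce.framework.name", "yarn");             // from mapred-site.xml
        return conf;
    }
}

The mapred-site.xml entry matters because mapreduce.framework.name defaults to "local"; without it the job runs in a single local JVM instead of on YARN.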

Similar Questions
+1 vote

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time from the IDE than as a jar file from the terminal.
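
One way to check which mode an IDE run actually uses is to print mapreduce.framework.name before submitting; a small sketch, assuming the cluster configuration files may or may not be on the classpath:

import org.apache.hadoop.conf.Configuration;

public class WhereWillItRun {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // "local" means the single-JVM LocalJobRunner; "yarn" means the job
        // would be submitted to the cluster described by the classpath config.
        System.out.println("mapreduce.framework.name = "
                + conf.get("mapreduce.framework.name", "local"));
    }
}

If this reports "local", the IDE run is using the single-JVM LocalJobRunner rather than the cluster, which would explain the shorter runtime on small inputs.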

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If a job is taking too long and we want to stop it midway, which command should be used? Or is there another way to do it?
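
Assuming the job ID (or YARN application ID) is known from the job's console output or the list command, the usual approach is something like the following; exact commands vary a little by Hadoop version:
$ hadoop job -list
$ hadoop job -kill <job_id>
or, on YARN:
$ yarn application -kill <application_id>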

+1 vote

According to the book "Hadoop: The Definitive Guide", it is possible to use "-D property=value" to override any default or site property in the configuration.

I gave it a shot, but it did not work for me: the property specified with "-D" was ignored.

Then I put the property in an XML file and used "-conf xml_name" on the command line, but I still could not override the property.

The only way I have found to override the default property is to get a Configuration reference in the code and set the property via that reference. But that is not convenient, as I need to recompile the code each time I change the property.

Now the question is: what is the right way to customize the configuration for a job?
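
One common reason "-D" and "-conf" options are silently ignored is a driver that builds its own Configuration instead of going through ToolRunner, whose GenericOptionsParser is what actually applies those options. A minimal sketch of a driver that does honour them; the class name and "my.property" are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -D property=value and -conf file
        // options parsed by ToolRunner before run() is called.
        Configuration conf = getConf();
        System.out.println("my.property = " + conf.get("my.property"));
        // ... build and submit the Job using this conf ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}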

...