hadoop: Is there any way to limit the concurrent running mappers per job?

+1 vote
1,000 views

After upgrading to Hadoop 2 (YARN), I found that mapred.jobtracker.taskScheduler.maxRunningTasksPerJob no longer works. Is that right?

One workaround is to use a queue to limit it, but that is not easy to control from the job submitter's side.

Is there any way to limit the number of concurrently running mappers per job? Any documentation or pointers?

posted Apr 20, 2015 by Parveen


1 Answer

+1 vote

In Hadoop 2.x the property is named:
mapreduce.jobtracker.taskscheduler.maxrunningtasks.perjob
The maximum number of running tasks for a job before it gets preempted. No limits if undefined.

It is listed in mapred-default.xml:
https://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
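
Since the question asks for something the job submitter can control, here is a minimal sketch of passing such a limit per job on the command line. It assumes the job's driver parses Hadoop generic options (for example by running through ToolRunner), so that "-D name=value" ends up in the job configuration. MapperLimitArgs and withLimit are hypothetical names used only for illustration. Whether the scheduler actually enforces the property quoted above under YARN depends on the release; Hadoop 2.7.0 and later also list mapreduce.job.running.map.limit in mapred-default.xml, so it is worth checking the file that ships with your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the argument list for a driver that honors
// generic options of the form "-D name=value".
class MapperLimitArgs {
    // Property name quoted in the answer; enforcement under YARN is an assumption.
    static final String LIMIT_PROPERTY =
            "mapreduce.jobtracker.taskscheduler.maxrunningtasks.perjob";

    // Prepends "-D property=limit" to the job's own arguments.
    static List<String> withLimit(String property, int limit, String... jobArgs) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> args = new ArrayList<>();
        args.add("-D");
        args.add(property + "=" + limit);
        args.addAll(Arrays.asList(jobArgs));
        return args;
    }

    public static void main(String[] argv) {
        // Prints the arguments that would follow "hadoop jar example.jar".
        List<String> args = withLimit(LIMIT_PROPERTY, 10, "inputpath", "outputpath");
        System.out.println(String.join(" ", args));
    }
}
```

The generic options have to come before the job's own arguments, which is why the helper prepends them.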

answer Apr 22, 2015 by Sudhakar Singh
Similar Questions
+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it midway, which command should be used? Or is there another way to do that?

+1 vote

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I ran a job both ways, and it took less time in the IDE than as a jar file from the terminal.

+2 votes

I am using containerLaunchContext.setCommands() to add the different commands that I want to run in the container, but only the first command gets executed. Is there something else I need to do?

List<String> commands = new ArrayList<>();
commands.add(cmd1);
commands.add(cmd2);

I can see that only cmd1 is executed.
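
One possible workaround for the question above is sketched here, on the assumption that the container launcher concatenates the list entries into a single shell command line, which would explain why cmd2 is never run on its own. Under that assumption, the commands can be chained into one entry with "&&". ContainerCommands and chain are hypothetical names, and the result would be passed to containerLaunchContext.setCommands().

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, assuming the launcher treats the whole list as one shell line.
class ContainerCommands {
    // Joins the commands into a single entry; each command runs only if the
    // previous one exited successfully.
    static List<String> chain(List<String> commands) {
        if (commands.isEmpty()) {
            throw new IllegalArgumentException("at least one command is required");
        }
        return Collections.singletonList(String.join(" && ", commands));
    }
}
```

Using ";" instead of "&&" as the separator would run the later commands even when an earlier one fails.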

+1 vote

The Hadoop 2.x XML configuration files provide "mapreduce.input.fileinputformat.split.minsize", which can be set there, but how do I set "mapreduce.input.fileinputformat.split.maxsize" in the XML file? I need to set it in my MapReduce code.

+3 votes

Date date; long start, end; // for recording the start and end time of the job
date = new Date(); start = date.getTime(); // start timer

job.waitForCompletion(true);

date = new Date(); end = date.getTime(); // end timer
log.info("Total Time (in milliseconds) = " + (end - start));
log.info("Total Time (in seconds) = " + (end - start) * 0.001F);

I am not sure this is the correct way to measure it. Is there any other method or API to find the execution time of a MapReduce job?
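
As a point of comparison for the snippet above, here is a minimal sketch of the same client-side measurement using System.nanoTime(), which is monotonic and so is not affected by wall-clock adjustments during a long job. JobTimer and timeMillis are hypothetical names, and the sleep in main is only a stand-in for job.waitForCompletion(true). Like the original, this measures elapsed time as seen by the submitting client, including submission and any time spent waiting in a queue.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: times any blocking call from the client's point of view.
class JobTimer {
    // Runs the work and returns the elapsed time in milliseconds.
    static <T> long timeMillis(Callable<T> work) throws Exception {
        long start = System.nanoTime();
        work.call();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for: () -> job.waitForCompletion(true)
        long elapsed = timeMillis(() -> {
            Thread.sleep(50);
            return true;
        });
        System.out.println("Total time (in milliseconds) = " + elapsed);
    }
}
```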

...