Hadoop/Yarn: Running multiple commands on container

I am using containerLaunchContext.setCommands() to add different commands that I wanted to run on container. But only first command is getting execute.Is there is something else I need to do?

List commands = new ArrayList();commands.add(cmd1);commands.add(cmd2);

I can see only cmd1 is getting executed.

1 Answer

I found that UnixShellScriptBuilder#command just concatenates each commands with space, not with ";".
Therefore, you need to suffix ";" after commands you'd like to execute.

UnixShellScriptBuilder {
 @Override
 public void command(List command) {
 line("exec /bin/bash -c "", StringUtils.join(" ", command), """);
 }
}

answer Feb 18, 2014 by Majula Joshi

Similar Questions

+1 vote

How to stop a mapreduce job from terminal running on Hadoop Cluster?

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If job is so time taken and we want to stop it in middle then which command is used? Or is there any other way to do that?

+1 vote

Hadoop YARN 2.2.0 Streaming Memory Limitation?

We are currently facing a frustrating hadoop streaming memory problem. our setup:

our compute nodes have about 7 GB OF RAM
hadoop streaming starts a bash script wich uses about 4 GB OF RAM
therefore it is only possible to start one and only ONE TASK PER NODE

out of the box each hadoop instance starts about 7 hadoop containers with default hadoop settings. each hadoop task forks a bash script that need about 4 GB of RAM, the first fork works, all following fail because THEY RUN OUT OF MEMORY. so what we are looking for is to LIMIT the number of containers TO ONLY ONE. so what we found on the internet:

yarn.scheduler.maximum-allocation-mb and mapreduce.map.memory.mb is set to values such that there is at most one container. this means, mapreduce.map.memory.mb must be MORE THAN HALF of the maximum memory (otherwise there will be multiple containers).

done right, this gives us one container per node. but it produces a new problem: since our java process is now using at least half of the max memory, our child (bash) process we fork will INHERIT THE PARENT MEMORY FOOTPRINT and since the memory used by our parent was more than half of total memory, WE RUN OUT OF MEMORY AGAIN. if we lower the map memory, hadoop will allocate 2 containers per node, which will run out of memory too.

since this problem is a blocker in our current project we are evaluating adapting the source code to solve this issue. as a last resort. any ideas on this are very much welcome.

+1 vote

hadoop: Is there any way to limit the concurrent running mappers per job?

After upgraded to Hadoop 2 (yarn), I found that mapred.jobtracker.taskScheduler.maxRunningTasksPerJob no longer worked, right?

One workaround is to use queue to limit it, but its not easy to control it from job submitter.

Is there any way to limit the concurrent running mappers per job? Any documents or pointer?

+1 vote

How can I track a job failure on node or list of nodes, using YARN APIs?

How can I track a job failure on node or list of nodes, using YARN apis. I could get the list of long running jobs, using yarn client API,Â but need to go further to AM, NM, task attempts for map or reduce.
Say, I have a job running for long,(about 4hours), might be caused of some task failures.

Please provide the sequence of APIs, or any reference.

+2 votes

Run my own application master on a specific node in a YARN cluster

First of all, I'm using Hadoop-2.6.0. I want to launch my own app master on a specific node in a YARN cluster in order to open a server on a predetermined IP address and port. To that end, I wrote a driver program in which I created a ResourceRequest object and called setResourceName method to set a hostname, and attached it to a ApplicationSubmissionContext object by callingsetAMContainerResourceRequest method.

I tried several times but couldn't launch the app master on a specific node. After searching code, I found that RMAppAttemptImpl invalidates what I've set in ResourceRequest as follows:

 // Currently, following fields are all hard code,
 // TODO: change these fields when we want to support
 // priority/resource-name/relax-locality specification for AM containers
 // allocation.
 appAttempt.amReq.setNumContainers(1);
 appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
 appAttempt.amReq.setResourceName(ResourceRequest.ANY);
 appAttempt.amReq.setRelaxLocality(true);

Is there another way to launch a container for an application master on a specific node in Hadoop-2.6.0?

Hadoop/Yarn: Running multiple commands on container

Your comment on this post:

1 Answer

Your comment on this answer:

Your answer

Preview