How can I track a job failure on a node or list of nodes using YARN APIs?


How can I track a job failure on a node or list of nodes using the YARN APIs? I can get the list of long-running jobs using the YARN client API, but I need to go further, down to the AM, the NMs, and the task attempts for map or reduce.
Say I have a job that has been running for a long time (about 4 hours), which might be caused by some task failures.

Please provide the sequence of APIs, or any reference.
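One possible sequence, offered as a sketch under assumptions rather than a confirmed answer: the ResourceManager REST API lists running applications, the MapReduce ApplicationMaster REST API (reached through the RM web proxy) lists a job's tasks and each task's attempts, and every attempt record carries a state and a nodeHttpAddress that can be grouped per node. The class name, the Attempt holder, and the host, application, job, and task IDs below are hypothetical; the example only builds the URLs and aggregates already-parsed attempts, it does not contact a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class YarnFailureTracker {

    /** Hypothetical holder for three fields of an MR REST "taskAttempt" record. */
    public static final class Attempt {
        final String id;
        final String state;           // e.g. SUCCEEDED, FAILED, KILLED
        final String nodeHttpAddress; // NodeManager the attempt ran on

        public Attempt(String id, String state, String nodeHttpAddress) {
            this.id = id;
            this.state = state;
            this.nodeHttpAddress = nodeHttpAddress;
        }
    }

    /** Step 1: ResourceManager REST endpoint listing running applications. */
    public static String runningAppsUrl(String rmBase) {
        return rmBase + "/ws/v1/cluster/apps?states=RUNNING";
    }

    /** Step 2: MR ApplicationMaster endpoint (via the RM proxy) listing a job's tasks. */
    public static String tasksUrl(String rmBase, String appId, String jobId) {
        return rmBase + "/proxy/" + appId + "/ws/v1/mapreduce/jobs/" + jobId + "/tasks";
    }

    /** Step 3: endpoint listing the attempts of one task. */
    public static String attemptsUrl(String rmBase, String appId, String jobId, String taskId) {
        return tasksUrl(rmBase, appId, jobId) + "/" + taskId + "/attempts";
    }

    /** Step 4: count FAILED attempts per node, sorted by node address. */
    public static Map<String, Integer> failuresByNode(List<Attempt> attempts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Attempt a : attempts) {
            if ("FAILED".equals(a.state)) {
                counts.merge(a.nodeHttpAddress, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // All hosts and IDs here are made up for illustration.
        String rm = "http://rm.example.com:8088";
        System.out.println(runningAppsUrl(rm));
        System.out.println(attemptsUrl(rm, "application_1_0001", "job_1_0001", "task_1_0001_m_000000"));

        List<Attempt> sample = Arrays.asList(
                new Attempt("attempt_0", "FAILED", "node1.example.com:8042"),
                new Attempt("attempt_1", "SUCCEEDED", "node2.example.com:8042"),
                new Attempt("attempt_2", "FAILED", "node1.example.com:8042"));
        System.out.println(failuresByNode(sample));
    }
}
```

For a job that has already finished, the same task and attempt data should be available from the MapReduce JobHistory Server under /ws/v1/history/mapreduce/jobs/{jobid}/tasks/{taskid}/attempts instead of the proxy path.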

posted Apr 6, 2015 by Majula Joshi


Similar Questions

+2 votes

Run my own application master on a specific node in a YARN cluster

First of all, I'm using Hadoop-2.6.0. I want to launch my own app master on a specific node in a YARN cluster in order to open a server on a predetermined IP address and port. To that end, I wrote a driver program in which I created a ResourceRequest object, called its setResourceName method to set a hostname, and attached it to an ApplicationSubmissionContext object by calling the setAMContainerResourceRequest method.

I tried several times but couldn't launch the app master on a specific node. After searching the code, I found that RMAppAttemptImpl overrides what I've set in the ResourceRequest, as follows:

 // Currently, following fields are all hard code,
 // TODO: change these fields when we want to support
 // priority/resource-name/relax-locality specification for AM containers
 // allocation.
 appAttempt.amReq.setNumContainers(1);
 appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
 appAttempt.amReq.setResourceName(ResourceRequest.ANY);
 appAttempt.amReq.setRelaxLocality(true);

Is there another way to launch a container for an application master on a specific node in Hadoop-2.6.0?

+2 votes

Run arbitrary job (non-MR) on YARN ?

I happened to run into this interesting scenario:

I had some Mahout seq2sparse jobs that I originally ran in parallel using distributed mode. Because the input files are so small, running them locally is actually much faster, so I switched them to local mode. However, I run 10 of these jobs in parallel, and when 10 Mahout jobs run together, each one becomes very slow. Is there existing code that takes a desired shell script, and possibly some archive files (which could contain the jar file or C++-generated executable code), and runs it on YARN? I understand that I could use the YARN API to code such a thing, but it would be nice if I could just take something existing and run it from the shell.

+1 vote

How does a job work in YARN/MapReduce? Like a navigation path...

How does a job work in YARN/MapReduce? Something like a navigation path.

Is my understanding correct?

When the application (job) or client starts, the client communicates with the NameNode, and the application manager is started on a node (a DataNode). The application manager communicates with the ResourceManager (on the NameNode) to get resources. The resources are assigned to containers. The job runs in a container, which is a JVM.

+1 vote

Getting times for all the jobs run on a YARN cluster

I'm trying to get the start and finish times of all the jobs that have run on a YARN cluster.

yarn application -list -appStates ALL

This gets me most of the details of the jobs, but not the times. However, I can parse the output for the application IDs and then run

yarn application -status $ID

on each application id to get an output that I can parse for the time.

However, this involves making many connections to YARN, so it is relatively slow. Is there a single command I can use to get all this information?
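One way this might be done in a single request, sketched under assumptions: the ResourceManager REST endpoint /ws/v1/cluster/apps returns one record per application, including startedTime and finishedTime in milliseconds since the epoch, and it can return XML when asked with an Accept: application/xml header. The class name and the sample document below are hypothetical (the sample is a trimmed illustration, not real cluster output), and the example parses a response body that has already been fetched.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AppTimes {

    /**
     * Maps each application id to {startedTime, finishedTime}, assuming every
     * <app> element has <id>, <startedTime> and <finishedTime> children.
     */
    public static Map<String, long[]> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, long[]> times = new LinkedHashMap<>();
            NodeList apps = doc.getElementsByTagName("app");
            for (int i = 0; i < apps.getLength(); i++) {
                Element app = (Element) apps.item(i);
                long started = Long.parseLong(text(app, "startedTime"));
                long finished = Long.parseLong(text(app, "finishedTime"));
                times.put(text(app, "id"), new long[] {started, finished});
            }
            return times;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse apps XML", e);
        }
    }

    /** Text of the first descendant element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical, trimmed response body for illustration only.
        String sample = "<apps>"
                + "<app><id>application_1_0001</id><startedTime>1000</startedTime>"
                + "<finishedTime>5000</finishedTime></app>"
                + "<app><id>application_1_0002</id><startedTime>2000</startedTime>"
                + "<finishedTime>9000</finishedTime></app>"
                + "</apps>";
        for (Map.Entry<String, long[]> e : parse(sample).entrySet()) {
            long[] t = e.getValue();
            System.out.println(e.getKey() + " start=" + t[0] + " finish=" + t[1]);
        }
    }
}
```

The design choice here is to ask for XML so that the JDK's built-in parser is enough; with the default JSON response, a JSON library would be the more natural fit.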

+1 vote

How to stop a mapreduce job from terminal running on Hadoop Cluster?

To run a job, we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it partway through, which command should we use? Or is there another way to do that?

...
