
Hadoop: Can RM ignore heartbeats?

+1 vote
392 views

I have a question about the ResourceManager behavior:

When the ResourceManager allocates a container, it takes some time before the NMToken is sent and then received by the ApplicationMaster.

During this time, it is possible to receive another heartbeat from the AM, equal to the last one (since the AM is not aware of the allocated resources).

Is there any policy in YARN that makes the RM aware of this, so that it ignores this last heartbeat? I ask because I would expect many more superfluous containers to be allocated than I actually see in the logs.

posted Feb 24, 2015 by anonymous


1 Answer

+1 vote

On the RM side, the status of each resource request is tracked: how many containers the AM has requested, how many containers the RM has already assigned to the AM, how many are still pending, and so on.

The AM side is user code, and it should track the same resource request status. If the AM keeps asking for resources, it will eventually hit the queue limit or user limit, and no further resources will be allocated. Also, allocate is a blocking call.

It always returns something: nothing, some of the requested resources, or all of them. The AM should use this response to update its resource request status.
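
To make the AM-side bookkeeping concrete, here is a minimal sketch in plain Java. It is only an illustration under my own assumptions: the class and method names (ContainerRequestTracker, request, onAllocateResponse, nextAsk) are hypothetical and are not part of any YARN API, and the allocate response is reduced to a simple count of granted containers.

```java
// Hypothetical AM-side bookkeeping; none of these names are YARN APIs.
final class ContainerRequestTracker {
    private int pending = 0;    // containers asked for but not yet granted
    private int allocated = 0;  // containers granted and accepted so far

    /** Record that the application wants `count` more containers. */
    void request(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        pending += count;
    }

    /**
     * Apply one allocate response. `granted` may be zero, part of the
     * outstanding request, or more than what is still outstanding.
     * Returns the number of surplus containers the AM should hand back.
     */
    int onAllocateResponse(int granted) {
        if (granted < 0) {
            throw new IllegalArgumentException("granted must be >= 0");
        }
        int accepted = Math.min(granted, pending);
        pending -= accepted;
        allocated += accepted;
        return granted - accepted;
    }

    /** What to ask for on the next heartbeat: only what is still outstanding. */
    int nextAsk() {
        return pending;
    }

    /** Containers accepted so far. */
    int allocated() {
        return allocated;
    }
}
```

The point of the sketch is that the next heartbeat is built from the updated pending count rather than from the original request, so containers that have already arrived are not asked for again, and anything beyond the outstanding count is reported as surplus to release.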

answer Feb 24, 2015 by Kiran Kumar
Similar Questions
+3 votes

I have read that data distribution, load balancing, and fault tolerance are implicit in Hadoop, but I need to customize them. Can we do that?

+1 vote

A MapReduce job can be run as a JAR file from the terminal or directly from the Eclipse IDE. When a job runs as a JAR file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time in the IDE than as a JAR file from the terminal.

+2 votes

Suppose we change the default block size to 32 MB and the replication factor to 1, the Hadoop cluster consists of 4 DNs, and the input data size is 192 MB. I want to place the data on the DNs as follows: DN1 and DN2 contain 2 blocks (32 + 32 = 64 MB) each, and DN3 and DN4 contain 1 block (32 MB) each. Is this possible? How can it be accomplished?

+2 votes
public class MaxMinReducer extends Reducer {
    int max_sum = 0;
    int mean = 0;
    int count = 0;
    Text max_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");
    int min_sum = Integer.MAX_VALUE;
    Text min_occured_key = new Text();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }

        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }

        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }

        mean = max_sum + min_sum / count;
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}

Here I am writing the minimum, maximum, and mean of the word counts.

My input file:

high low medium high low high low large small medium

Expected output:

high - 3 (maximum)

low - 3 (maximum)

large - 1 (minimum)

small - 1 (minimum)

But I am not getting the above output. Can anyone please help me?

+1 vote

After upgrading to Hadoop 2 (YARN), I found that mapred.jobtracker.taskScheduler.maxRunningTasksPerJob no longer works. Is that right?

One workaround is to use a queue to limit it, but it's not easy to control that from the job submitter.

Is there any way to limit the number of concurrently running mappers per job? Any documentation or pointers?

...