top button
Flag Notify
    Connect to us
      Site Registration

Site Registration

Can we control data distribution and load balancing in Hadoop Cluster?

+3 votes
561 views

As I studied that data distribution, load balancing, fault tolerance are implicit in Hadoop. But I need to customize it, can we do that?

posted May 3, 2015 by Sudhakar Singh

Looking for an answer?  Promote on:
Facebook Share Button Twitter Share Button LinkedIn Share Button
Your question is very vague. Can you give us more details about the problem you are trying to solve?
The input data sets in HDFS breaks it in blocks of default size 128 MB and replicate it by default replication factor 3. It also balance load by transfering job of failed or busy nodes to free or active nodes. Can we manage how much data sets and load should assign to which node by ourselves.

Similar Questions
+1 vote

A mapreduce job can be run as jar file from terminal or directly from eclipse IDE. When a job run as jar file from terminal it uses multiple jvm and all resources of cluster. Does the same thing happen when we run from IDE. I have run a job on both and it takes less time on IDE than jar file on terminal.

+2 votes

Let we change the default block size to 32 MB and replication factor to 1. Let Hadoop cluster consists of 4 DNs. Let input data size is 192 MB. Now I want to place data on DNs as following. DN1 and DN2 contain 2 blocks (32+32 = 64 MB) each and DN3 and DN4 contain 1 block (32 MB) each. Can it be possible? How to accomplish it?

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If job is so time taken and we want to stop it in middle then which command is used? Or is there any other way to do that?

+2 votes
public class MaxMinReducer extends Reducer {
int max_sum=0; 
int mean=0;
int count=0;
Text max_occured_key=new Text();
Text mean_key=new Text("Mean : ");
Text count_key=new Text("Count : ");
int min_sum=Integer.MAX_VALUE; 
Text min_occured_key=new Text();

 public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
       int sum = 0;           

       for (IntWritable value : values) {
             sum += value.get();
             count++;
       }

       if(sum < min_sum)
          {
              min_sum= sum;
              min_occured_key.set(key);        
          }     


       if(sum > max_sum) {
           max_sum = sum;
           max_occured_key.set(key);
       }          

       mean=max_sum+min_sum/count;
  }

 @Override
 protected void cleanup(Context context) throws IOException, InterruptedException {
       context.write(max_occured_key, new IntWritable(max_sum));   
       context.write(min_occured_key, new IntWritable(min_sum));   
       context.write(mean_key , new IntWritable(mean));   
       context.write(count_key , new IntWritable(count));   
 }
}

Here I am writing minimum,maximum and mean of wordcount.

My input file :

high low medium high low high low large small medium

Actual output is :

high - 3------maximum

low - 3--------maximum

large - 1------minimum

small - 1------minimum

but i am not getting above output ...can anyone please help me?

+1 vote

In xmls configuration file of Hadoop-2.x, "mapreduce.input.fileinputformat.split.minsize" is given which can be set but how to set "mapreduce.input.fileinputformat.split.maxsize" in xml file. I need to set it in my mapreduce code.

...