How to find min, max and mean of wordcount from text file in hadoop mapreduce?

3,214 views

public class MaxMinReducer extends Reducer {
int max_sum=0; 
int mean=0;
int count=0;
Text max_occured_key=new Text();
Text mean_key=new Text("Mean : ");
Text count_key=new Text("Count : ");
int min_sum=Integer.MAX_VALUE; 
Text min_occured_key=new Text();

 public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
       int sum = 0;           

       for (IntWritable value : values) {
             sum += value.get();
             count++;
       }

       if(sum < min_sum)
          {
              min_sum= sum;
              min_occured_key.set(key);        
          }     


       if(sum > max_sum) {
           max_sum = sum;
           max_occured_key.set(key);
       }          

       mean=max_sum+min_sum/count;
  }

 @Override
 protected void cleanup(Context context) throws IOException, InterruptedException {
       context.write(max_occured_key, new IntWritable(max_sum));   
       context.write(min_occured_key, new IntWritable(min_sum));   
       context.write(mean_key , new IntWritable(mean));   
       context.write(count_key , new IntWritable(count));   
 }
}

Here I am writing minimum,maximum and mean of wordcount.

My input file :

high low medium high low high low large small medium

Actual output is :

high - 3------maximum

low - 3--------maximum

large - 1------minimum

small - 1------minimum

but i am not getting above output ...can anyone please help me?

posted Oct 16, 2015 by Sathish

Looking for an answer? Promote on:

Let me understand
if your input is
high low medium high low high low large small medium

Then the Max is 3 (i.e. high as well as low is appearing three times), min is 1 (i.e. Large and small is appearing once) and mean is 2. But your logic does not seems to be doing the same. Please cross check the reduce function

commented Oct 16, 2015 by Salil Agrawal

You may not be handling the dupplicate min max key case check the following link
http://stackoverflow.com/questions/32964067/hadoop-word-count-and-get-the-minimum-occured-word

Look at second part of the answer :)

commented Oct 16, 2015 by Salil Agrawal

I tried in different ways...but i can't resolved .......actually i am new to hadoop...can you plz write the code ....

commented Oct 16, 2015 by Sathish

i already tried that code(http://stackoverflow.com/questions/32964067/hadoop-word-count-and-get-the-minimum-occured-word)....it can't works

commented Oct 16, 2015 by Sathish

I am just pasting the code from other site (as I shared the link) which deals duplicate keys and only min case using that you can write max as well as mean.

public class MaxReducer extends Reducer {
     int min_sum=Integer.MAX_VALUE;
     ArrayList<String> al = new ArrayList<String>();
     public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
           int sum = 0;

           for (IntWritable value : values) {
                 sum += value.get();
           }

           if(sum < min_sum) {
               min_sum= sum;
               al.clear();
               al.add(key)
           } else if(sum == min_sum){
           al.add(key);
           }

      }

     @Override
     protected void cleanup(Context context) throws IOException, InterruptedException {
          for(String value : al) {
           context.write(new Text(value) , new IntWritable(min_sum));
}
}

commented Oct 16, 2015 by Salil Agrawal

Actually that question asked by me in stackoverflow(http://stackoverflow.com/questions/32964067/hadoop-word-count-and-get-the-minimum-occured-word) ..no one resolved that question

commented Oct 16, 2015 by Sathish

This code spot should work

           if(sum < min_sum) {
               min_sum= sum;
               al.clear();
               al.add(key)
           } else if(sum == min_sum){
           al.add(key);
           }

commented Oct 16, 2015 by Salil Agrawal

i already tried above code....can't works....

commented Oct 16, 2015 by Sathish

Unfortunately I dont have the access to the test platform here, so you may need to debug at your end. The crux is this code where you need to clear the old list if sum is less then min_list followed by add and if same just add the key.

commented Oct 16, 2015 by Salil Agrawal

once you resolve the solution ...can you plz share that code.....

commented Oct 16, 2015 by Sathish

Sure I will :)

commented Oct 16, 2015 by Salil Agrawal

can anyone give me the solution?

commented Oct 20, 2015 by Sathish

How to find min, max and mean of wordcount from text file in hadoop mapreduce?

Your comment on this post:

Your answer

Preview