Hadoop: calling mapreduce from webservice

+1 vote
1,283 views

Is it possible to call MR jobs from Java web services? If so, could you please point me to some resources/docs?

Actually, what I intend to do is create a Web UI with some functionality which would call MR jobs and present the result to the user in browser.

posted Apr 18, 2014 by Tarun Singhal

Question: M/R jobs are supposed to run for a long time. They are essentially batch processes. Do you plan to keep the Web UI blocked for that long? Or are you looking for asynchronous invocation of the M/R job? Or are you thinking about building a sort of Admin UI (e.g. PigLipstick)? What exactly is your requirement?
Yes. I intend to run the jobs asynchronously and show the status of each user-submitted job as "running", "completed", etc., and users will be able to submit new jobs at the same time. I have not checked PigLipstick, though.

1 Answer

+1 vote

As far as I know, there is no REST API to kick off M/R jobs. For M/R v2 there is a REST API to get the status of jobs: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html#Mapreduce_Application_Master_Info_API
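
A minimal sketch of how a back end might query that status API, assuming the jobs resource is reached through the web proxy at /proxy/{appid}/ws/v1/mapreduce/jobs as the linked page describes. The class name, the method names and the host "proxyhost:8088" are hypothetical and only for illustration; the application id is a sample value.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the jobs URL of the MapReduce Application Master
// REST API and fetches the raw JSON response.
class MrJobStatusClient {

    // proxyAddress ("host:port") and applicationId are supplied by the caller.
    static String jobsUrl(String proxyAddress, String applicationId) {
        return "http://" + proxyAddress + "/proxy/" + applicationId
                + "/ws/v1/mapreduce/jobs";
    }

    // Performs an HTTP GET and returns the response body as a string.
    // Parsing the JSON (e.g. reading the job state) is left to the caller.
    static String fetchJobsJson(String proxyAddress, String applicationId)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(jobsUrl(proxyAddress, applicationId)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Only prints the URL; no network call is made here.
        System.out.println(jobsUrl("proxyhost:8088", "application_1326232085508_0004"));
    }
}
```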

I would say that you have to invoke the M/R jobs in your middle tier or back end and implement a custom solution, i.e. invoke the M/R jobs in the standard way, monitor the status of each job, and then update the UI asynchronously, depending on which UI framework or web service implementation (e.g. WS-Addressing) you are using.
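
The approach above could be sketched roughly as follows, using only the JDK. AsyncJobService and its Status enum are hypothetical names, and the Callable stands in for the real job: in an actual back end it might wrap something like a configured org.apache.hadoop.mapreduce.Job and return job.waitForCompletion(false). The web service would call submit() when the user starts a job and status() when the UI asks for an update.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical back-end service: runs jobs in the background and records
// their status so that a web layer can report it without blocking.
class AsyncJobService {
    enum Status { RUNNING, COMPLETED, FAILED }

    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    // Starts the job on a worker thread and returns its id immediately.
    // The Callable is a placeholder for the real M/R invocation and must
    // return true on success.
    String submit(Callable<Boolean> job) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.RUNNING);
        pool.execute(() -> {
            try {
                statuses.put(id, job.call() ? Status.COMPLETED : Status.FAILED);
            } catch (Exception e) {
                statuses.put(id, Status.FAILED);
            }
        });
        return id;
    }

    // Returns the last recorded status, or null for an unknown id.
    Status status(String id) {
        return statuses.get(id);
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncJobService service = new AsyncJobService();
        String id = service.submit(() -> {
            Thread.sleep(200); // stands in for a long-running job
            return true;
        });
        System.out.println(id + " -> " + service.status(id));
        Thread.sleep(500);
        System.out.println(id + " -> " + service.status(id));
        service.shutdown();
    }
}
```

Because submit() returns at once, several users can start jobs at the same time, and the UI can poll status() or have the result pushed to it, depending on the framework in use.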

answer Apr 18, 2014 by anonymous
Play framework is reactive and uses push channels. It may be useful here if the UI has to be asynchronous and reactive.
Similar Questions
+2 votes
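import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The type parameters are required: with a raw Reducer, reduce() below would
// not override Reducer.reduce and the default identity reducer would run instead.
public class MaxMinReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    int max_sum = 0;
    int min_sum = Integer.MAX_VALUE;
    int total = 0;  // sum of all word counts
    int keys = 0;   // number of distinct words
    int count = 0;  // number of values seen
    Text max_occured_key = new Text();
    Text min_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }
        total += sum;
        keys++;

        // Strict comparisons keep only the first key seen when sums are tied.
        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }
        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Mean word count = total occurrences / distinct words (integer division).
        int mean = keys == 0 ? 0 : total / keys;
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}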

Here I am writing the minimum, maximum and mean of the word counts.

My input file :

high low medium high low high low large small medium

Expected output:

high - 3 (maximum)
low - 3 (maximum)
large - 1 (minimum)
small - 1 (minimum)

But I am not getting the output above. Can anyone please help me?
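
For reference, the result described above can be worked through in plain Java, outside Hadoop. This is only a sketch of the arithmetic, not the poster's reducer: WordCountStats and its methods are hypothetical names, and the point it illustrates is that reporting every word tied at the minimum or maximum needs a list of keys per extreme rather than a single key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone illustration of min/max/mean over word counts,
// keeping all words that are tied at an extreme.
class WordCountStats {

    // Counts occurrences of each whitespace-separated word, in first-seen order.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Returns every word whose count equals the given target.
    static List<String> wordsWithCount(Map<String, Integer> counts, int target) {
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == target) {
                words.add(e.getKey());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count("high low medium high low high low large small medium");
        int max = Collections.max(counts.values());
        int min = Collections.min(counts.values());
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        System.out.println("max " + max + " " + wordsWithCount(counts, max)); // max 3 [high, low]
        System.out.println("min " + min + " " + wordsWithCount(counts, min)); // min 1 [large, small]
        System.out.println("mean " + total / counts.size());                  // mean 2
    }
}
```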

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job takes too long and we want to stop it midway, which command should we use? Or is there another way to do that?

+1 vote

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time in the IDE than as a jar file from the terminal.

+2 votes

Is it possible to run jobs on Hadoop in batch mode? I have 5 different datasets in HDFS and need to run the same MapReduce application on these datasets one after the other.

Right now I am doing it manually. How can I automate this? How can I save the log of each execution in text files for later processing?

...