top button
Flag Notify
    Connect to us
      Site Registration

Site Registration

How to include hadoop jars in oozie package

+2 votes
243 views

I am trying to build a oozie-4.2.0 with hadop-2.7.1 version, i am using makedistro.sh script to build it but when i extract oozie.war file i dont see hadoop jars, what is the right way to build oozie to include hadoop jar in the oozie-distro?

./mkdistro.sh assembly:single -P hadoop-2 -D javaVersion=1.7 -D targetJavaVersion=1.7 -D skipTests -D includeHadoopJars

posted May 18, 2016 by anonymous

Looking for an answer?  Promote on:
Facebook Share Button Twitter Share Button LinkedIn Share Button

Similar Questions
+1 vote

I would like to know if you have any examples or tutorials where I can learn hadoop mapReduce on mongodb in java?

+2 votes
public class MaxMinReducer extends Reducer {
int max_sum=0; 
int mean=0;
int count=0;
Text max_occured_key=new Text();
Text mean_key=new Text("Mean : ");
Text count_key=new Text("Count : ");
int min_sum=Integer.MAX_VALUE; 
Text min_occured_key=new Text();

 public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
       int sum = 0;           

       for (IntWritable value : values) {
             sum += value.get();
             count++;
       }

       if(sum < min_sum)
          {
              min_sum= sum;
              min_occured_key.set(key);        
          }     


       if(sum > max_sum) {
           max_sum = sum;
           max_occured_key.set(key);
       }          

       mean=max_sum+min_sum/count;
  }

 @Override
 protected void cleanup(Context context) throws IOException, InterruptedException {
       context.write(max_occured_key, new IntWritable(max_sum));   
       context.write(min_occured_key, new IntWritable(min_sum));   
       context.write(mean_key , new IntWritable(mean));   
       context.write(count_key , new IntWritable(count));   
 }
}

Here I am writing minimum,maximum and mean of wordcount.

My input file :

high low medium high low high low large small medium

Actual output is :

high - 3------maximum

low - 3--------maximum

large - 1------minimum

small - 1------minimum

but i am not getting above output ...can anyone please help me?

+1 vote

I have a file containing one line for each edge in the graph with two vertex ids (source & sink).
sample:

1  2 (here 1 is source and 2 is sink node for the edge)
1  5
2  3
4  2
4  3

I want to assign a unique Id (Long value )to each edge i.e for each line of the file. How to ensure assignment of unique value in distributed mapper process?

Note : File size is large, so using only one reducer is not feasible.

+3 votes

In KNN like algorithm we need to load model Data into cache for predicting the records.

Here is the example for KNN.

So if the model will be a large file say1 or 2 GB we will be able to load them into Distributed cache.

The one way is to split/partition the model Result into some files and perform the distance calculation for all records in that file and then find the min ditance and max occurance of classlabel and predict the outcome.

How can we parttion the file and perform the operation on these partition ?

ie 1 record  parttition1,partition2,.... 2nd record  parttition1,partition2,... 

This is what came to my thought. Is there any further way. Any pointers would help me.

+1 vote

It was very interesting that Hadoop could work with Docker and doing some trial on patch of YARN-1964.

I applied patch yarn-1964-branch-2.2.0-docker.patch of jira YARN-1964 on branch 2.2 and am going to install a Hadoop cluster using the new generated tarball including the patch.

Then, I think I can use DockerContainerExecutor, but I do not know much details on the usage and have following questions:

  1. After installation, Whats the detailed config steps to adopt DockerContainerExecutor?

  2. How to verify whether a MR task is really launched in Docker container not Yarn container?

  3. WHICH HADOOP BRANCH WILL OFFICIALLY INCLUDE DOCKER SUPPORT?

...