Hadoop: how to assign unique ID (Long Value) in mapper

+1 vote
334 views

I have a file containing one line for each edge in a graph; each line holds two vertex IDs (source and sink).
Sample:

1  2 (here 1 is source and 2 is sink node for the edge)
1  5
2  3
4  2
4  3

I want to assign a unique ID (a long value) to each edge, i.e. to each line of the file. How can I ensure that the assigned values are unique when the mappers run as distributed processes?

Note: The file is large, so using only one reducer is not feasible.
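
One common way to approach this, offered here as a sketch and not as an answer from this thread, is to build each ID from two parts: the number of the map task and a counter kept locally inside that task. No two tasks share a task number, so the combined values cannot collide and no reducer is needed. The example below is plain Java so that it runs without Hadoop; the class name EdgeIdSketch, the method makeId and the 40-bit split are hypothetical choices. In a real mapper the task number would typically be read in setup() via context.getTaskAttemptID().getTaskID().getId(), and the counter would be a field incremented on every call to map().

```java
// Sketch only: plain Java (no Hadoop dependency) so it runs standalone.
// EdgeIdSketch, makeId and the 40-bit split are hypothetical choices.
final class EdgeIdSketch {
    // Low 40 bits hold the per-task line counter.
    static final int COUNTER_BITS = 40;
    static final long MAX_COUNTER = (1L << COUNTER_BITS) - 1;
    // The remaining 23 bits hold the task number (the sign bit stays unused).
    static final int MAX_TASK = (1 << (63 - COUNTER_BITS)) - 1;

    // Combines a map task number and a task-local counter into one long.
    static long makeId(int taskNumber, long localCounter) {
        if (taskNumber < 0 || taskNumber > MAX_TASK) {
            throw new IllegalArgumentException("task number out of range: " + taskNumber);
        }
        if (localCounter < 0 || localCounter > MAX_COUNTER) {
            throw new IllegalArgumentException("counter out of range: " + localCounter);
        }
        return ((long) taskNumber << COUNTER_BITS) | localCounter;
    }

    public static void main(String[] args) {
        // Pretend the sample file was divided into two input splits,
        // each processed by its own map task.
        String[][] splits = { {"1 2", "1 5", "2 3"}, {"4 2", "4 3"} };
        for (int task = 0; task < splits.length; task++) {
            long counter = 0; // in a mapper: an instance field, incremented in map()
            for (String edge : splits[task]) {
                System.out.println(makeId(task, counter++) + "\t" + edge);
            }
        }
    }
}
```

IDs produced this way are unique but not consecutive. If the input is a single file read with TextInputFormat, the LongWritable key passed to map() is the byte offset of the line within the file, which is also unique per line and may be enough on its own.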

posted Jun 25, 2015 by anonymous

Similar Questions
+2 votes

Does anyone know how to ‘capture’ the exception that actually caused a job to fail in the Mapper or Reducer at runtime? Hadoop seems to be designed for fault tolerance: failed tasks are automatically rerun a certain number of times, and the real problem is not exposed unless you look into the error log.

In my use case, I would like to capture the exception and make different response based on the type of the exception.

+2 votes

I declared a variable and incremented/modified it inside the Mapper class. Now I need to use the modified value of that variable in the Driver class. I declared it as a static variable inside the Mapper class, and its modified value is visible in the Driver class when I run the code in the Eclipse IDE. But after exporting the code as a runnable JAR from Eclipse and running it with “$ hadoop jar filename.jar input output”, the modified value is not reflected in the Driver class (the value is 0).

+2 votes

I am currently playing around with Hadoop and have some problems when trying to filter in the Reducer.

I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial with some additional functionality and added the option to filter by the value of each key, e.g. to output only the key-value pairs where value > threshold.

Filtering Code in Reducer

for (IntWritable val : values) {
  sum += val.get();
}
if ( sum > threshold ) {
  result.set(sum);
  context.write(key, result);
}

For a threshold smaller than any value, the above code works as expected and the output contains all key-value pairs. If I increase the threshold to 1, some pairs are missing from the output although their values are larger than the threshold.

I tried to work out the error myself, but I could not get it to work as intended. I use the exact Tutorial setup with Oracle JDK 8 on a CentOS 7 machine.

As far as I understand, the respective Iterable in the Reducer already contains all the observed values for a specific key. Why is it possible that I am missing some of these key-value pairs then? It only fails in very few cases. The input file is pretty large (250 MB), so I also tried to increase the memory for the map and reduce steps, but it did not help (I tried a lot of different things without success).

Maybe someone already experienced similar problems / is more experienced than I am.
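
One possible cause, which the post itself does not confirm: the WordCount v1.0 driver in the tutorial registers the reducer class as the combiner as well (job.setCombinerClass(IntSumReducer.class)). If that line was kept, the threshold filter also runs on the partial sums of each map task, and a key whose partial sums are all at or below the threshold is dropped before the reducer ever sees it. The sketch below simulates this in plain Java without Hadoop; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: simulates combiner/reducer behaviour for ONE key, no Hadoop needed.
final class CombinerFilterSketch {
    // Mirrors the filtering reducer: sum the values and emit the sum only
    // if it exceeds the threshold. Returns null when nothing is written.
    static Integer sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? sum : null;
    }

    // Case 1: the filtering reducer is also used as the combiner.
    static Integer filterInCombiner(List<List<Integer>> perMapper, int threshold) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> values : perMapper) {
            Integer partial = sumAndFilter(values, threshold); // combiner run
            if (partial != null) {
                partials.add(partial);
            }
        }
        if (partials.isEmpty()) {
            return null; // the reducer is never called for this key
        }
        return sumAndFilter(partials, threshold); // reducer run
    }

    // Case 2: the combiner only sums; the filter is applied in the reducer alone.
    static Integer filterOnlyInReducer(List<List<Integer>> perMapper, int threshold) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> values : perMapper) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            partials.add(sum);
        }
        return sumAndFilter(partials, threshold);
    }

    public static void main(String[] args) {
        // A word that occurs once in each of two map tasks; its true count is 2.
        List<List<Integer>> perMapper = Arrays.asList(Arrays.asList(1), Arrays.asList(1));
        System.out.println("filter in combiner:     " + filterInCombiner(perMapper, 1));
        System.out.println("filter only in reducer: " + filterOnlyInReducer(perMapper, 1));
    }
}
```

Under this assumption, the usual remedies are to remove the setCombinerClass call or to register a separate sum-only class as the combiner, so that the threshold is checked only on the final totals.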

+1 vote

I recently set up Kerberos security for a Hadoop cluster and added a few data nodes to it. While running the HDFS balancer, I found that the Kerberos ticket had expired and the balancer had stopped.

The Kerberos ticket has a 1-day lifetime with a 7-day maximum renewable lifetime. Are there any options to renew the ticket automatically while the balancer is running?

Or do I have to restart it every day?

+2 votes

I am trying to build Oozie 4.2.0 against Hadoop 2.7.1 using the mkdistro.sh script, but when I extract the oozie.war file I don't see the Hadoop JARs. What is the right way to build Oozie so that the Hadoop JARs are included in the Oozie distro?

./mkdistro.sh assembly:single -P hadoop-2 -D javaVersion=1.7 -D targetJavaVersion=1.7 -D skipTests -D includeHadoopJars

...