Hadoop: How to assign a unique ID (long value) in a mapper

I have a file with one line per edge of a graph; each line holds two vertex IDs (source and sink).
Sample:

1  2 (here 1 is the source node and 2 is the sink node of the edge)
1  5
2  3
4  2
4  3

I want to assign a unique ID (a long value) to each edge, i.e. to each line of the file. How can I make sure the assigned values are unique when the work is spread across distributed mapper processes?

Note: The file is large, so using only one reducer is not feasible.

posted Jun 25, 2015 by anonymous
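
One common way to get unique values without funnelling everything through a single reducer is to let each map task number its own lines and put the task's number in the high bits of the ID, so two tasks can never produce the same value. The sketch below is plain Java with no Hadoop dependency. The class name `EdgeIdSketch` and the 23/40 bit split are assumptions made for illustration, not something taken from the question. In a real mapper the task number could presumably be read in `setup()` via `context.getTaskAttemptID().getTaskID().getId()`.

```java
/**
 * Sketch: build a job-wide unique long from a task number and a per-task counter.
 * Layout (an assumption for this example): [sign bit = 0][23 bits task][40 bits counter].
 */
final class EdgeIdSketch {
    static final int COUNTER_BITS = 40;
    static final long MAX_COUNTER = (1L << COUNTER_BITS) - 1;
    // 23 bits for the task number keeps the resulting ID non-negative.
    static final int MAX_TASK = (1 << (63 - COUNTER_BITS)) - 1;

    private final int taskId;
    private long next = 0; // number of IDs handed out by this task so far

    EdgeIdSketch(int taskId) {
        if (taskId < 0 || taskId > MAX_TASK) {
            throw new IllegalArgumentException("task id out of range: " + taskId);
        }
        this.taskId = taskId;
    }

    /** Returns the next ID for this task; call once per input line. */
    long nextId() {
        if (next > MAX_COUNTER) {
            throw new IllegalStateException("per-task counter overflow");
        }
        return ((long) taskId << COUNTER_BITS) | next++;
    }

    public static void main(String[] args) {
        EdgeIdSketch task0 = new EdgeIdSketch(0); // stands in for map task 0
        EdgeIdSketch task1 = new EdgeIdSketch(1); // stands in for map task 1
        System.out.println(task0.nextId()); // prints 0
        System.out.println(task0.nextId()); // prints 1
        System.out.println(task1.nextId()); // prints 1099511627776
    }
}
```

With this split, each task can number about 10^12 lines and roughly 8 million tasks fit. The IDs are unique but not contiguous. If contiguous numbering is required, a different scheme is needed.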

Similar Questions

+2 votes

Hadoop: How to obtain the exception that actually failed the job on the Mapper or Reducer at runtime?

Does anyone know how to ‘capture’ the exception that actually failed a job running on the Mapper or Reducer at runtime? It seems Hadoop is designed to be fault tolerant: failed tasks are automatically rerun a certain number of times, and the real problem is not exposed unless you look into the error log.

In my use case, I would like to capture the exception and respond differently based on its type.

+2 votes

How to access, in the Driver class, the value of a variable that was declared and modified inside the Mapper class?

I declared a variable and incremented/modified it inside the Mapper class. Now I need to use the modified value of that variable in the Driver class. I declared it as a static variable inside the Mapper class, and its modified value is visible in the Driver class when I run the code in the Eclipse IDE. But after exporting the code as a runnable jar from Eclipse and running it as “$ hadoop jar filename.jar input output”, the modified value is not reflected in the Driver class (the value is 0).

+2 votes

Hadoop: Filtering by value in Reducer

I am currently playing around with Hadoop and have some problems when trying to filter in the Reducer.

I extended the WordCount v1.0 example from the 2.7 MapReduce Tutorial with some additional functionality and added the possibility to filter by the value of each key, e.g. to output only the key-value pairs where [[ value > threshold ]].

Filtering Code in Reducer

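// Sum all counts observed for this key.
for (IntWritable val : values) {
  sum += val.get();
}
// Emit the pair only when the total exceeds the threshold.
if ( sum > threshold ) {
  result.set(sum);
  context.write(key, result);
}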

For a threshold smaller than every value, the code above works as expected and the output contains all key-value pairs. If I increase the threshold to 1, some pairs are missing from the output even though their values are larger than the threshold.

I tried to work out the error myself, but I could not get it to work as intended. I use the exact Tutorial setup with Oracle JDK 8 on a CentOS 7 machine.

As far as I understand, the respective Iterable in the Reducer already contains all the observed values for a specific key. Why is it possible that I am missing some of these key-value pairs then? It only fails in very few cases. The input file is pretty large (250 MB), so I also tried to increase the memory for the map and reduce steps, but it did not help (I tried a lot of different things without success).

Maybe someone has already run into a similar problem or has more experience than I do.
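
One possible cause, assuming the job still registers the reducer class as a combiner (the tutorial's WordCount v1.0 does so with `job.setCombinerClass(IntSumReducer.class)`): the filter then also runs on partial sums on the map side, so a key whose per-task count is at or below the threshold can be dropped before the reducer ever sees it. The plain-Java sketch below only simulates that effect for a single key. The class and method names are hypothetical, and real Hadoop may run a combiner zero or more times, so this is a simplification rather than a model of the actual job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CombinerFilterSketch {
    /** Mirrors the filtering reducer: sum, then emit only if sum > threshold (null = nothing emitted). */
    static Integer sumAndFilter(List<Integer> values, int threshold) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum > threshold ? sum : null;
    }

    /** What a combiner for this job should do instead: sum without filtering. */
    static int sumOnly(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Simulates one key whose occurrences are spread over several map tasks. */
    static Integer run(List<List<Integer>> perMapTaskValues, int threshold, boolean filterInCombiner) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> mapOutput : perMapTaskValues) {
            if (filterInCombiner) {
                Integer partial = sumAndFilter(mapOutput, threshold);
                if (partial != null) {
                    reducerInput.add(partial); // small partial sums are lost here
                }
            } else {
                reducerInput.add(sumOnly(mapOutput));
            }
        }
        if (reducerInput.isEmpty()) {
            return null; // the key never reaches the reducer
        }
        return sumAndFilter(reducerInput, threshold);
    }

    public static void main(String[] args) {
        // The same word seen once in each of two map tasks; its true total is 2.
        List<List<Integer>> splits = Arrays.asList(Arrays.asList(1), Arrays.asList(1));
        System.out.println(run(splits, 1, true));  // prints null
        System.out.println(run(splits, 1, false)); // prints 2
    }
}
```

If this is what happens in the job, removing the `setCombinerClass` call, or using a separate sum-only class as the combiner, should make the missing pairs reappear.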

+1 vote

How to renew Kerberos ticket to run balancer for long time?

I recently set up Kerberos security for a Hadoop cluster and added a few data nodes to it. While running the hdfs balancer, I found that the Kerberos ticket had expired and the balancer stopped.

The Kerberos ticket has a 1-day lifetime with a 7-day maximum renewable lifetime. Are there any options to automatically renew the ticket while the balancer is running?

Or should I restart it every day?

+2 votes

How to include Hadoop jars in the Oozie package

I am trying to build oozie-4.2.0 against hadoop-2.7.1 using the mkdistro.sh script, but when I extract the oozie.war file I don't see the Hadoop jars. What is the right way to build Oozie so that the Hadoop jars are included in the oozie-distro?

./mkdistro.sh assembly:single -P hadoop-2 -D javaVersion=1.7 -D targetJavaVersion=1.7 -D skipTests -D includeHadoopJars

