Query regarding object creation in MapReduce code

+2 votes
376 views

I need your help writing a MapReduce program in Java. I am creating mapper and reducer classes for reading and processing a log file. I also have many other classes that act as supporting classes for the mapper and will be instantiated from the mapper class within the map function.

PROBLEM STATEMENT:
Since about 20 other objects will be instantiated from the mapper class within the map function, and the map function is called once for every input record, we think this could hurt performance because the same objects are created over and over.

Please let us know what the best approach/design would be for instantiating these 20 classes from the mapper class without compromising performance.
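
To make the question concrete, one option we are weighing is to build the helpers once per task instead of once per record. The sketch below is only a minimal, standalone illustration under assumptions: `LogParser` and `LogMapper` are hypothetical names, a single helper stands in for the 20 supporting classes, and the Hadoop types are left out so that it runs with plain Java. In Hadoop's `org.apache.hadoop.mapreduce.Mapper`, the corresponding hooks would be `setup(Context)`, which the framework calls once per task, and `map(...)`, which it calls once per input record.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (no Hadoop dependency). All class names are hypothetical.
class MapperReuseSketch {

    // Stands in for one of the supporting classes used by the mapper.
    static class LogParser {
        static int instancesCreated = 0; // counts constructions, for illustration only

        LogParser() {
            instancesCreated++;
        }

        // Splits a log line into a level and the rest of the message.
        String[] parse(String line) {
            return line.split(" ", 2);
        }
    }

    // Mirrors the mapper lifecycle: setup() once per task, map() once per record.
    static class LogMapper {
        private LogParser parser; // created once in setup(), reused by every map() call
        private final List<String> output = new ArrayList<>();

        void setup() {
            parser = new LogParser();
        }

        void map(String line) {
            // No "new" here: the helper built in setup() is reused.
            String[] fields = parser.parse(line);
            output.add(fields[0]);
        }

        List<String> getOutput() {
            return output;
        }
    }

    public static void main(String[] args) {
        LogMapper mapper = new LogMapper();
        mapper.setup();
        String[] lines = {"ERROR disk full", "INFO job started", "WARN slow node"};
        for (String line : lines) {
            mapper.map(line);
        }
        System.out.println("records mapped: " + mapper.getOutput().size());
        System.out.println("helper instances created: " + LogParser.instancesCreated);
    }
}
```

Whether this actually helps depends on how expensive the helper constructors are and on whether the helpers hold per-record state that would have to be reset between calls to map.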

Your suggestions/comments are welcome.

posted Dec 5, 2014 by anonymous


Similar Questions
0 votes

I have a system where files arrive in HDFS at regular intervals, and I perform an operation every time the directory size goes above a particular threshold.

My question is: when I submit a MapReduce job, will it only work on the files present at that point?

+1 vote

How does a job work in YARN/MapReduce, that is, what path does it follow?

Please check whether my understanding is right.

When the application (job) client starts, the client communicates with the NameNode, and the application manager is started on a node (a DataNode). The application manager communicates with the Resource Manager (on the NameNode) to get resources. The resources are assigned to a container. The job runs in the container, which is a JVM.

+2 votes

I have a set of input files that keep changing. Is there any way to run a MapReduce program that caches its results?

Also, can the MapReduce program run again automatically whenever the input files change, so that the result set is updated to reflect those changes?

Can we use MR to approach this dynamically?

+3 votes

I am looking into the YARN MapReduce internals to try to understand how reduce tasks know which partition of the map output they should read, even when they re-execute after a crash.

I am also looking at the MapReduce source code. Is there any class I should look at to understand this?

...