What is Hadoop MapReduce? How does it work?

+1 vote
420 views
What is Hadoop MapReduce? How does it work?
posted Nov 18, 2014 by Amit Kumar Pandey


1 Answer

+2 votes
 
Best answer

MapReduce is a framework originally created by Google; see this white paper:
MapReduce White Paper

It was developed for processing large amounts of data (usually stored in a distributed file system such as HDFS) to extract useful patterns from it.

Hadoop runs on a cluster of individual computers built from commodity hardware, and it is used to store large amounts of data (usually big data) whose size exceeds the storage capacity of a single computer.

For example, your computer has 500 GB of storage and mine has another 500 GB (imagine all of it is free), and you are given a data source of 800 GB.

Neither of our systems can hold this data on its own, so the idea is to combine them to get a total capacity of 1000 GB.

Now our data is stored in this combined distributed file system, known as HDFS (Hadoop Distributed File System).
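
To make the 500 GB + 500 GB example concrete, here is a rough sketch in Python of how a file could be cut into fixed-size blocks and spread over the two machines. This is only an illustration, not how HDFS is implemented: the function name `place_blocks`, the node names and the 1 GB block size are made up for the example, and replication is ignored (real HDFS keeps several copies of each block, which would need more space than this toy cluster has).

```python
def place_blocks(file_size_gb, block_size_gb, node_capacities_gb):
    """Toy block placement: hand out blocks round-robin, skipping full nodes.

    All names and sizes here are hypothetical; real HDFS placement also
    considers replication and rack layout, which this sketch ignores.
    """
    free = dict(node_capacities_gb)          # free space left on each node
    placement = {name: 0 for name in free}   # GB stored on each node
    names = list(free)
    remaining = file_size_gb
    turn = 0
    while remaining > 0:
        block = min(block_size_gb, remaining)
        # Try each node once, starting from whoever's turn it is.
        for _ in range(len(names)):
            name = names[turn % len(names)]
            turn += 1
            if free[name] >= block:
                free[name] -= block
                placement[name] += block
                remaining -= block
                break
        else:
            # No node had room for this block.
            raise ValueError("cluster is full")
    return placement


if __name__ == "__main__":
    # The 800 GB data source fits only when both machines are combined.
    print(place_blocks(800, 1, {"yours": 500, "mine": 500}))
```

With both machines the 800 GB is split between them; with only one 500 GB machine the same call raises an error, which is the point of the example above.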

Now we may need to process our data, for example to find some useful pattern in it. For that we use Hadoop MapReduce as the analytical platform.
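
As a very small picture of the processing idea, the sketch below simulates the three steps of a MapReduce job (map, shuffle, reduce) in plain Python on one machine, using word counting as the "pattern" to find. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are invented for this example, and each inner list stands in for the part of the data that one machine would hold.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, then shuffle, then reduce each group.

    In a real cluster the map calls run in parallel on the machines that
    hold each split; here they simply run one after another.
    """
    mapped = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two splits, as if each were stored on a different machine.
    splits = [["big data big"], ["data cluster"]]
    print(run_job(splits))
```

The useful property is that each map call only needs its own split, so the work can go to wherever the data already sits, and only the grouped intermediate pairs have to move between machines.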

To learn more about how it works:

answer Nov 24, 2014 by Krishnan Goskan