What is Hadoop MapReduce? How does it work?

1 Answer

Best answer

MapReduce is a framework originally created by Google. For the details, read the white paper:
Map Reduce White Paper

It was developed for processing large amounts of data (usually stored in a distributed file system such as HDFS) to extract useful patterns from it.

Hadoop is essentially a framework that runs on a cluster of individual computers built from commodity hardware. It is used to store large amounts of data (usually big data) whose size exceeds a single computer's storage capacity.

For example, your computer has 500 GB of disk space and mine has another 500 GB (imagine all of it is free), and you are given a data source of size 800 GB.

Neither of our systems can hold this data on its own, so the idea is to combine them to get a total capacity of 1000 GB.

Now our data is stored in this combined distributed file system, known as HDFS (Hadoop Distributed File System).
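The storage example above can be sketched in a few lines of Python. This is only an illustration, not how HDFS is implemented: the function name `place_blocks` and the node names are made up, the 128 MB block size is assumed as a typical HDFS setting, and replication (HDFS normally keeps several copies of each block) is ignored to keep the arithmetic the same as in the example.

```python
def place_blocks(file_size_mb, capacities_mb, block_size_mb=128):
    """Split a file into fixed-size blocks and spread them over nodes.

    Hypothetical helper for illustration only. Real HDFS placement is
    decided by the NameNode and also replicates every block.
    Returns a dict of node name -> megabytes stored on that node.
    """
    if file_size_mb > sum(capacities_mb.values()):
        raise ValueError("file does not fit in the combined cluster storage")

    nodes = list(capacities_mb)
    used = {node: 0 for node in nodes}
    remaining = file_size_mb
    start = 0  # index of the node to try first for the next block

    while remaining > 0:
        block = min(block_size_mb, remaining)
        # Round-robin over the nodes, skipping any node that is full.
        for offset in range(len(nodes)):
            node = nodes[(start + offset) % len(nodes)]
            if capacities_mb[node] - used[node] >= block:
                used[node] += block
                start = (start + offset + 1) % len(nodes)
                break
        else:
            raise ValueError("no single node has room for the next block")
        remaining -= block

    return used


# Two machines with 500 GB free each, and an 800 GB data source.
cluster = {"yours": 500 * 1024, "mine": 500 * 1024}
layout = place_blocks(800 * 1024, cluster)
for node, used_mb in layout.items():
    print(node, "stores", used_mb // 1024, "GB")
```

With two equal machines the blocks alternate between them, so each ends up holding half of the file. Passing only one of the machines raises an error, which is the "none of our systems can hold it alone" case.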

Next we may need to process the data, for example to find useful patterns in it. For that we use Hadoop MapReduce as the analytical platform.
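To give a rough idea of what such processing looks like, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the "pattern". It is plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this illustration, and a real Hadoop job would run the map and reduce steps in parallel on the machines that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is useful"]
    print(word_count(sample))
```

The point of splitting the work this way is that each `map_phase` call only needs one line of input and each `reduce_phase` call only needs the values for one key, so both steps can be spread over many machines.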

To learn more about how it works:

answer Nov 24, 2014 by Krishnan Goskan

Similar Questions

+1 vote

How does a job work in YARN/MapReduce? Like a navigation path...

Please check whether my understanding is right.

When the application (job or client) starts, the client communicates with the name node, and the application manager is started on a node (a data node). The application manager communicates with the resource manager (on the name node) to get resources. The resources are assigned to a container. The job runs in the container, which is a JVM.

+2 votes

Hadoop: Map Reduce in Cache

I have a set of input files which change over time. Is there any way to run a MapReduce program that caches its results?

Also, can the MapReduce program run again automatically whenever the input files change, so that the result set is updated to reflect those changes?

Can we use MR to approach this dynamically?

0 votes

When I submit a map reduce job, would it only work on the files present at that point?

I have a system where files arrive in HDFS at regular intervals, and I perform an operation every time the directory size goes above a particular threshold.

My question is: when I submit a MapReduce job, will it only work on the files present at that point?

+2 votes

Query regarding the object creation in map reduce code

I need your help in writing a MapReduce program in Java. I am creating mapper and reducer classes for reading and processing a log file. I also have many other classes which act as supporting classes for the mapper and will be instantiated from the mapper class within the map function.

PROBLEM STATEMENT:
Since there are 20 other objects which will be instantiated from the mapper class within the map function, we think this could cause a performance hit because of repeated object creation.

Please let us know what the best approach/design would be to instantiate these 20 classes from the Mapper class without compromising performance.

Your suggestions/comments are welcome.

+3 votes

Hadoop: How do reduce tasks know which partition they should read?

I am looking into the YARN MapReduce internals to try to understand how reduce tasks know which partition of the map output they should read, even when they re-execute after a crash.

I am also looking at the MapReduce source code. Is there any class that I should look at to understand this?
