What is Hadoop and Where did Hadoop come from?

+2 votes
628 views
posted Apr 10, 2015 by Vrije Mani Upadhyay


2 Answers

+1 vote
 
Best answer

Hadoop: this open source software platform, managed by the Apache Software Foundation, has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently.

But what exactly is Hadoop, and what makes it so special? Basically, it's a way of storing enormous data sets across distributed clusters of servers and then running "distributed" analysis applications in each cluster.

It's designed to be robust, in that your Big Data applications will continue to run even when individual servers — or clusters — fail. And it's also designed to be efficient, because it doesn't require your applications to shuttle huge volumes of data across your network.

Hadoop is almost completely modular, which means that you can swap out almost any of its components for a different software tool. That makes the architecture incredibly flexible, as well as robust and efficient.
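
To make that model concrete, here is a minimal sketch of the classic WordCount job, the standard introductory Hadoop MapReduce example (the class names and the two command-line path arguments are illustrative, not part of the original answer):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: runs on the node holding each input split and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: receives every count emitted for one word and sums them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The map tasks run next to the data they read, and only the small intermediate (word, count) pairs cross the network to the reducers, which is what keeps the shuffle cheap relative to shipping the raw data around.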

answer Apr 13, 2015 by Amit Kumar Pandey
0 votes

In 2004, Google introduced a distributed file system known as the Google File System (GFS), along with Google MapReduce; the approach became popular after 2008. Doug Cutting developed Hadoop, naming it after his child's toy elephant. The Hadoop Distributed File System (HDFS) and MapReduce are inspired by Google's file system and MapReduce. Hadoop is now managed by Apache: http://hadoop.apache.org/

answer Apr 14, 2015 by Sudhakar Singh
Similar Questions
0 votes

Can someone share information on Hadoop best practices, or a link where I can find them?

+2 votes
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxMinReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    int max_sum = 0;
    int mean = 0;
    int count = 0;
    Text max_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");
    int min_sum = Integer.MAX_VALUE;
    Text min_occured_key = new Text();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Total the occurrences of this key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }

        // Remember the key with the smallest total seen so far.
        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }

        // Remember the key with the largest total seen so far.
        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }

        mean = max_sum + min_sum / count;
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}

Here I am computing the minimum, maximum, and mean of the word counts.

My input file:

high low medium high low high low large small medium

The expected output is:

high - 3 (maximum)
low - 3 (maximum)
large - 1 (minimum)
small - 1 (minimum)

But I am not getting the above output. Can anyone please help me?

0 votes

I want to ask: what is the best way to implement a job that imports files into HDFS?

I have an external system offering data through a REST API. My goal is to have a job running periodically in Hadoop (maybe started by cron?) that checks the REST API for new data.

It would also be nice if this job could run on multiple data nodes. But unlike all the MapReduce examples I found, my job looks for new or changed data from an external interface and compares it with the data already stored.

This is a conceptual example of the job:

  • The job asks the REST API if there are new files.
  • If so, the job imports the first file in the list.
  • It checks whether the file already exists.
  • If not, the job imports the file.
  • If yes, the job compares the data with the data already stored.
  • If the data has changed, the job updates the file.
  • If more files exist, the job continues with step 2.
  • Otherwise it ends.

Can anybody give me a little help on how to start? (It is the first job I have written...)
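
As one possible starting point, here is a minimal sketch of such an importer written against the HDFS FileSystem client API. The REST endpoint layout (BASE_URL, a /files resource listing one file name per line, and /files/<name> serving the bytes) is a made-up assumption, the "changed" test is simplified to a length comparison, and the whole thing runs as a single client process; spreading it over several data nodes (for example with Oozie or a map-only job) would be a separate step:

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RestImporter {
    static final String BASE_URL = "http://example.com/api"; // hypothetical endpoint

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Step 1: ask the REST API for the list of available files.
        String listing;
        try (InputStream list = new URL(BASE_URL + "/files").openStream()) {
            listing = new String(list.readAllBytes());
        }

        for (String name : listing.split("\n")) {
            Path target = new Path("/imports/" + name);
            URL source = new URL(BASE_URL + "/files/" + name);

            // Steps 3-6: import if missing; re-import if the remote copy
            // differs (here judged only by length).
            if (!fs.exists(target) || fs.getFileStatus(target).getLen()
                    != source.openConnection().getContentLengthLong()) {
                try (InputStream in = source.openStream()) {
                    // Overwrite the target with the current remote contents.
                    IOUtils.copyBytes(in, fs.create(target, true), conf, true);
                }
            }
        }
        fs.close();
    }
}

A cron entry that runs this class via hadoop jar would cover the periodic part of the question.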

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job takes a long time and we want to stop it midway, which command is used? Or is there any other way to do that?
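
For reference, a running MapReduce job can be stopped from the command line using its job ID (on YARN clusters the application ID works too):

$ hadoop job -list                          # look up the ID of the running job
$ hadoop job -kill <job_id>                 # kill that job
$ yarn application -kill <application_id>   # YARN equivalent

On newer releases hadoop job is deprecated in favour of mapred job with the same subcommands.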

+1 vote

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time from the IDE than as a jar file on the terminal.

...