What is Hadoop?
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
It is an Apache project sponsored by the Apache Software Foundation.
Apache Hadoop has two main subprojects:
MapReduce - The framework that understands and assigns work to the nodes in a cluster.
HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
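To make the replication idea concrete, here is a minimal sketch in Python. It is a toy model, not the HDFS API: the function names, node names and block names are all hypothetical, and the round-robin placement is a simplifying assumption (real HDFS uses its own placement policy). The replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive positions modulo len(nodes) are distinct because
        # replication <= len(nodes).
        placement[block] = {nodes[(i + k) % len(nodes)] for k in range(replication)}
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical five-node cluster storing four blocks of one file.
nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3"]
placement = place_blocks(blocks, nodes)

# Two nodes fail at once; with three replicas per block, nothing is lost.
survivors = readable_blocks(placement, failed_nodes=["node2", "node3"])
print(sorted(survivors))
```

In this model a block becomes unreadable only when every node holding one of its replicas fails at the same time, which is why a higher replication factor buys more tolerance at the cost of more disk space.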
Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics.
Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes.
Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure.
Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts.
Any of these parts (also called fragments or blocks) can be run on any node in the cluster.
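The split-and-combine pattern described here can be sketched as a word count in plain Python. This is only an illustration of the MapReduce idea running in a single process, not Hadoop's Java API; the function names and the `fragment_size` parameter are assumptions made for the example.

```python
from collections import defaultdict


def split_into_fragments(lines, fragment_size):
    """Break the input into small, independent fragments."""
    return [lines[i:i + fragment_size] for i in range(0, len(lines), fragment_size)]


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one fragment."""
    pairs = []
    for line in fragment:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped):
    """Group the values emitted by all map calls by their key."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each word."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(lines, fragment_size=2):
    fragments = split_into_fragments(lines, fragment_size)
    # Each map_fragment call depends only on its own fragment, so in a real
    # cluster each one could run on a different node.
    mapped = [map_fragment(fragment) for fragment in fragments]
    return reduce_counts(shuffle(mapped))


lines = ["hadoop stores data", "hadoop processes data", "data is big"]
counts = word_count(lines)
print(counts["data"], counts["hadoop"])  # prints: 3 2
```

Because no map call needs to see another fragment, the work can be spread across as many nodes as there are fragments; only the shuffle and reduce steps bring the partial results back together.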
The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related projects such as Apache Hive, HBase and ZooKeeper.
The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. Linux is the preferred operating system; Windows is also supported, and Hadoop can also work with BSD and OS X.