Best practice for migrating Hadoop 1.0.1 to Hadoop 2.2.3

+1 vote
847 views

We plan to migrate a 30-node Hadoop 1.0.1 cluster to version 2.3.0. We don't have extra machines to set up a separate new cluster, so we hope to do an in-place migration by replacing the components on the existing machines. Our questions are:

1) Is it possible to do an in-place migration while keeping all data in HDFS safe?
2) If yes, is there any documentation or guidance for doing this?
3) Is the 2.0.3 MR API binary compatible with that of 1.0.1?

posted Mar 6, 2014 by Amit Mishra

2 Answers

+1 vote

1) Is it possible to do an in-place migration while keeping all data in HDFS safe?
Yes. Stop HDFS first, then run "start-dfs.sh -upgrade" from the new version.

2) If yes, is there any documentation or guidance for doing this?
You only need an HDFS upgrade, so I don't think there is much useful documentation.

3) Is the 2.0.3 MR API binary compatible with that of 1.0.1?
The FileSystem API is not very compatible. There are also some new HDFS configuration options, and some old ones are deprecated.
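
For illustration, the order of operations described above could be sketched as the following script. This is only a sketch under assumptions: the install paths in OLD_HOME and NEW_HOME are hypothetical, the run helper is made up for this example, and the script prints commands instead of executing them unless DRY_RUN=0 is set. Finalizing is kept as a separate step because it should only happen once the upgraded data has been checked.

```shell
#!/usr/bin/env bash
# Sketch of the in-place HDFS upgrade order. With DRY_RUN=1 (the default)
# commands are only printed, never executed.
DRY_RUN="${DRY_RUN:-1}"
OLD_HOME="${OLD_HOME:-/opt/hadoop-1.0.1}"   # hypothetical path of the old install
NEW_HOME="${NEW_HOME:-/opt/hadoop-2.x}"     # hypothetical path of the new install

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_hdfs() {
  # 1. Stop MapReduce and HDFS with the old 1.x scripts.
  run "$OLD_HOME/bin/stop-mapred.sh"
  run "$OLD_HOME/bin/stop-dfs.sh"
  # 2. Start HDFS from the new installation in upgrade mode.
  run "$NEW_HOME/sbin/start-dfs.sh" -upgrade
  # 3. Look at the cluster report before doing anything else.
  run "$NEW_HOME/bin/hdfs" dfsadmin -report
}

finalize_upgrade() {
  # Finalizing discards the pre-upgrade state, so a rollback is no longer
  # possible afterwards. Run this only after the data has been verified.
  run "$NEW_HOME/bin/hdfs" dfsadmin -finalizeUpgrade
}

upgrade_hdfs
```

Running it as-is only lists the planned commands. Until finalize_upgrade has been run, "start-dfs.sh -rollback" can be used to return to the pre-upgrade state.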

answer Mar 6, 2014 by Ankit
0 votes

The following links should be helpful references:

http://www.michael-noll.com/blog/2011/08/23/performing-an-hdfs-upgrade-of-an-hadoop-cluster/
http://wiki.apache.org/hadoop/Hadoop_Upgrade

The Hadoop versions used in these documents may differ from yours, but they are good references for understanding the basic flow. I would suggest creating a test cluster that mimics your production environment and trying the upgrade there before doing it in production. Also back up your NameNode metadata, which may help you recover if something goes wrong.
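
As a rough illustration of the backup advice, a pre-upgrade script might look like the one below. It is a sketch, not a tested procedure: NAME_DIR stands for whatever dfs.name.dir is set to on your NameNode, BACKUP_DIR is a hypothetical target directory, and the run helper is made up for this example. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of pre-upgrade safety steps. With DRY_RUN=1 (the default)
# commands are only printed, never executed.
DRY_RUN="${DRY_RUN:-1}"
NAME_DIR="${NAME_DIR:-/data/dfs/name}"        # hypothetical dfs.name.dir value
BACKUP_DIR="${BACKUP_DIR:-/backup/namenode}"  # hypothetical backup location

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

record_cluster_state() {
  # Run while HDFS is still up on the old version, so the file and block
  # listing can be compared with the state after the upgrade.
  run hadoop fsck / -files -blocks -locations
  run hadoop dfsadmin -report
}

backup_namenode_metadata() {
  # Run after HDFS has been stopped. $1 is a label for the archive name.
  local stamp="$1"
  run tar -czf "$BACKUP_DIR/name-$stamp.tar.gz" \
    -C "$(dirname "$NAME_DIR")" "$(basename "$NAME_DIR")"
}

record_cluster_state
backup_namenode_metadata "$(date +%Y%m%d)"
```

When running for real, redirect the fsck and report output to files so they can be diffed against the same commands after the upgrade.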

answer Mar 6, 2014 by Sheetal Chauhan
Similar Questions
+4 votes

When upgrading/migrating a production cluster of several nodes from Apache Hadoop 1.x to 2.x (MRv2/YARN), is there any *ANTICIPATED DOWNTIME* that one needs to be aware of?

+3 votes

I am trying to access a Hadoop 1 installation via the Hadoop 2.2.0 command line tools. Is this possible at all?

From hadoop 1 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
Found 2 items
drwxr-xr-x - cs supergroup 0 2014-02-01 08:18 /tmp
drwxr-xr-x - cs supergroup 0 2014-02-01 08:19 /user

From hadoop 2.2.0 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
ls: Failed on local exception: java.io.EOFException; Host Details : 
local host is: "i7/127.0.1.1"; destination host is: "localhost":9000;

I have tried to find this information via a web search, but so far without success.

+1 vote

The original local file has execute permission. It was distributed to multiple NodeManager nodes with the Distributed Cache feature of Hadoop 2.2.0, but the distributed copy has lost the execute permission.

However, I did not encounter this issue in Hadoop 1.1.1.

Why did this happen? Were there changes to the dfs.umask option or related settings?

0 votes

I did a quick Google search and can't find any documentation on rolling upgrades. Does anyone know how to upgrade from Hadoop 2.3 to 2.4?

+1 vote

I currently have a Hadoop 2.0 cluster in production and want to upgrade to the latest release.
Current version (output of hadoop version): Hadoop 2.0.0-cdh4.6.0

Cluster has the following services:
hbase hive hue impala mapreduce oozie sqoop zookeeper

Can someone point me to instructions for upgrading Hadoop from 2.0 to 2.4.0?

...