
Why does FSDataInputStream.read() read only 2^17 bytes in Hadoop 2.0?

+1 vote
406 views

First, I use FileSystem to open a file in HDFS.

FSDataInputStream m_dis = fs.open(...);        

Second, I read the data from m_dis into a byte array.

byte[] inputdata = new byte[m_dis.available()];  // m_dis.available() == 47185920
m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3);   // returns 131072, not the requested 47185920

The value returned by m_dis.read() is 131072 (2^17), so the data after byte 131072 is missing. It seems as if FSDataInputStream uses a short to manage its data, which confuses me a lot. The same code ran fine in Hadoop 1.2.1.

posted Mar 7, 2014 by Meenal Mishra


1 Answer

+1 vote
 
Best answer

The semantics of read() do not guarantee that it reads as much as possible. You need to call read() in a loop until the buffer is full, or use readFully().
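
For reference, here is a minimal sketch of both approaches (the class, method, and variable names are illustrative, not from the original post; only fs.open(), read(), and readFully() are the actual Hadoop API calls):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFullyExample {

    // Option 1: loop over read(); each call may return fewer bytes than requested.
    public static byte[] readAllWithLoop(FileSystem fs, Path path, int length) throws IOException {
        byte[] data = new byte[length];
        try (FSDataInputStream in = fs.open(path)) {
            int offset = 0;
            while (offset < length) {
                int n = in.read(data, offset, length - offset);
                if (n == -1) {
                    throw new IOException("Unexpected end of stream at offset " + offset);
                }
                offset += n;
            }
        }
        return data;
    }

    // Option 2: readFully() blocks until all requested bytes have been read
    // (or throws EOFException if the stream ends first).
    public static byte[] readAllWithReadFully(FileSystem fs, Path path, int length) throws IOException {
        byte[] data = new byte[length];
        try (FSDataInputStream in = fs.open(path)) {
            in.readFully(data, 0, length);
        }
        return data;
    }
}

Either way, do not rely on a single read() call filling the whole buffer; that behavior is not part of the InputStream contract, even if it happened to work in Hadoop 1.2.1.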

answer Mar 7, 2014 by Deepak Dasgupta
Yes, it worked, thanks.

Similar Questions
+1 vote

I am new to Hadoop. I am using Hadoop 2.5.2 with YARN as the MR framework. I would like to ask about the two ports, the M/R(v2) master port and the DFS master port, that have to be configured in the Eclipse Hadoop plugin view.

Which properties do these ports correspond to in the Hadoop configuration files, e.g., yarn-site.xml?

+1 vote

The Hadoop documentation suggests that the following variables be set in order for Hadoop to prioritize the client jars over the Hadoop jars; however, I am not sure how to set them. Can someone please tell me how to set these?

HADOOP_USER_CLASSPATH_FIRST=TRUE and HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar to run their target examples jar, and add the following configuration in mapred-site.xml to make the processes in YARN containers pick this jar as well.

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

+1 vote

The original local file has execute permission, and it was then distributed to multiple NodeManager nodes with the Distributed Cache feature of Hadoop 2.2.0, but the distributed file has lost the execute permission.

However, I did not encounter this issue in Hadoop 1.1.1.

Why did this happen? Were there changes to the dfs.umask option or related settings?

+1 vote

I currently have a Hadoop 2.0 cluster in production, and I want to upgrade to the latest release.
Current version (output of hadoop version): Hadoop 2.0.0-cdh4.6.0

The cluster runs the following services:
hbase, hive, hue, impala, mapreduce, oozie, sqoop, zookeeper

Can someone point me to how to upgrade Hadoop from 2.0 to 2.4.0?

+1 vote

We plan to migrate a 30-node Hadoop 1.0.1 cluster to version 2.3.0. We don't have extra machines to set up a separate new cluster, so we hope to do an in-place migration by replacing the components on the existing machines. So the questions are:

1) Is it possible to do an in-place migration while keeping all data in HDFS safe?
2) If yes, is there any doc/guidance on how to do this?
3) Is the 2.3.0 MR API binary compatible with that of 1.0.1?

...