
Why does FSDataInputStream.read() read only 2^17 bytes in Hadoop 2.0?

+1 vote
406 views

First, I use FileSystem to open a file in HDFS.

FSDataInputStream m_dis = fs.open(...);        

Second, I read the data from m_dis into a byte array.

byte[] inputdata = new byte[m_dis.available()];  // m_dis.available() == 47185920
m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3);   // returns 131072, not the requested 47185920

The value returned by m_dis.read() is 131072 (2^17), so the data after byte 131072 is missing. It seems as if FSDataInputStream uses a short to manage its data, which confuses me a lot. The same code ran fine in Hadoop 1.2.1.

posted Mar 7, 2014 by Meenal Mishra


1 Answer

+1 vote
 
Best answer

The semantics of read() do not guarantee that it reads as much as possible. You need to call read() in a loop until the buffer is full, or use readFully().
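
For reference, here is a minimal sketch of both approaches (the class, method, and variable names are illustrative, not from the original post; only fs.open(), read(), and readFully() are the actual Hadoop API calls):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFullyExample {

    // Option 1: loop over read(); each call may return fewer bytes than requested.
    public static byte[] readAllWithLoop(FileSystem fs, Path path, int length) throws IOException {
        byte[] data = new byte[length];
        try (FSDataInputStream in = fs.open(path)) {
            int offset = 0;
            while (offset < length) {
                int n = in.read(data, offset, length - offset);
                if (n == -1) {
                    throw new IOException("Unexpected end of stream at offset " + offset);
                }
                offset += n;
            }
        }
        return data;
    }

    // Option 2: readFully() blocks until all requested bytes have been read
    // (or throws EOFException if the stream ends first).
    public static byte[] readAllWithReadFully(FileSystem fs, Path path, int length) throws IOException {
        byte[] data = new byte[length];
        try (FSDataInputStream in = fs.open(path)) {
            in.readFully(data, 0, length);
        }
        return data;
    }
}

Either way, do not rely on a single read() call filling the whole buffer; that behavior is not part of the InputStream contract, even if it happened to work in Hadoop 1.2.1.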

answer Mar 7, 2014 by Deepak Dasgupta
Yes, it worked, thanks.

Similar Questions
+1 vote

I am new to Hadoop. I am using Hadoop 2.5.2 with YARN as the MR framework. I would like to ask about the two ports, the M/R(v2) master port and the DFS master port, that have to be configured in the Eclipse Hadoop plugin view.

Which properties do these ports correspond to in the Hadoop configuration files, e.g., yarn-site.xml?

+1 vote

The Hadoop documentation suggests that the following variables be set in order for Hadoop to prioritize the client jars over the Hadoop jars; however, I am not sure how to set them. Can someone please tell me how to set these?

HADOOP_USER_CLASSPATH_FIRST=TRUE and HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar to run their target examples jar, and add the following configuration in mapred-site.xml to make the processes in YARN containers pick this jar as well.

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

+1 vote

The original local file has execute permission, and it was then distributed to multiple NodeManager nodes with the Distributed Cache feature of Hadoop 2.2.0, but the distributed file has lost the execute permission.

However, I did not encounter this issue in Hadoop 1.1.1.

Why did this happen? Were there changes to the dfs.umask option or related settings?

+1 vote

I currently have a Hadoop 2.0 cluster in production, and I want to upgrade to the latest release.
Current version (output of hadoop version): Hadoop 2.0.0-cdh4.6.0

The cluster runs the following services:
hbase, hive, hue, impala, mapreduce, oozie, sqoop, zookeeper

Can someone point me to how to upgrade Hadoop from 2.0 to 2.4.0?

+1 vote

We plan to migrate a 30-node Hadoop 1.0.1 cluster to version 2.3.0. We don't have extra machines to set up a separate new cluster, so we hope to do an in-place migration by replacing the components on the existing machines. So the questions are:

1) Is it possible to do an in-place migration while keeping all data in HDFS safe?
2) If yes, is there any doc/guidance on how to do this?
3) Is the 2.3.0 MR API binary compatible with that of 1.0.1?

...