Hadoop: HDFS short-circuit reads

+2 votes
644 views

Our YARN application would benefit from maximal bandwidth on HDFS reads, but I'm unclear on how short-circuit reads are enabled. Are they on by default?

Can our application check programmatically whether short-circuit reads are enabled?

posted Dec 17, 2013 by Sumit Pokharna


1 Answer

+2 votes

Short-circuit reads are not on by default. The HDFS short-circuit reads documentation page at hadoop.apache.org contains all of the information you need to enable them, though.
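As a rough sketch of what enabling looks like (the socket path below is an example value, not a requirement), both the DataNode and the client configuration need entries along these lines in hdfs-site.xml, and the native libhadoop library must be available:

    <!-- hdfs-site.xml: minimal sketch; adjust the socket path for your deployment -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/lib/hadoop-hdfs/dn_socket</value>
    </property>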

Regarding checking the status of short-circuit reads programmatically, here are a few thoughts:
Your application could check the Configuration for the dfs.client.read.shortcircuit key. This will tell you at a high level if the feature is enabled. However, note that the feature needs to be turned on in the configuration of both the DataNode and the HDFS client process. Depending on the details of the deployment, the DataNode and the client might be using different configuration files. A minimal sketch of the client-side check follows.
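This sketch assumes the client's configuration files (core-site.xml/hdfs-site.xml) are on the classpath; Configuration.getBoolean is the standard API for reading a boolean key with a default:

    import org.apache.hadoop.conf.Configuration;

    public class ShortCircuitCheck {
        public static void main(String[] args) {
            // Loads core-site.xml and hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            // Reflects only the client-side setting; the DataNode must
            // also have the feature enabled in its own configuration.
            boolean enabled = conf.getBoolean("dfs.client.read.shortcircuit", false);
            System.out.println("Client-side short-circuit reads enabled: " + enabled);
        }
    }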

This tells you if the feature is enabled, but it doesn't necessarily tell you whether you're really going to get short-circuit reads when you open the file. There might not be a local replica for the block, in which case the read falls back to the typical remote read behavior anyway.

Depending on what your application wants to achieve, you might also be interested in looking at the FileSystem.listLocatedStatus API to query information about blocks and the corresponding locations of their replicas (see the sketch below). Applications like MapReduce use this information to schedule their work for optimal locality. Short-circuit reads then become a further optimization on top of the gains already achieved by locality. Hope this helps.
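A rough sketch of that query; the input path here is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class ReplicaLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Hypothetical input path; substitute your own.
            RemoteIterator<LocatedFileStatus> it =
                fs.listLocatedStatus(new Path("/user/data/input"));
            while (it.hasNext()) {
                LocatedFileStatus status = it.next();
                // Each BlockLocation names the hosts holding a replica of one
                // block; a read is a short-circuit candidate only when this
                // host appears among them.
                for (BlockLocation loc : status.getBlockLocations()) {
                    System.out.println(status.getPath() + " offset=" + loc.getOffset()
                        + " hosts=" + String.join(",", loc.getHosts()));
                }
            }
        }
    }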

answered Dec 17, 2013 by Satish Mishra
Similar Questions
0 votes

The reason behind this is that I want a custom user who can create anything on the entire HDFS file system (/).
I tried a couple of links, but none of them were useful. Is there any way I can do that by adding/modifying some property tags?

+2 votes

Is there a way to mount HDFS directly on a Linux or Windows client? I believe I read something about there being some limitations, but that there is possibly a FUSE solution. Any information on this (with links to a how-to) would be greatly appreciated.

0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log, but when I submit a MapReduce job, I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this info through some configuration? If it does, what should I configure?

...