How to set hadoop.tmp.dir if I have multiple disks per node?

+1 vote
1,221 views

I have ten disks per node, and I don't know what value to set for "hadoop.tmp.dir". Some say this property refers to a location on the local disk, while others say it refers to a directory in HDFS. I am confused; can someone explain it?

I want to spread I/O across the ten disks on each node, so should I set "hadoop.tmp.dir" to a comma-separated list of directories, each on a different disk?

posted Dec 16, 2013 by Sheetal Chauhan

Make sure to also set mapred.local.dir to the same set of output directories; this is where the intermediate key-value pairs are stored!
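To make the comma-separated idea concrete, here is a minimal Python sketch that builds one directory per disk and prints the matching property entries. The mount points /disk1 to /disk10 and the subdirectory names are hypothetical, and dfs.data.dir and mapred.local.dir are the Hadoop 1.x property names; as far as I know these two accept a comma-separated list, while hadoop.tmp.dir is a single base path, so check the defaults for your version.

```python
from xml.sax.saxutils import escape


def per_disk_dirs(mounts, subdir):
    """Return one directory per mount point, joined with commas."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mounts)


def property_xml(name, value):
    """Render a single Hadoop-style <property> element as text."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


# Hypothetical mount points, one per physical disk: /disk1 ... /disk10
mounts = [f"/disk{i}" for i in range(1, 11)]

# Hypothetical subdirectory layout under each mount point
data_dirs = per_disk_dirs(mounts, "hadoop/dfs/data")
local_dirs = per_disk_dirs(mounts, "hadoop/mapred/local")

print(property_xml("dfs.data.dir", data_dirs))
print(property_xml("mapred.local.dir", local_dirs))
```

The printed entries would go into hdfs-site.xml and mapred-site.xml respectively, assuming a Hadoop 1.x style layout.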

2 Answers

+2 votes

hadoop.tmp.dir is a directory created on the local file system. For example, suppose you have set the hadoop.tmp.dir property to /home/training/hadoop.

This directory will be created when you format the namenode by running the command
hadoop namenode -format

When you open this folder you will see two subfolders, dfs and mapred. A /home/training/hadoop/mapred folder will also exist on HDFS.

Hope this helps
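The reason the same path shows up both locally and on HDFS is, as far as I recall, that several other properties default to paths built from hadoop.tmp.dir. The Python sketch below imitates that ${...} substitution. The default values in the table are my assumption of the Hadoop 1.x defaults, so verify them against the *-default.xml files shipped with your version.

```python
import re

# Assumed Hadoop 1.x-style defaults; check *-default.xml for your version.
DEFAULTS = {
    "dfs.name.dir": "${hadoop.tmp.dir}/dfs/name",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
    "mapred.system.dir": "${hadoop.tmp.dir}/mapred/system",
}


def expand(value, props):
    """Replace each ${name} with props[name]; unknown names are left as-is."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: props.get(m.group(1), m.group(0)),
        value,
    )


def resolve(tmp_dir):
    """Expand every assumed default against the given hadoop.tmp.dir."""
    props = {"hadoop.tmp.dir": tmp_dir}
    return {name: expand(value, props) for name, value in DEFAULTS.items()}


resolved = resolve("/home/training/hadoop")
for name, path in resolved.items():
    print(f"{name} = {path}")
```

Under these assumed defaults, the dfs.* and mapred.local.dir paths are on the local disk, while mapred.system.dir is resolved against the default file system, which would explain the mapred folder appearing on HDFS.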

answer Dec 16, 2013 by Deepankar Dubey
+1 vote

You can set hadoop.tmp.dir to a directory or to a disk: mount the disk and put its mount path in the configuration file,

like /mnt

and you should set the right permissions on the mounted disk.
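A quick way to check the permissions is a small script run as the user that starts the Hadoop daemons (an assumption about your setup). The sketch below reports, for each configured directory, whether it exists and is writable; the demo uses temporary directories as stand-ins for real mount points such as /mnt.

```python
import os
import tempfile


def check_dirs(paths):
    """Map each path to 'ok', 'missing', or 'not writable' for the current user."""
    report = {}
    for p in paths:
        if not os.path.isdir(p):
            report[p] = "missing"
        elif not os.access(p, os.W_OK | os.X_OK):
            report[p] = "not writable"
        else:
            report[p] = "ok"
    return report


# Demo with temporary stand-ins for mounted disks (hypothetical names)
with tempfile.TemporaryDirectory() as base:
    good = os.path.join(base, "disk1")
    os.mkdir(good)
    absent = os.path.join(base, "disk2")  # never created
    demo = check_dirs([good, absent])
    print(demo)
```

In practice you would pass the directories from your own configuration instead of the temporary ones.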

answer Dec 16, 2013 by Sonu Jindal
Similar Questions
+2 votes

I have the following questions about Hadoop; please help.
1. The size of mapred.local.dir is big (30 GB). What are the correct ways to clean it?
2. Are the logs of the NameNode/DataNode/JobTracker/TaskTracker all rolling logs? What is their maximum size? I cannot find the specific settings for them in log4j.properties.
3. The size of dfs.name.dir and dfs.data.dir is very big now. Are there any files under them that can actually be removed, or must all files under the two folders be kept?

+2 votes

Has anyone got this error before? Please help.

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: xxxxx.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /xxxxxxxx:39000 dst: /xxxxxx:50010

java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:167)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
at java.lang.Thread.run(Thread.java:745)
2015-01-11 04:13:21,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 657ms (threshold=300ms)
+1 vote

After upgrading to Hadoop 2 (YARN), I found that mapred.jobtracker.taskScheduler.maxRunningTasksPerJob no longer works. Is that right?

One workaround is to use a queue to limit it, but that is not easy to control from the job submitter.

Is there any way to limit the number of concurrently running mappers per job? Any documents or pointers?

+2 votes

I see that we can set job priority on a hadoop job. I have been trying to do it using the following command.

hadoop job -set-priority job-id VERY_LOW

It does not seem to be working. I then noticed that http://archive.cloudera.com/cdh/3/hadoop/capacity_scheduler.html says that job priority on a queue is disabled by default. I would like to enable it. Googling has not helped; please suggest how to proceed. My Hadoop version is Hadoop 2.3.0-cdh5.1.0.

+1 vote

I have two machines, one master and one slave. I want to know how to configure the heartbeat in Hadoop 2.2.0; which file needs to be modified?

...