top button
Flag Notify
    Connect to us
      Site Registration

Site Registration

Issue about datanode memory usage with Hadoop?

+1 vote
1,002 views

I have a job running very slowly, when I examine cluster, I find my hdfs user using 170m swap though top command, thats user run datanode daemon, ps output show following info, there are two -Xmx value, and i do not know which value is the real ,1000m or 10240m

# ps -ef|grep 2853
root      2095  1937  0 15:06 pts/4    00:00:00 grep 2853
hdfs      2853     1  5 Nov07 ?        1-22:34:22 /usr/java/jdk1.7.0_45/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-ch14.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xmx10240m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop-hdfs/gc-ch14-datanode.log -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
posted Dec 10, 2013 by Garima Jain

Looking for an answer?  Promote on:
Facebook Share Button Twitter Share Button LinkedIn Share Button
it depends. I think Oracle JDK applies the rightmost. Check this if it is helpful  http://stackoverflow.com/questions/2740725/duplicated-java-runtime-options-what-is-the-order-of-preference

Similar Questions
+1 vote

We are currently facing a frustrating hadoop streaming memory problem. our setup:

  • our compute nodes have about 7 GB OF RAM
  • hadoop streaming starts a bash script wich uses about 4 GB OF RAM
  • therefore it is only possible to start one and only ONE TASK PER NODE

out of the box each hadoop instance starts about 7 hadoop containers with default hadoop settings. each hadoop task forks a bash script that need about 4 GB of RAM, the first fork works, all following fail because THEY RUN OUT OF MEMORY. so what we are looking for is to LIMIT the number of containers TO ONLY ONE. so what we found on the internet:

  • yarn.scheduler.maximum-allocation-mb and mapreduce.map.memory.mb is set to values such that there is at most one container. this means, mapreduce.map.memory.mb must be MORE THAN HALF of the maximum memory (otherwise there will be multiple containers).

done right, this gives us one container per node. but it produces a new problem: since our java process is now using at least half of the max memory, our child (bash) process we fork will INHERIT THE PARENT MEMORY FOOTPRINT and since the memory used by our parent was more than half of total memory, WE RUN OUT OF MEMORY AGAIN. if we lower the map memory, hadoop will allocate 2 containers per node, which will run out of memory too.

since this problem is a blocker in our current project we are evaluating adapting the source code to solve this issue. as a last resort. any ideas on this are very much welcome.

+1 vote

I would like to use Perl on an embedded device, which only has 64MB of RAM.
Are there any tricks to reduce the memory usage of the perl interpreter?

+1 vote

I am used to old version where I could use the typical hadoop-examples-*.jar with wordcount, sort or any other.

Recently I am using the latest version (2.7.1) but as can I see, there is no this dot jar file with examples. After a search over the Internet, I saw that there is a hadoop-mapreduce-examples jar file. Where I can find usage help for this examples? Available examples, etc.

I am focused on the HiBench tool (which use those examples) that Im using to test some modifications on the source code of Hadoop.

+1 vote

We get a problem about enable UseNUMA flag for my hadoop framework.

We've tried to specify JVM flags during hadoop daemon's starts,

e.g. export HADOOP_NAMENODE_OPTS="-XX:UseNUMA -Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS",  
export HADOOP_SECONDARYNAMENODE_OPTS="-XX:UseNUMA -Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS", etc.  

But the ratio between local and remote memory access is 2:1, just remains as same as before.

Then we find that hadoop MapReduce start child JVM processes to run task in containers. So we passes -XX:UseNUMA to JVMs by set theting configuration parameter child.java.opts. But hadoop starts to throw ExitCodeExceptionException (exitCode=1), seems that hadoop does not support this JVM parameter.

What should we do to enable UseNUMA flag for my hadoop? Or what should we do to decrease the local/remote memory access in NUMA framework? Should we just change Hadoop script or resorts to source code? And how to do it?

The hadoop version is 2.6.0.

...