I am working on a home-grown Hadoop installation. I've managed to get new nodes rolled out, although it takes time because we have other dependencies. One thing I haven't been able to figure out is where to set the HADOOP_HOME_DIR variable so that I can store each node's actual configuration separately from the binary tree.
Can anyone point me to where this should properly be set? We have an init.d script on the master that starts the services and calls out to the slaves (as user "hadoop"). I'm guessing the variable could be set and exported there so that it is inherited, but perhaps it is more proper to set it in ~hadoop/conf/hadoop-env.sh.
The goal is to make it easier to roll out slaves, perhaps using Puppet, with the CONF and LOGS directories kept separate from the binary tree; that's easier to manage.