Hadoop: Is it a good idea to delete/move the default configuration xml files?

0 votes
511 views

I have just realized that my implementation of hadoop-2.4.1 is pulling in all the default.xml files.

I have three copies of each in different directories, obviously at least one of those is on the class path.

Anyway, after all the effort of setting up a site, it seems strange to me that I would be using settings I had no idea existed and that may not be how I would choose to configure them.

posted Jul 21, 2014 by Sanketi Garg


1 Answer

+1 vote

I recommend against deleting or moving *-default.xml, because these files may be supplying reasonable default values for configuration properties that you have not set in *-site.xml. We also put defaults into the code itself in case a configuration property is found to be completely missing, but I am not aware of any actual testing of deployments that have deleted *-default.xml.

answer Jul 21, 2014 by Meenal Mishra
Are not the *-default.xml files supposed to be inside the jars rather than loose files?
That's a good point. I am not sure how bare *-default.xml files would be showing up on a deployment outside the jars.
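As a side note (not from the thread itself): a minimal sketch of the layering the answer describes, where Configuration resolves a value from the bundled core-default.xml unless core-site.xml on the classpath overrides it:

import org.apache.hadoop.conf.Configuration;

// Minimal sketch: Configuration loads core-default.xml (bundled in the
// Hadoop jars) first and then core-site.xml from the classpath, so any
// value set in core-site.xml overrides the bundled default.
public class ConfigPrecedenceDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // io.file.buffer.size is defined in core-default.xml (4096) and is
        // used here only as an example of a property with a bundled default.
        System.out.println("io.file.buffer.size = " + conf.get("io.file.buffer.size"));
    }
}

So the usual practice is to leave the bundled defaults alone and override only the properties you care about in your *-site.xml files.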
Similar Questions
+3 votes

I am trying to access a hadoop 1 installation via the hadoop 2.2.0 command line tools. I am wondering if this is possible at all?

From hadoop 1 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
Found 2 items
drwxr-xr-x - cs supergroup 0 2014-02-01 08:18 /tmp
drwxr-xr-x - cs supergroup 0 2014-02-01 08:19 /user

From hadoop 2.2.0 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
ls: Failed on local exception: java.io.EOFException; Host Details : 
local host is: "i7/127.0.1.1"; destination host is: "localhost":9000;

I have been trying to find this information via a web search, but so far without success.

+1 vote

Assume I have a machine on the same network as a hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the config file or local xml files to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from the cluster's hadoop machine.

Does this work? What parameters do I need to set? Where does the jar file need to be? What issues would I see if the machine is running Windows with Cygwin installed?
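Not part of the original question, but a minimal sketch of what that client-side setup might look like when submitting from outside the cluster. The host names namenode-host and rm-host and the ports are placeholders, not values from the question, and a real deployment may need further properties or matching *-site.xml files on the client:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal sketch of a client-side driver pointed at a remote cluster.
public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");    // cluster HDFS
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "rm-host:8032"); // cluster RM

        Job job = Job.getInstance(conf, "remote-submit-test");
        job.setJarByClass(RemoteSubmit.class);   // the client ships this jar for you
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);   // identity map/reduce
    }
}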

+1 vote

According to the book "Hadoop: The Definitive Guide", it is possible to use "-D property=value" to override any default or site property in the configuration.

I gave it a shot, but it did not work: the property specified with "-D" was ignored.

Then I put the property in an xml file and used "-conf xml_name" on the command line, but I still could not override the property.

The only way to override the default property is to get a Configuration reference in the code and set the property via the reference. But that is not convenient as I need to recompile the code each time I change the property.

Now the question is: what is the right way to customize the configuration for a job?
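For reference (not part of the question): "-D" and "-conf" are handled by GenericOptionsParser, which only runs when the driver goes through ToolRunner, so a plain main() that builds its own Configuration never sees those overrides. A minimal sketch of that pattern, with MyJob and my.property as hypothetical names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: implementing Tool lets ToolRunner feed
// -D property=value and -conf file.xml into getConf() before run() is called.
public class MyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();   // already contains -D / -conf overrides
        System.out.println("my.property = " + conf.get("my.property"));
        // ... build and submit the job using this conf ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser, then calls run() with the
        // remaining (non-generic) arguments.
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}

It would then be invoked along the lines of: hadoop jar myjob.jar MyJob -D my.property=value <other args>.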

+3 votes

In a KNN-like algorithm we need to load the model data into a cache for predicting the records.

Here is the example for KNN.

So if the model is a large file, say 1 or 2 GB, we will not be able to load it into the distributed cache.

One way is to split/partition the model result into several files, perform the distance calculation for all records against each file, and then find the minimum distance and the most frequent class label to predict the outcome.

How can we partition the file and perform the operation on these partitions?

i.e. record 1 against partition1, partition2, ...; record 2 against partition1, partition2, ...

This is what came to my mind. Is there any better way? Any pointers would help.
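Not part of the question, but a rough sketch of that partition idea under some assumptions: the model is split across HDFS input files (one record per line as label,f1,f2,...), each mapper scans one model split, a small feature-only test set in a hypothetical test.csv shipped with -files is loaded in setup(), and the reducer keeps the closest label per test record (1-NN for brevity; a real k-NN would keep the k smallest distances and vote). The job driver is omitted.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PartitionedKnn {

    // Emits (testRecordId, "distance,label") for every model record in this split.
    public static class DistanceMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final List<double[]> testPoints = new ArrayList<>();

        @Override
        protected void setup(Context ctx) throws IOException {
            // "test.csv" is assumed to have been shipped with -files test.csv,
            // so it appears in the task's working directory; feature-only CSV lines.
            try (BufferedReader r = new BufferedReader(new FileReader("test.csv"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    testPoints.add(parseFeatures(line.split(","), 0));
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",");
            String label = parts[0];
            double[] model = parseFeatures(parts, 1);   // skip the label column
            for (int i = 0; i < testPoints.size(); i++) {
                double d = euclidean(testPoints.get(i), model);
                ctx.write(new Text("record-" + i), new Text(d + "," + label));
            }
        }
    }

    // Keeps the label of the closest model record seen across all model partitions.
    public static class MinDistanceReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            double best = Double.MAX_VALUE;
            String bestLabel = "";
            for (Text v : values) {
                String[] p = v.toString().split(",");
                double d = Double.parseDouble(p[0]);
                if (d < best) { best = d; bestLabel = p[1]; }
            }
            ctx.write(key, new Text(bestLabel));
        }
    }

    static double[] parseFeatures(String[] fields, int from) {
        double[] f = new double[fields.length - from];
        for (int i = from; i < fields.length; i++) f[i - from] = Double.parseDouble(fields[i]);
        return f;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}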

...