Hadoop: Is it a good idea to delete/move the default configuration xml files?

0 votes
511 views

I have just realized that my implementation of hadoop-2.4.1 is pulling in all the default.xml files.

I have three copies of each in different directories, obviously at least one of those is on the class path.

Anyway, after all the effort of setting up a site, it seems strange to me that I would be using settings I had no idea existed and that may not be how I would choose to configure them.

posted Jul 21, 2014 by Sanketi Garg


1 Answer

+1 vote

I recommend against deleting or moving *-default.xml, because these files may be supplying reasonable default values for configuration properties that you have not set in *-site.xml. We also put defaults into the code itself in case a configuration property is found to be completely missing, but I am not aware of any actual testing of deployments that have deleted *-default.xml.

answer Jul 21, 2014 by Meenal Mishra
Are not the *-default.xml files supposed to be inside the jars rather than loose files?
That's a good point. I am not sure how bare *-default.xml files would be showing up on a deployment outside the jars.
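As a side note (not from the thread itself): a minimal sketch of the layering the answer describes, where Configuration resolves a value from the bundled core-default.xml unless core-site.xml on the classpath overrides it:

import org.apache.hadoop.conf.Configuration;

// Minimal sketch: Configuration loads core-default.xml (bundled in the
// Hadoop jars) first and then core-site.xml from the classpath, so any
// value set in core-site.xml overrides the bundled default.
public class ConfigPrecedenceDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // io.file.buffer.size is defined in core-default.xml (4096) and is
        // used here only as an example of a property with a bundled default.
        System.out.println("io.file.buffer.size = " + conf.get("io.file.buffer.size"));
    }
}

So the usual practice is to leave the bundled defaults alone and override only the properties you care about in your *-site.xml files.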
Similar Questions
+3 votes

I am trying to access a hadoop 1 installation via the hadoop 2.2.0 command line tools. I am wondering if this is possible at all?

From hadoop 1 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
Found 2 items
drwxr-xr-x - cs supergroup 0 2014-02-01 08:18 /tmp
drwxr-xr-x - cs supergroup 0 2014-02-01 08:19 /user

From hadoop 2.2.0 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
ls: Failed on local exception: java.io.EOFException; Host Details : 
local host is: "i7/127.0.1.1"; destination host is: "localhost":9000;

I have been trying to find this information via a web search, but so far without success.

+1 vote

Assume I have a machine on the same network as a hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the config file or local xml files to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from the cluster's hadoop machine.

Does this work? What parameters do I need to set? Where does the jar file need to be? What issues would I see if the machine is running Windows with Cygwin installed?
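Not part of the original question, but a minimal sketch of what that client-side setup might look like when submitting from outside the cluster. The host names namenode-host and rm-host and the ports are placeholders, not values from the question, and a real deployment may need further properties or matching *-site.xml files on the client:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal sketch of a client-side driver pointed at a remote cluster.
public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");    // cluster HDFS
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "rm-host:8032"); // cluster RM

        Job job = Job.getInstance(conf, "remote-submit-test");
        job.setJarByClass(RemoteSubmit.class);   // the client ships this jar for you
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);   // identity map/reduce
    }
}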

+1 vote

According to the book "Hadoop: The Definitive Guide", it is possible to use "-D property=value" to override any default or site property in the configuration.

I gave it a shot, but it did not work: the property specified with "-D" was ignored.

Then I put the property in an xml file and used "-conf xml_name" on the command line, but I still could not override the property.

The only way to override the default property is to get a Configuration reference in the code and set the property via the reference. But that is not convenient as I need to recompile the code each time I change the property.

Now the question is: what is the right way to customize the configuration for a job?
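For reference (not part of the question): "-D" and "-conf" are handled by GenericOptionsParser, which only runs when the driver goes through ToolRunner, so a plain main() that builds its own Configuration never sees those overrides. A minimal sketch of that pattern, with MyJob and my.property as hypothetical names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: implementing Tool lets ToolRunner feed
// -D property=value and -conf file.xml into getConf() before run() is called.
public class MyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();   // already contains -D / -conf overrides
        System.out.println("my.property = " + conf.get("my.property"));
        // ... build and submit the job using this conf ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser, then calls run() with the
        // remaining (non-generic) arguments.
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}

It would then be invoked along the lines of: hadoop jar myjob.jar MyJob -D my.property=value <other args>.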

+3 votes

In a KNN-like algorithm we need to load the model data into a cache for predicting the records.

Here is the example for KNN.

So if the model is a large file, say 1 or 2 GB, we will not be able to load it into the distributed cache.

One way is to split/partition the model result into several files, perform the distance calculation for all records against each file, and then find the minimum distance and the most frequent class label to predict the outcome.

How can we partition the file and perform the operation on these partitions?

i.e. record 1 against partition1, partition2, ...; record 2 against partition1, partition2, ...

This is what came to my mind. Is there any better way? Any pointers would help.
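Not part of the question, but a rough sketch of that partition idea under some assumptions: the model is split across HDFS input files (one record per line as label,f1,f2,...), each mapper scans one model split, a small feature-only test set in a hypothetical test.csv shipped with -files is loaded in setup(), and the reducer keeps the closest label per test record (1-NN for brevity; a real k-NN would keep the k smallest distances and vote). The job driver is omitted.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PartitionedKnn {

    // Emits (testRecordId, "distance,label") for every model record in this split.
    public static class DistanceMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final List<double[]> testPoints = new ArrayList<>();

        @Override
        protected void setup(Context ctx) throws IOException {
            // "test.csv" is assumed to have been shipped with -files test.csv,
            // so it appears in the task's working directory; feature-only CSV lines.
            try (BufferedReader r = new BufferedReader(new FileReader("test.csv"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    testPoints.add(parseFeatures(line.split(","), 0));
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",");
            String label = parts[0];
            double[] model = parseFeatures(parts, 1);   // skip the label column
            for (int i = 0; i < testPoints.size(); i++) {
                double d = euclidean(testPoints.get(i), model);
                ctx.write(new Text("record-" + i), new Text(d + "," + label));
            }
        }
    }

    // Keeps the label of the closest model record seen across all model partitions.
    public static class MinDistanceReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            double best = Double.MAX_VALUE;
            String bestLabel = "";
            for (Text v : values) {
                String[] p = v.toString().split(",");
                double d = Double.parseDouble(p[0]);
                if (d < best) { best = d; bestLabel = p[1]; }
            }
            ctx.write(key, new Text(bestLabel));
        }
    }

    static double[] parseFeatures(String[] fields, int from) {
        double[] f = new double[fields.length - from];
        for (int i = from; i < fields.length; i++) f[i - from] = Double.parseDouble(fields[i]);
        return f;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}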

...