HDFS - Consolidate 2 small volumes into 1 large volume

+2 votes
378 views

Is it possible to consolidate two small data volumes (500GB each) into a larger data volume (3TB)?

I'm thinking that as long as the block file names and metadata are unique, I should be able to shut down the datanode and use something like tar or rsync to copy the contents of each small volume onto the large volume.

Will this work?

posted Oct 21, 2014 by anonymous


1 Answer

+1 vote

Yes, you can.

Stop the cluster, change hdfs-site.xml on your datanode so that dfs.datanode.data.dir points to the large volume, copy the two small data volumes into the large volume you just configured, then start the cluster and you are done.

answered Oct 22, 2014 by Garima Jain
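
For reference, the property that lists the datanode storage directories is dfs.datanode.data.dir in Hadoop 2.x (dfs.data.dir in 1.x). A minimal sketch of the change, assuming the two small volumes are mounted at /data1 and /data2 and the new 3TB volume at /bigdata (all paths here are hypothetical):

    <!-- hdfs-site.xml on the datanode: list only the new large volume -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- before: file:///data1/dfs/dn,file:///data2/dfs/dn -->
      <value>file:///bigdata/dfs/dn</value>
    </property>

With the datanode stopped, the block files and metadata from both small volumes can then be copied into the new directory, for example with rsync -a /data1/dfs/dn/ /bigdata/dfs/dn/ followed by the same for /data2, or with tar as suggested in the question. Block file names are unique, so the merged directory should not have name collisions; restart the datanode afterwards and let it re-report its blocks.
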
Similar Questions
+2 votes

I am writing temp files to HDFS with replication=1, so I expect the blocks to be stored on the writing node. Are there any tips, in general, for optimizing write performance to HDFS? I use 128K buffers in the write() calls. Are there any parameters that can be set on the connection or in HDFS configuration to optimize this use pattern?
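
For what it's worth, the buffer size and replication mentioned in the question can be passed explicitly when the file is created; a minimal client-side sketch (the path and values are illustrative, not tuning recommendations):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TempFileWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // io.file.buffer.size is the generic stream buffer (default 4096);
            // raising it is one of the few client-side knobs for this pattern.
            conf.setInt("io.file.buffer.size", 128 * 1024);

            FileSystem fs = FileSystem.get(conf);
            Path tmp = new Path("/tmp/scratch/part-00000");  // hypothetical path

            // create(path, overwrite, bufferSize, replication, blockSize):
            // replication = 1 keeps the single replica on (or near) the writing node.
            try (FSDataOutputStream out = fs.create(tmp, true, 128 * 1024, (short) 1, 128L * 1024 * 1024)) {
                byte[] buf = new byte[128 * 1024];
                // ... fill buf from the application ...
                out.write(buf);
            }
        }
    }

Whether larger buffers help beyond what the question already does depends on the workload, so it is worth measuring rather than assuming.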

0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log. But when I submit a MapReduce job, I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or a log that contains this info, perhaps via some configuration? If it does, what should I configure?
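
One place the input paths do show up is the submitted job configuration (visible in job.xml and the JobHistory server) rather than in hdfs-audit.log; a small sketch of where that value lives, assuming a plain MapReduce driver (the input path is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class InputPathLookup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "audit-demo");
            FileInputFormat.addInputPath(job, new Path("/data/input/logs"));  // hypothetical input

            // FileInputFormat records its inputs in the job configuration under
            // mapreduce.input.fileinputformat.inputdir, which an audit tool can
            // read back from the submitted job's configuration (e.g. its job.xml).
            String inputs = job.getConfiguration().get("mapreduce.input.fileinputformat.inputdir");
            System.out.println("Configured input paths: " + inputs);
        }
    }

This covers only the MapReduce side; whether it is enough for an audit tool depends on how the jobs are submitted.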

0 votes

I have a basic question regarding HDFS file reads. I want to know what happens when the following steps are followed:

  1. Client opens the file for reading and starts reading the file.
  2. In the meantime, someone deletes the file and the file moves to the trash folder.

Will Step 1 succeed? I feel that, since the client has already opened the file and the file still exists in .Trash, the client should continue to read the file.
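
That scenario is easy to script against the FileSystem API; a sketch (the file path is hypothetical), assuming trash is enabled so the delete is really just a rename into the .Trash directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.Trash;

    public class ReadAfterDelete {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/user/test/sample.txt");  // hypothetical file

            // Step 1: open the file and start reading.
            FSDataInputStream in = fs.open(p);
            byte[] buf = new byte[4096];
            int n = in.read(buf);

            // Step 2: "someone" moves the same file to trash while the stream is open.
            Trash.moveToAppropriateTrash(fs, p, conf);

            // A trash move is just a rename, so the blocks are untouched and the
            // already-open stream keeps reading to the end of the file.
            while ((n = in.read(buf)) != -1) {
                // process buf[0..n)
            }
            in.close();
        }
    }

The read should complete, since the blocks are only reclaimed after the file is removed from the trash and the namenode invalidates them.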

...