How do I customize data placement on DataNodes (DN) of a Hadoop cluster?

Suppose we change the default block size to 32 MB and the replication factor to 1, the Hadoop cluster consists of 4 DNs, and the input data size is 192 MB (i.e. 6 blocks). Now I want to place the data on the DNs as follows: DN1 and DN2 contain 2 blocks (32 + 32 = 64 MB) each, and DN3 and DN4 contain 1 block (32 MB) each. Is this possible? How can it be accomplished?

posted Oct 27, 2015 by Sudhakar Singh
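
The following sketch checks the numbers in the question. It computes how many 32 MB blocks a 192 MB file occupies and restates the desired per-DataNode layout. The class and method names (BlockLayoutCheck, blockCount, desiredLayout) are illustrative only. Sizes are in MB, and the layout map is taken directly from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic check for the layout requested in the question.
public class BlockLayoutCheck {
    // Number of blocks for a file, rounding up for a partial last block.
    static long blockCount(long fileSizeMb, long blockSizeMb) {
        return (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    // Desired number of blocks per DataNode (replication factor 1).
    static Map<String, Integer> desiredLayout() {
        Map<String, Integer> layout = new LinkedHashMap<>();
        layout.put("DN1", 2);
        layout.put("DN2", 2);
        layout.put("DN3", 1);
        layout.put("DN4", 1);
        return layout;
    }

    public static void main(String[] args) {
        long blockSizeMb = 32;
        long blocks = blockCount(192, blockSizeMb);
        int assigned = 0;
        for (Map.Entry<String, Integer> e : desiredLayout().entrySet()) {
            assigned += e.getValue();
            // Space used on this DataNode = its block count times the block size.
            System.out.println(e.getKey() + ": " + e.getValue() * blockSizeMb + " MB");
        }
        System.out.println("blocks needed: " + blocks + ", blocks assigned: " + assigned);
    }
}
```

Running it prints 64 MB for DN1 and DN2 and 32 MB for DN3 and DN4. It then reports 6 blocks needed and 6 assigned, so the requested layout accounts for the whole file.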

Maybe using the rack concept might work.

commented Oct 27, 2015 by anonymous

Could you please explain it in more detail?

commented Oct 27, 2015 by Sudhakar Singh

If the data is written from one of the cluster nodes, the local DataNode is given preference regardless of how racks are configured.
If it is written remotely (not from one of the cluster nodes), the blocks may get distributed across the DataNodes.
Beyond that, you could write a custom block placement policy by extending BlockPlacementPolicyDefault and setting "dfs.block.replicator.classname" to your class, if required.

commented Oct 30, 2015 by anonymous
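
The following sketch shows, independently of Hadoop, the core decision a custom policy would have to make. The class below (QuotaPlacement, with its chooseTarget and place methods) is hypothetical. It is not Hadoop's BlockPlacementPolicy API, whose chooseTarget signatures vary between Hadoop versions. It is only a plain-Java model that picks a target DataNode for each new block so that no node exceeds an assumed per-node quota. A real policy would presumably need equivalent logic in a subclass of BlockPlacementPolicyDefault, registered through "dfs.block.replicator.classname".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of quota-based block placement; NOT the Hadoop API.
public class QuotaPlacement {
    // Remaining number of blocks each DataNode may still receive.
    private final Map<String, Integer> remaining = new LinkedHashMap<>();

    QuotaPlacement(Map<String, Integer> quotas) {
        remaining.putAll(quotas); // copy so the caller's map is not modified
    }

    // Picks the first DataNode (in insertion order) that still has quota left
    // and decrements that node's remaining quota.
    String chooseTarget() {
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() > 0) {
                e.setValue(e.getValue() - 1);
                return e.getKey();
            }
        }
        throw new IllegalStateException("all DataNode quotas exhausted");
    }

    // Places the given number of blocks one at a time; returns the node chosen per block.
    static List<String> place(int blocks, Map<String, Integer> quotas) {
        QuotaPlacement policy = new QuotaPlacement(quotas);
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < blocks; i++) {
            targets.add(policy.chooseTarget());
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Integer> quotas = new LinkedHashMap<>();
        quotas.put("DN1", 2);
        quotas.put("DN2", 2);
        quotas.put("DN3", 1);
        quotas.put("DN4", 1);
        List<String> targets = place(6, quotas);
        for (int i = 0; i < targets.size(); i++) {
            System.out.println("block " + i + " -> " + targets.get(i));
        }
    }
}
```

With the quotas from the question, main assigns blocks 0 to 5 to DN1, DN1, DN2, DN2, DN3, DN4. Filling nodes in order is just the simplest rule that respects the quotas; a round-robin order would satisfy them equally well.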

Thanks for the reply. Could you please explain exactly how to do this?

commented Oct 30, 2015 by Sudhakar Singh

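import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Finds the keys with the largest and smallest summed values, plus the mean and
// count of all values. Results are only global when the job uses a single reducer.
public class MaxMinReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    int max_sum = 0;
    int min_sum = Integer.MAX_VALUE;
    int count = 0;   // number of values seen across all keys
    long total = 0;  // sum of all values, used for the mean
    Text max_occured_key = new Text();
    Text min_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }
        total += sum;
        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }
        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Integer (truncated) mean of all values; guard against empty input.
        int mean = count == 0 ? 0 : (int) (total / count);
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}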
