Which daemon is responsible for replication of data in Hadoop?


A. HDFS

B. Task Tracker

C. Job Tracker

D. Name Node

E. Data Node

posted Nov 30, 2017 by anonymous


Similar Questions

0 votes

The archive file created in Hadoop always has the extension of

+2 votes

Hadoop: namenode doesn't update block locations when the data directories of a datanode are changed?

I am running a hadoop-2.4.0 cluster. Each datanode has 10 disks, and the directories for the 10 disks are specified in dfs.datanode.data.dir.

A few days ago, I modified dfs.datanode.data.dir on a datanode () to reduce the number of disks, so two disks were excluded from dfs.datanode.data.dir. After the datanode was restarted, I expected the namenode to update the block locations. In other words, I thought the namenode would remove this datanode from the locations of blocks that were stored on the excluded disks, but the namenode didn't update the block locations...

As I understand it, a datanode sends a block report to the namenode when it starts, so the namenode should update the block locations immediately.

Is this a bug? Could anyone please explain?
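To make the expectation in this question concrete, here is a toy model of a namenode that reconciles its block map against a full block report. It is only a sketch of the behaviour the asker expected, not HDFS's actual implementation; `ToyNameNode`, `process_block_report`, `locate`, and the block and datanode ids are all hypothetical names invented for illustration.

```python
from collections import defaultdict


class ToyNameNode:
    """Hypothetical, simplified model: maps block ids to the datanodes holding them."""

    def __init__(self):
        self.locations = defaultdict(set)  # block id -> set of datanode ids

    def process_block_report(self, datanode, reported_blocks):
        """Treat the report as the complete list of blocks on `datanode`."""
        reported = set(reported_blocks)
        # Drop this datanode from every block it no longer reports.
        for block, nodes in list(self.locations.items()):
            if datanode in nodes and block not in reported:
                nodes.discard(datanode)
                if not nodes:
                    del self.locations[block]
        # Record this datanode for every block it does report.
        for block in reported:
            self.locations[block].add(datanode)

    def locate(self, block):
        """Return a copy of the set of datanodes believed to hold `block`."""
        return set(self.locations.get(block, set()))


nn = ToyNameNode()
nn.process_block_report("dn1", ["b1", "b2", "b3"])  # before disks were removed
nn.process_block_report("dn2", ["b3"])
nn.process_block_report("dn1", ["b1"])  # after restart with fewer data directories
```

In this model, `b2` ends up with no known location and `b3` is located only on `dn2`, which is the outcome the asker expected to see after the restart.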

+2 votes

How do I customize data placement on DataNodes (DN) of a Hadoop cluster?

Suppose we change the default block size to 32 MB and the replication factor to 1, the Hadoop cluster consists of 4 DNs, and the input data size is 192 MB. I want to place the data on the DNs as follows: DN1 and DN2 hold 2 blocks (32 + 32 = 64 MB) each, and DN3 and DN4 hold 1 block (32 MB) each. Is this possible? How can it be accomplished?
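The arithmetic behind this layout can be checked with a short sketch. It only confirms that the requested per-node block counts add up to the number of blocks in the file at replication factor 1; it says nothing about how to make HDFS honour such a layout. The helper names `block_count` and `placement_is_feasible` are hypothetical and not part of any Hadoop API.

```python
MB = 1024 * 1024


def block_count(file_size, block_size):
    """Number of blocks needed for a file; the last block may be partial."""
    return -(-file_size // block_size)  # integer ceiling division


def placement_is_feasible(file_size, block_size, blocks_per_node):
    """True when the per-node counts sum to the file's block count (replication 1)."""
    return sum(blocks_per_node.values()) == block_count(file_size, block_size)


# Layout from the question: 2 blocks on DN1 and DN2, 1 block on DN3 and DN4.
desired = {"DN1": 2, "DN2": 2, "DN3": 1, "DN4": 1}
total_blocks = block_count(192 * MB, 32 * MB)
feasible = placement_is_feasible(192 * MB, 32 * MB, desired)
```

With a 32 MB block size, the 192 MB input splits into 6 blocks, and 2 + 2 + 1 + 1 = 6, so the counts in the question are at least consistent.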

+2 votes

Tips for optimizing HDFS writes with replication=1?

I am writing temp files to HDFS with replication=1, so I expect the blocks to be stored on the writing node. Are there any tips, in general, for optimizing write performance to HDFS? I use 128K buffers in the write() calls. Are there any parameters that can be set on the connection or in HDFS configuration to optimize this use pattern?

+3 votes

Can we control data distribution and load balancing in a Hadoop cluster?

From what I have studied, data distribution, load balancing, and fault tolerance are implicit in Hadoop. But I need to customize them. Can we do that?
