What is a block and block scanner in HDFS?

698 views

posted Jun 22, 2017 by Karthick.c

Looking for an answer? Promote on:

Similar Questions

+3 votes

Support multiple block placement policies for HDFS?

According to the code, the current implement of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default.The default policy is enough for most of the circumstances, but under some special circumstances, it works not so well.

For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy.But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for the HDFS.

One plain thought is that, the default placement policy is still configured as the default. On the other hand, HDFS can let user specify customized placement policy through the extended attributes(xattr). When the HDFS choose the replica targets, it firstly check the customized placement policy, if not specified, it fallbacks to the default one. Any thoughts?

0 votes

What happens to a read operation when the file is moved to trash in HDFS?

I have a basic question regarding the HDFS file read. I want to know what happens, when the following steps are followed:

Client opens the file for reading and starts reading the file.
In the meantime, someone deletes the file and file moves to the trash folder

Will Step 1. succeed? I feel, since the client has already opened the file and file still exists in .trash, the client should continue to read the file.

0 votes

How to get info about which data in hdfs or file system that a MapReduce job visits?

I was trying to implement a Hadoop/Spark audit tool, but l met a problem that I can't get the input file location and file name. I can get username, IP address, time, user command, all of these info from hdfs-audit.log. But When I submit a MapReduce job, I can't see input file location neither in Hadoop logs or Hadoop ResourceManager.

Does hadoop have API or log that contains these info through some configuration ?If it have, what should I configure?

0 votes

Can anyone provide me steps to grant a particular user access equivalnent to that of "hdfs" user in hadoop.

The reason behind this is I want to have my custom user who can create anything on the entire hdfs file system (/).
I tried couple of links however, none of them were useful. Is there any way by adding/modifying some property tags I can do that ?

+4 votes

Add few record(s) to a Hive table or a HDFS file on a daily basis

My requirement is a typical Datawarehouse and ETL requirement. I need to accomplish

1) Daily Insert transaction records to a Hive table or a HDFS file. This table or file is not a big table ( approximately 10 records per day). I don't want to Partition the table / file.

In few articles It was being mentioned that we need to load to a staging table in Hive. And then insert like the below :

insert overwrite table finaltable select * from staging;

I am not getting this logic. How should I populate the staging table daily.

What is a block and block scanner in HDFS?

Your comment on this post:

Your answer

Preview