What is a block and block scanner in HDFS?

+1 vote
457 views
posted Jun 22, 2017 by Karthick.c

Similar Questions
+3 votes

According to the code, the current implementation of HDFS supports only one block placement policy at a time, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work so well.

For example, on a shared cluster we want to erasure-code all the files under certain specified directories, so the files under these directories need to use a new placement policy. But at the same time, other files should still use the default placement policy. Here we need HDFS to support multiple placement policies.

One plain thought is this: the default placement policy stays configured as the default, but HDFS lets the user specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks for a customized placement policy; if none is specified, it falls back to the default one. Any thoughts?
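
For illustration only, a minimal sketch of how the xattr idea above could look from the client side. The attribute name user.block.placement.policy is made up for this sketch; a custom BlockPlacementPolicy on the NameNode side (pluggable via dfs.block.replicator.classname) would have to read it and fall back to the default when it is absent.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Client-side sketch of the xattr proposal above.
// "user.block.placement.policy" is a hypothetical attribute name;
// a custom NameNode-side policy would have to look it up and fall
// back to BlockPlacementPolicyDefault when it is not set.
public class PlacementPolicyTagger {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/data/erasure-coded");

        // Tag the directory so files created under it would use the
        // customized placement policy.
        fs.setXAttr(dir, "user.block.placement.policy",
                "erasure".getBytes("UTF-8"));

        // Read the tag back, as the NameNode-side policy would.
        byte[] value = fs.getXAttr(dir, "user.block.placement.policy");
        System.out.println("policy = " + new String(value, "UTF-8"));
    }
}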

0 votes

I have a basic question regarding HDFS file reads. I want to know what happens when the following steps are followed:

  1. Client opens the file for reading and starts reading the file.
  2. In the meantime, someone deletes the file and the file moves to the trash folder

Will Step 1 succeed? I feel that, since the client has already opened the file and the file still exists in .Trash, the client should be able to continue reading the file.
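
A small sketch to reproduce the scenario, for illustration only; the path is made up, and Trash.moveToAppropriateTrash is the client-side call that hdfs dfs -rm uses when trash is enabled.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

// Sketch of the scenario above: open a file, move it to trash as
// "someone else" would, then keep reading from the open stream.
public class ReadAfterTrashTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/readtest/data.txt");

        // Step 1: client opens the file and starts reading.
        FSDataInputStream in = fs.open(file);
        byte[] buf = new byte[4096];
        int n = in.read(buf);

        // Step 2: the file is moved to the trash folder.
        Trash.moveToAppropriateTrash(fs, file, conf);

        // The blocks still exist under /user/<name>/.Trash, so reads
        // from the already-open stream are expected to keep working
        // until the trash is emptied.
        while ((n = in.read(buf)) != -1) {
            System.out.println("read " + n + " bytes after the move");
        }
        in.close();
    }
}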

0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log. But when I submit a MapReduce job, I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this info, perhaps through some configuration? If it does, what should I configure?
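
For what it's worth, a standard hdfs-audit.log line normally carries the accessed path in its src= field alongside the fields the asker already extracts. A hedged sketch of pulling it out; the exact line layout can differ between Hadoop versions, so treat the pattern as a starting point, not a guarantee.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract the command and accessed path from an
// hdfs-audit.log line. The sample line below is made up but
// follows the usual audit-log field layout.
public class AuditLineParser {
    private static final Pattern SRC =
            Pattern.compile("cmd=(\\S+)\\s+src=(\\S+)");

    public static void main(String[] args) {
        String line = "2017-06-22 10:15:01,123 INFO FSNamesystem.audit: "
                + "allowed=true ugi=alice (auth:SIMPLE) ip=/10.0.0.5 "
                + "cmd=open src=/data/in/part-0 dst=null perm=null";

        Matcher m = SRC.matcher(line);
        if (m.find()) {
            System.out.println("command = " + m.group(1));
            System.out.println("input path = " + m.group(2));
        }
    }
}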

0 votes

The reason behind this is that I want a custom user who can create anything on the entire HDFS file system (/).
I tried a couple of links; however, none of them were useful. Is there any way I can do that by adding/modifying some property tags?
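
One commonly suggested knob, though not a guarantee for every setup, is dfs.permissions.superusergroup in hdfs-site.xml: any user in that group is treated as an HDFS superuser across the whole namespace. A sketch, with a made-up group name:

<!-- hdfs-site.xml (NameNode side); "hdfsadmins" is a made-up group.
     Any user in this group is treated as an HDFS superuser, so it
     can create anything under / . Requires a NameNode restart. -->
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hdfsadmins</value>
</property>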

+4 votes

My requirement is a typical data warehouse and ETL requirement. I need to accomplish:

1) Daily insert of transaction records into a Hive table or an HDFS file. This table or file is not big (approximately 10 records per day), and I don't want to partition the table/file.

In a few articles it was mentioned that we need to load into a staging table in Hive first, and then insert like the below:

insert overwrite table finaltable select * from staging;

I am not getting this logic. How should I populate the staging table daily?
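
A minimal sketch of one possible daily flow over HiveServer2 JDBC, for illustration only. The connection URL, credentials, and input path are assumptions; the table names staging and finaltable match the snippet above. Note that the article's INSERT OVERWRITE replaces finaltable's contents with staging alone, whereas INSERT INTO appends, which is what a daily incremental load usually wants.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of a daily staging load via HiveServer2 JDBC. The URL,
// user, file path, and table names are made up for illustration.
public class DailyStagingLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "etl", "");
             Statement stmt = conn.createStatement()) {

            // Populate staging daily, e.g. from a file the upstream
            // job dropped into HDFS (~10 records per day).
            stmt.execute("LOAD DATA INPATH '/etl/incoming/txn_today' "
                    + "INTO TABLE staging");

            // Append the day's records to the unpartitioned final
            // table (INSERT INTO appends; OVERWRITE would replace).
            stmt.execute("INSERT INTO TABLE finaltable "
                    + "SELECT * FROM staging");

            // Clear staging for tomorrow's batch.
            stmt.execute("TRUNCATE TABLE staging");
        }
    }
}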

...