Performance issue when using "hdfs setfacl -R"?

+1 vote
292 views

We use "hdfs setfacl -R" for file ACL control. As the data directory is big with 60,000+ sub-directories and files, the command is very time-consuming. Seems it can not finish in hours, we can not image this command will cost several days.
Any settings can help improve this?

posted Jan 17, 2018 by anonymous


1 Answer

0 votes

Try increasing the heap size of the client via HADOOP_CLIENT_OPTS. The default is 128M, IIRC; bumping it up to 1G might improve performance.
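
For example, a minimal sketch (the 1g heap value, the ACL spec, and the /data/warehouse path are illustrative assumptions, not taken from the question):

    # Raise the HDFS client JVM heap before running the recursive ACL change
    export HADOOP_CLIENT_OPTS="-Xmx1g"
    # Apply the ACL recursively; the ACL spec and path below are placeholders
    hdfs dfs -setfacl -R -m user:etl:rwx /data/warehouse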

answer Jan 17, 2018 by Anderson
Similar Questions
0 votes

I have a basic question regarding HDFS file reads. I want to know what happens when the following steps occur:

  1. A client opens the file for reading and starts reading it.
  2. In the meantime, someone deletes the file and it moves to the trash folder.

Will step 1 succeed? My feeling is that, since the client has already opened the file and the file still exists in .Trash, the client should continue to read it (see the sketch below).
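
A rough way to test this (a hedged sketch; the /tmp/bigfile path and the timing are illustrative, and trash must be enabled via fs.trash.interval):

    # Start a reader that streams the file, then delete it mid-read
    hdfs dfs -cat /tmp/bigfile > /dev/null &
    READER=$!
    sleep 2
    hdfs dfs -rm /tmp/bigfile        # with trash enabled this is a rename into .Trash
    wait $READER && echo "read finished" || echo "read failed"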

0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log, but when I submit a MapReduce job I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this information, perhaps through some configuration? If so, what should I configure?

...