Provisioning a physical host into an existing HDFS/Yarn cluster

+1 vote
169 views

I recently got a powerful physical host. Usually I provision such a host with VMs and add them to my existing HDFS/Yarn cluster (which consists of 300+ VMs).

Now I am exploring a Docker-based approach, so I want to know if there are any best practices I can follow down that path.

posted Jan 26, 2016 by anonymous


Similar Questions
+2 votes

First of all, I'm using Hadoop-2.6.0. I want to launch my own application master on a specific node in a YARN cluster in order to open a server on a predetermined IP address and port. To that end, I wrote a driver program in which I created a ResourceRequest object and called the setResourceName method to set a hostname, then attached it to an ApplicationSubmissionContext object by calling the setAMContainerResourceRequest method.
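
For reference, a minimal sketch of the kind of driver described above; the class name, hostname, and resource values are placeholders, and the application name, queue, and AM ContainerLaunchContext are omitted for brevity:

 import org.apache.hadoop.yarn.api.records.*;
 import org.apache.hadoop.yarn.client.api.YarnClient;
 import org.apache.hadoop.yarn.client.api.YarnClientApplication;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class AmPlacementDriver {
   public static void main(String[] args) throws Exception {
     YarnClient yarnClient = YarnClient.createYarnClient();
     yarnClient.init(new YarnConfiguration());
     yarnClient.start();

     YarnClientApplication app = yarnClient.createApplication();
     ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

     // Ask for the AM container on one specific host;
     // "node01.example.com" is a placeholder hostname.
     ResourceRequest amReq = ResourceRequest.newInstance(
         Priority.newInstance(0),          // priority
         "node01.example.com",             // resource name = target hostname
         Resource.newInstance(1024, 1),    // 1 GB, 1 vcore
         1);                               // a single AM container
     amReq.setRelaxLocality(false);        // do not fall back to other nodes
     appContext.setAMContainerResourceRequest(amReq);

     // ... set application name, queue, and the AM ContainerLaunchContext here ...
     yarnClient.submitApplication(appContext);
   }
 }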

I tried several times but couldn't launch the app master on a specific node. After searching the code, I found that RMAppAttemptImpl invalidates what I've set in the ResourceRequest, as follows:

 // Currently, following fields are all hard code,
 // TODO: change these fields when we want to support
 // priority/resource-name/relax-locality specification for AM containers
 // allocation.
 appAttempt.amReq.setNumContainers(1);
 appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
 appAttempt.amReq.setResourceName(ResourceRequest.ANY);
 appAttempt.amReq.setRelaxLocality(true);

Is there another way to launch a container for an application master on a specific node in Hadoop-2.6.0?

+1 vote

I'm trying to get the start and finish times for all the jobs that have run on a YARN cluster.

yarn application -list -appStates ALL

will get me most of the details of the jobs, but not the times. However, I can parse its output for the application IDs and then run

yarn application -status $ID

on each application id to get an output that I can parse for the time.

However, this involves making lots of connections to YARN, so it is relatively slow. Is there a single command I can use to get all this information?
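
For context, one way to fetch all of this in a single call is the YarnClient Java API, whose application reports carry the start and finish timestamps; a minimal sketch (the class name is a placeholder):

 import java.util.EnumSet;
 import org.apache.hadoop.yarn.api.records.ApplicationReport;
 import org.apache.hadoop.yarn.api.records.YarnApplicationState;
 import org.apache.hadoop.yarn.client.api.YarnClient;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class AppTimes {
   public static void main(String[] args) throws Exception {
     YarnClient yarnClient = YarnClient.createYarnClient();
     yarnClient.init(new YarnConfiguration());
     yarnClient.start();

     // A single call returns a report for every application in all states,
     // including start and finish times (epoch milliseconds).
     for (ApplicationReport report :
          yarnClient.getApplications(EnumSet.allOf(YarnApplicationState.class))) {
       System.out.println(report.getApplicationId()
           + "\t" + report.getStartTime()
           + "\t" + report.getFinishTime());
     }

     yarnClient.stop();
   }
 }

The ResourceManager REST API (/ws/v1/cluster/apps) exposes the same startedTime/finishedTime fields in one HTTP request, if a script-friendly route is preferred.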

0 votes

We have an application that interfaces directly to HDFS and YARN (no MapReduce). It does not currently support any Hadoop security other than the insecure "trust the client" defaults. I've been doing some reading about Hadoop security, but it mostly assumes that applications will be MapReduce. For a "native" YARN/HDFS application, what changes if any must be made to the API calls to support Kerberos or other authentication?

Does it just happen automatically at the OS level, using the authenticated user ID of the process? If there's a good reference, I'd appreciate it.
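
For context, the client-side change for a native (non-MapReduce) HDFS/YARN application typically goes through UserGroupInformation rather than anything MapReduce-specific; a minimal sketch, where the class name, principal, and keytab path are placeholders:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.security.UserGroupInformation;

 public class SecureHdfsClient {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Relies on core-site.xml setting hadoop.security.authentication=kerberos
     UserGroupInformation.setConfiguration(conf);

     // Explicit keytab login; the principal and keytab path are placeholders.
     UserGroupInformation.loginUserFromKeytab(
         "myservice/host.example.com@EXAMPLE.COM",
         "/etc/security/keytabs/myservice.keytab");

     // Subsequent FileSystem/YARN client calls run as the logged-in user.
     FileSystem fs = FileSystem.get(conf);
     System.out.println(fs.exists(new Path("/tmp")));
   }
 }

If the process already holds a ticket obtained via kinit, UserGroupInformation can pick it up from the ticket cache instead of the explicit keytab login.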

+2 votes

Is it possible to consolidate two small data volumes (500GB each) into a larger data volume (3TB)?

I'm thinking that as long as the block file names and metadata are unique, I should be able to shut down the datanode and use something like tar or rsync to copy the contents of each small volume to the large volume.

Will this work?

+2 votes

We use both Gzip and Snappy compression, so I want a way to determine how a specific file is compressed. The closest I found is getCodec, but that relies on the file name suffix, which doesn't exist since Reducers typically don't add a suffix to the filenames they create.

...
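
For context, a byte-sniffing heuristic (only a sketch, assuming the files are either gzip or Snappy): gzip streams always begin with the magic bytes 0x1f 0x8b, while Hadoop's block Snappy format writes no magic header as far as I know, so checking the first two bytes at least separates the two cases:

 import java.io.InputStream;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class CodecSniffer {
   public static void main(String[] args) throws Exception {
     // Pass the HDFS path to inspect as the first argument.
     FileSystem fs = FileSystem.get(new Configuration());
     Path file = new Path(args[0]);

     try (InputStream in = fs.open(file)) {
       int b0 = in.read();
       int b1 = in.read();
       // Gzip data always starts with the two-byte magic number 0x1f 0x8b.
       boolean isGzip = (b0 == 0x1f && b1 == 0x8b);
       System.out.println(file + " looks like "
           + (isGzip ? "gzip" : "not gzip (possibly snappy)"));
     }
   }
 }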