Provisioning a physical host into an existing HDFS/Yarn cluster

+1 vote
169 views

I recently got a powerful physical host. Usually I provision such a host with VMs and add them to my existing HDFS/Yarn cluster (which consists of 300+ VMs).

Now I am exploring a Docker-based approach, so I want to know if there are any best practices I can follow down that path.

posted Jan 26, 2016 by anonymous


Similar Questions
+2 votes

First of all, I'm using Hadoop-2.6.0. I want to launch my own application master on a specific node in a YARN cluster in order to open a server on a predetermined IP address and port. To that end, I wrote a driver program in which I created a ResourceRequest object and called the setResourceName method to set a hostname, then attached it to an ApplicationSubmissionContext object by calling the setAMContainerResourceRequest method.
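
For reference, a minimal sketch of the kind of driver described above; the class name, hostname, and resource values are placeholders, and the application name, queue, and AM ContainerLaunchContext are omitted for brevity:

 import org.apache.hadoop.yarn.api.records.*;
 import org.apache.hadoop.yarn.client.api.YarnClient;
 import org.apache.hadoop.yarn.client.api.YarnClientApplication;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class AmPlacementDriver {
   public static void main(String[] args) throws Exception {
     YarnClient yarnClient = YarnClient.createYarnClient();
     yarnClient.init(new YarnConfiguration());
     yarnClient.start();

     YarnClientApplication app = yarnClient.createApplication();
     ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

     // Ask for the AM container on one specific host;
     // "node01.example.com" is a placeholder hostname.
     ResourceRequest amReq = ResourceRequest.newInstance(
         Priority.newInstance(0),          // priority
         "node01.example.com",             // resource name = target hostname
         Resource.newInstance(1024, 1),    // 1 GB, 1 vcore
         1);                               // a single AM container
     amReq.setRelaxLocality(false);        // do not fall back to other nodes
     appContext.setAMContainerResourceRequest(amReq);

     // ... set application name, queue, and the AM ContainerLaunchContext here ...
     yarnClient.submitApplication(appContext);
   }
 }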

I tried several times but couldn't launch the app master on a specific node. After searching the code, I found that RMAppAttemptImpl invalidates what I've set in the ResourceRequest, as follows:

 // Currently, following fields are all hard code,
 // TODO: change these fields when we want to support
 // priority/resource-name/relax-locality specification for AM containers
 // allocation.
 appAttempt.amReq.setNumContainers(1);
 appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
 appAttempt.amReq.setResourceName(ResourceRequest.ANY);
 appAttempt.amReq.setRelaxLocality(true);

Is there another way to launch a container for an application master on a specific node in Hadoop-2.6.0?

+1 vote

I'm trying to get the start and finish times for all the jobs that have run on a YARN cluster.

yarn application -list -appStates ALL

will get me most of the details of the jobs, but not the times. However, I can parse its output for the application IDs and then run

yarn application -status $ID

on each application id to get an output that I can parse for the time.

However, this involves making lots of connections to YARN, so it is relatively slow. Is there a single command I can use to get all this information?
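
For context, one way to fetch all of this in a single call is the YarnClient Java API, whose application reports carry the start and finish timestamps; a minimal sketch (the class name is a placeholder):

 import java.util.EnumSet;
 import org.apache.hadoop.yarn.api.records.ApplicationReport;
 import org.apache.hadoop.yarn.api.records.YarnApplicationState;
 import org.apache.hadoop.yarn.client.api.YarnClient;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 public class AppTimes {
   public static void main(String[] args) throws Exception {
     YarnClient yarnClient = YarnClient.createYarnClient();
     yarnClient.init(new YarnConfiguration());
     yarnClient.start();

     // A single call returns a report for every application in all states,
     // including start and finish times (epoch milliseconds).
     for (ApplicationReport report :
          yarnClient.getApplications(EnumSet.allOf(YarnApplicationState.class))) {
       System.out.println(report.getApplicationId()
           + "\t" + report.getStartTime()
           + "\t" + report.getFinishTime());
     }

     yarnClient.stop();
   }
 }

The ResourceManager REST API (/ws/v1/cluster/apps) exposes the same startedTime/finishedTime fields in one HTTP request, if a script-friendly route is preferred.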

0 votes

We have an application that interfaces directly to HDFS and YARN (no MapReduce). It does not currently support any Hadoop security other than the insecure "trust the client" defaults. I've been doing some reading about Hadoop security, but it mostly assumes that applications will be MapReduce. For a "native" YARN/HDFS application, what changes if any must be made to the API calls to support Kerberos or other authentication?

Does it just happen automatically at the OS level, using the authenticated user ID of the process? If there's a good reference, I'd appreciate it.
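
For context, the client-side change for a native (non-MapReduce) HDFS/YARN application typically goes through UserGroupInformation rather than anything MapReduce-specific; a minimal sketch, where the class name, principal, and keytab path are placeholders:

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.security.UserGroupInformation;

 public class SecureHdfsClient {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Relies on core-site.xml setting hadoop.security.authentication=kerberos
     UserGroupInformation.setConfiguration(conf);

     // Explicit keytab login; the principal and keytab path are placeholders.
     UserGroupInformation.loginUserFromKeytab(
         "myservice/host.example.com@EXAMPLE.COM",
         "/etc/security/keytabs/myservice.keytab");

     // Subsequent FileSystem/YARN client calls run as the logged-in user.
     FileSystem fs = FileSystem.get(conf);
     System.out.println(fs.exists(new Path("/tmp")));
   }
 }

If the process already holds a ticket obtained via kinit, UserGroupInformation can pick it up from the ticket cache instead of the explicit keytab login.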

+2 votes

Is it possible to consolidate two small data volumes (500GB each) into a larger data volume (3TB)?

I'm thinking that as long as the block file names and metadata are unique, I should be able to shut down the datanode and use something like tar or rsync to copy the contents of each small volume to the large volume.

Will this work?

+2 votes

We use both Gzip and Snappy compression, so I want a way to determine how a specific file is compressed. The closest I found is getCodec, but that relies on the file name suffix, which doesn't exist since Reducers typically don't add a suffix to the filenames they create.

...
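
For context, a byte-sniffing heuristic (only a sketch, assuming the files are either gzip or Snappy): gzip streams always begin with the magic bytes 0x1f 0x8b, while Hadoop's block Snappy format writes no magic header as far as I know, so checking the first two bytes at least separates the two cases:

 import java.io.InputStream;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class CodecSniffer {
   public static void main(String[] args) throws Exception {
     // Pass the HDFS path to inspect as the first argument.
     FileSystem fs = FileSystem.get(new Configuration());
     Path file = new Path(args[0]);

     try (InputStream in = fs.open(file)) {
       int b0 = in.read();
       int b1 = in.read();
       // Gzip data always starts with the two-byte magic number 0x1f 0x8b.
       boolean isGzip = (b0 == 0x1f && b1 == 0x8b);
       System.out.println(file + " looks like "
           + (isGzip ? "gzip" : "not gzip (possibly snappy)"));
     }
   }
 }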