What skills to learn to become a Hadoop admin?

442 views

I would like to enter the Big Data world as a Hadoop admin. I have set up a 7-node cluster using Ambari, Cloudera Manager, and Apache Hadoop, and I have installed services such as Hive, Oozie, and ZooKeeper.

I have also done a web log integration using Flume and a Twitter sentiment analysis. What other skills should I learn?

posted Mar 7, 2015 by Sridharan

Similar Questions

+1 vote

How to learn Hadoop MapReduce on MongoDB in Java

Are there any examples or tutorials from which I can learn Hadoop MapReduce on MongoDB in Java?

+1 vote

If I want to become a full-stack Scala developer, which technology stack should I learn?

+2 votes

What is the best hardware configuration to run Hadoop?

+1 vote

What configuration parameters cause a Hadoop 2.x job to run on the cluster?

Assume I have a machine on the same network as a Hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the config file or local XML files to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from the cluster's Hadoop machine.

Does this work? Which parameters do I need to set? Where does the jar file go? What issues would I see if the machine is running Windows with Cygwin installed?
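As a rough illustration of what "pointing the local XML files at the cluster" can look like, here is a minimal Python sketch that renders a handful of client-side properties in Hadoop's `<configuration>` XML layout. The property names are standard Hadoop 2.x settings, but the host names and port are placeholders, the helper `render_hadoop_config` is hypothetical, and whether these four settings are sufficient for a given cluster is an assumption. In a real installation they are normally split across `core-site.xml`, `mapred-site.xml`, and `yarn-site.xml`.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder hosts and port: replace with the values of the actual cluster.
client_properties = {
    # Where the client finds HDFS (the NameNode).
    "fs.defaultFS": "hdfs://namenode.example.com:8020",
    # Submit jobs to YARN instead of running them in the local job runner.
    "mapreduce.framework.name": "yarn",
    # Where the client finds the YARN ResourceManager.
    "yarn.resourcemanager.hostname": "resourcemanager.example.com",
    # Relevant when submitting from Windows to a Linux cluster.
    "mapreduce.app-submission.cross-platform": "true",
}

xml_text = render_hadoop_config(client_properties)
```

With such a configuration on the client's classpath, the job jar can usually stay on the local machine, since the client uploads it to a staging directory at submission time.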

0 votes

How to write a job for importing files from an external REST API into Hadoop

What is the best way to implement a job that imports files into HDFS?

I have an external system that offers data through a REST API. My goal is to have a job running in Hadoop that periodically (maybe started by cron?) checks the REST API for new data.

It would be nice if this job could also run on multiple data nodes. But unlike all the MapReduce examples I found, my job looks for new or changed data on an external interface and compares it with the existing data.

This is a conceptual example of the job:

1. The job asks the REST API whether there are new files.
2. If so, the job takes the first file in the list.
3. It checks whether the file already exists.
4. If not, the job imports the file.
5. If yes, the job compares the data with the data already stored.
6. If the data has changed, the job updates the file.
7. If more files exist, the job continues with step 2.
8. Otherwise, it ends.
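
The steps above can be sketched in plain Python, independent of any Hadoop API. This is only a conceptual sketch under stated assumptions: `list_new_files` and `fetch_file` are hypothetical stand-ins for the REST API calls, and `store` is an in-memory dict standing in for HDFS. Comparing SHA-256 digests to detect changes is also an assumption, not something the question specifies.

```python
import hashlib
from typing import Callable, Dict, Iterable, MutableMapping


def sync_files(
    list_new_files: Callable[[], Iterable[str]],
    fetch_file: Callable[[str], bytes],
    store: MutableMapping[str, bytes],
) -> Dict[str, str]:
    """Import new files and update changed ones; return the action taken per file."""
    actions: Dict[str, str] = {}
    # Ask the (hypothetical) API for candidate files and walk through the list.
    for name in list_new_files():
        data = fetch_file(name)
        existing = store.get(name)
        if existing is None:
            # The file does not exist yet: import it.
            store[name] = data
            actions[name] = "imported"
        elif hashlib.sha256(existing).digest() != hashlib.sha256(data).digest():
            # The file exists but its content differs: update it.
            store[name] = data
            actions[name] = "updated"
        else:
            # The file exists and is identical: nothing to do.
            actions[name] = "unchanged"
    return actions


# Usage with in-memory stand-ins for the REST API and for HDFS.
remote = {"a.csv": b"1", "b.csv": b"2", "c.csv": b"3"}
store = {"b.csv": b"2", "c.csv": b"old"}
result = sync_files(lambda: sorted(remote), remote.__getitem__, store)
```

Because each file is handled independently, the list of file names could in principle be split across several workers, which is one way to approach the wish to run on multiple data nodes.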

Can anybody give me a little help on how to start? (It's the first job I have written.)

...
