How to write a job for importing files from an external REST API into Hadoop

0 votes
345 views

What is the best way to implement a job that imports files into HDFS?

I have an external system that exposes data through a REST API. My goal is to have a job running in Hadoop that periodically (maybe started by cron?) checks the REST API for new data.

It would also be nice if this job could run on multiple data nodes. But unlike all the MapReduce examples I have found, my job looks for new or changed data on an external interface and compares it with the data that already exists.

This is a conceptual example of the job:

  1. The job asks the REST API whether there are new files.
  2. If so, the job takes the first file in the list.
  3. The job checks whether the file already exists.
       • If not, the job imports the file.
       • If yes, the job compares the data with the data already stored.
       • If the data has changed, the job updates the file.
  4. If more files exist, the job continues with step 2.
  5. Otherwise, the job ends.
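
To make the steps above concrete, here is a minimal Python sketch of the import loop. It is only an illustration under stated assumptions: `list_remote` and `fetch_remote` are hypothetical stand-ins for calls to the external REST API, and `store` is a plain dictionary standing in for HDFS. None of these names come from Hadoop or from the system described in the question.

```python
import hashlib
from typing import Callable, Dict, Iterable, MutableMapping


def sha256_hex(data: bytes) -> str:
    """Return a hex digest used to detect changed content."""
    return hashlib.sha256(data).hexdigest()


def sync_files(
    list_remote: Callable[[], Iterable[str]],   # hypothetical: asks the REST API for file names
    fetch_remote: Callable[[str], bytes],       # hypothetical: downloads one file from the REST API
    store: MutableMapping[str, bytes],          # stand-in for HDFS (file name -> content)
) -> Dict[str, str]:
    """Import new files, update changed ones, and report what was done."""
    actions: Dict[str, str] = {}
    for name in list_remote():                  # steps 1, 2 and 4: walk the list of files
        data = fetch_remote(name)
        if name not in store:                   # step 3: file does not exist yet
            store[name] = data
            actions[name] = "imported"
        elif sha256_hex(store[name]) != sha256_hex(data):
            store[name] = data                  # content differs: update the stored file
            actions[name] = "updated"
        else:
            actions[name] = "unchanged"
    return actions                              # step 5: the list is exhausted, the job ends


if __name__ == "__main__":
    # In-memory example data; a real job would call the REST API and write to HDFS.
    remote = {"a.csv": b"1,2,3", "b.csv": b"4,5,6", "c.csv": b"7,8,9"}
    hdfs_stand_in = {"a.csv": b"1,2,3", "b.csv": b"old content"}
    print(sync_files(lambda: sorted(remote), remote.__getitem__, hdfs_stand_in))
```

In a real setup it would probably make sense to keep the digest next to each stored file, so that the comparison does not require reading the existing file back. The two callables keep the loop independent of how the REST API and the storage layer are actually reached.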

Can anybody give me some pointers on how to start? (This is the first job I have written.)

posted Jul 30, 2017 by anonymous

Similar Questions
+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it midway, which command should we use? Or is there another way to do that?

+1 vote

According to the book "Hadoop: The Definitive Guide", it is possible to use "-D property=value" to
override any default or site property in the configuration.

I gave it a shot, and it did not work: the property specified with "-D" is ignored.

Then I put the property in an XML file and used "-conf xml_name" on the command line. But I still cannot
override the property.

The only way to override the default property is to get a Configuration reference in the code and set the property via the reference. But that is not convenient as I need to recompile the code each time I change the property.

Now the question is what is the right way to customize the configuration for a job?

...