My project requires executing a Hadoop job remotely, and the job depends on some third-party libraries (jar files). I tried the following:
1. Copy the jar files to HDFS.
2. Add them to the job classpath via the distributed cache using DistributedCache.addFileToClassPath, so that Hadoop ships them to each of the slave nodes (see the sketch below).
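Roughly, a minimal sketch of those two steps in code; the host names, ports, jar paths, and the RemoteJobRunner class name are placeholders, not my actual values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class RemoteJobRunner {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the remote cluster (placeholder addresses).
            conf.set("fs.default.name", "hdfs://namenode:9000");
            conf.set("mapred.job.tracker", "jobtracker:9001");

            // Step 1: copy the third-party jar from the local machine to HDFS.
            FileSystem fs = FileSystem.get(conf);
            Path localJar = new Path("/local/path/third-party.jar");
            Path hdfsJar  = new Path("/libs/third-party.jar");
            fs.copyFromLocalFile(localJar, hdfsJar);

            // Step 2: register the HDFS jar on the task classpath
            // through the distributed cache.
            DistributedCache.addFileToClassPath(hdfsJar, conf);

            Job job = new Job(conf, "remote-job");
            job.setJarByClass(RemoteJobRunner.class);
            // ... set mapper/reducer, input/output formats and paths here ...
            job.waitForCompletion(true);
        }
    }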
However, my program still throws a ClassNotFoundException, indicating that some of the classes cannot be found while the job is running.
So I am looking for:
1. The correct way to run a job remotely and programmatically when the job requires third-party jar files.
2. An alternative to DistributedCache, which I found is deprecated (I am using Hadoop 1.2.0). What class should I use instead?