I am trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. The Hadoop cluster itself is running fine, and I have successfully added the input and output directories to HDFS. But when I run
$HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5
I get output like:
INFO input.FileInputFormat: Total input paths to process : 0
As I understand it, this means that Hadoop cannot locate the input files. The job then ends, as expected, with a NullPointerException.
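For reference, here is a minimal sketch of how the input directory could be checked from the same account that submits the job. It assumes the seed list was uploaded to the relative HDFS path urls (the same argument passed to the crawl command above); the variable names HADOOP and SEED_DIR are only illustrative. Relative HDFS paths resolve against /user/<username>, so a directory uploaded by one user may not be visible under the same relative name to another.

```shell
# Assumption: the seed directory is the relative HDFS path "urls",
# matching the argument given to the crawl command.
HADOOP="${HADOOP_HOME}/bin/hadoop"
SEED_DIR="urls"

if [ -x "$HADOOP" ]; then
  # List the directory as the job would see it (relative to /user/<username>).
  "$HADOOP" fs -ls "$SEED_DIR"
  # Print the seed files; the glob is quoted so HDFS expands it, not the local shell.
  "$HADOOP" fs -cat "$SEED_DIR/*"
else
  echo "hadoop not found at $HADOOP; check HADOOP_HOME"
fi
```

If the listing is empty or the path is not found, that would be consistent with the "Total input paths to process : 0" line.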
Can someone help me out?