Hadoop: Does WholeFileInputFormat take the entire input file as input, or each record (input split) as a whole?

2 Answers

It takes the entire file as input; otherwise it would be no different from the normal line/record-based input formats.

answer Jun 28, 2014 by Bob Wise

It takes the entire file as input. This input format class overrides the isSplitable method so that it returns false. This method determines whether a file can be split into multiple chunks.

answer Jun 28, 2014 by Meenal Mishra
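
To illustrate the point made in both answers, here is a minimal sketch in plain Java that models how the split decision affects the number of input splits. It does not use the Hadoop API: the class SplitSketch, its Split type, and the getSplits helper are hypothetical names made up for this illustration, and the model ignores details of the real FileInputFormat such as block locations and its tolerance for a slightly oversized last split. In real Hadoop code the same effect is usually obtained by overriding isSplitable in a FileInputFormat subclass so that it returns false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split computation; not the Hadoop API.
class SplitSketch {
    /** A byte range of one file; a stand-in for Hadoop's FileSplit. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Returns the splits for a single file.
     * If splitable is false, the whole file becomes exactly one split.
     * Otherwise the file is cut into chunks of at most splitSize bytes
     * (splitSize must be positive).
     */
    static List<Split> getSplits(long fileLength, long splitSize, boolean splitable) {
        List<Split> splits = new ArrayList<>();
        if (!splitable) {
            // Whole-file behaviour: one split covering every byte of the file.
            splits.add(new Split(0, fileLength));
            return splits;
        }
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with a 128 MB split size.
        System.out.println(getSplits(300 * mb, 128 * mb, true).size());  // prints 3
        System.out.println(getSplits(300 * mb, 128 * mb, false).size()); // prints 1
    }
}
```

With splitting disabled, a single mapper receives the whole file as one split, which is why a whole-file input format can hand the entire file contents to the mapper as a single record.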

Similar Questions

+2 votes

Hadoop doesn't find the input file

I am trying to run Nutch 2.2.1 on a two-node Hadoop cluster. My Hadoop cluster is running fine and I have successfully added the input and output directories to HDFS. But when I run

$HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5

I am getting something like:

INFO input.FileInputFormat: Total input paths to process : 0

which, as I understand it, means that Hadoop cannot locate the input files. The job then ends, for obvious reasons, with a null pointer exception.

Can someone help me out?

+4 votes

Add a few records to a Hive table or an HDFS file on a daily basis

My requirement is a typical data warehouse and ETL requirement. I need to accomplish the following:

1) Insert transaction records daily into a Hive table or an HDFS file. This table or file is not big (approximately 10 records per day). I don't want to partition the table or file.

A few articles mention that we need to load the data into a staging table in Hive and then insert it as below:

insert overwrite table finaltable select * from staging;

I don't understand this logic. How should I populate the staging table daily?

+1 vote

How to set mapreduce.input.fileinputformat.split.maxsize for a specific job?

In the XML configuration files of Hadoop 2.x, "mapreduce.input.fileinputformat.split.minsize" is listed and can be set, but how do I set "mapreduce.input.fileinputformat.split.maxsize" in the XML file? I also need to set it in my MapReduce code.

+3 votes

Is there a way to run MapReduce with MongoDB as input and output to HDFS?

0 votes

The archive file created in Hadoop always has the extension of
