Hive LOAD DATA INPATH imports all records twice?

+2 votes
317 views

I am trying to load JSON data into Hive using the hcatalog JsonSerDe. I created the table and used the LOAD DATA INPATH command to load 8 records into it. However, SELECT * shows 16 records in the table, each record duplicated. Why is this happening?

posted May 8, 2015 by Ramakrishnan
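
For reference, a minimal sketch of the kind of setup being described; the table name, columns, jar path, and HDFS path are assumptions, not the poster's actual DDL, and the SerDe class is the one shipped with Hive's hive-hcatalog-core jar:

$ # names and paths below are illustrative only
$ hive -e "
    ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
    CREATE TABLE events_json (id INT, name STRING)
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
      STORED AS TEXTFILE;
    LOAD DATA INPATH '/user/ramakrishnan/sample.json' INTO TABLE events_json;
    SELECT COUNT(*) FROM events_json;"

Note that LOAD DATA INPATH moves (rather than copies) the source file into the table's warehouse directory, so any line count of the source file should be taken before the load.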


Similar Questions
+2 votes

I'm a newcomer to the Hadoop world. After some struggling, I've successfully got Hadoop 2.6 running on my Windows 7 laptop.

However, when I try to run Hive 1.0.0 on my Windows 7 system, I find there is no command-line script like the one provided for Linux. It's also hard to find any useful information on Google.

Can anyone provide a clue on how to run Hive on Windows 7?
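
For what it's worth, it is easy to check whether a given release tarball ships any Windows launchers, and a commonly suggested fallback is running the regular bash launcher from a POSIX shell such as Cygwin. A rough sketch, with install paths that are assumptions only:

$ ls $HIVE_HOME/bin                              # look for any *.cmd scripts next to the bash ones
$ export HADOOP_HOME=/cygdrive/c/hadoop-2.6.0    # assumed install location
$ export HIVE_HOME=/cygdrive/c/hive-1.0.0        # assumed install location
$ export PATH=$HIVE_HOME/bin:$PATH
$ hive                                           # bash launcher; needs a POSIX shell on Windows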

+1 vote

I am using Hive queries on structured RC files. Can someone please let me know the key performance parameters I should tune for better query performance (Hadoop 2.3 / YARN and Hive 0.13)?
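
As a starting point only, the sketch below shows the syntax for a few settings that are commonly tuned on Hive 0.13 over YARN; the values are placeholders to illustrate the mechanism, not recommendations for any particular cluster or query:

$ # a few commonly tuned knobs (values are placeholders); the query under test follows the SETs
$ hive -e "
    SET hive.exec.parallel=true;
    SET hive.exec.reducers.bytes.per.reducer=268435456;
    SET hive.exec.compress.intermediate=true;
    SET mapreduce.input.fileinputformat.split.maxsize=268435456;"

The same SET lines can go into a script run with hive -f, or into ~/.hiverc to apply for a whole session.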

+2 votes

I want to use Hive with Hadoop 2.2.0, so I executed the following steps:

$ tar -xzf hive-0.11.0.tar.gz
$ export HIVE_HOME=/home/software/hive 
$ export PATH=${HIVE_HOME}/bin:${PATH} 
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse 
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse 
$ hive

Error creating temp dir in hadoop.tmp.dir file:/home/software/temp due to Permission denied

How can I make the Hive installation succeed?
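
The message points at a local filesystem path (file:/home/software/temp) rather than HDFS, so a hedged first check is the ownership of that directory; the alternative of repointing hadoop.tmp.dir below is only a sketch with an assumed target path:

$ ls -ld /home/software/temp                    # who owns the directory named in the error?
$ sudo chown $(whoami) /home/software/temp      # or make it writable for the user running hive
$ hive --hiveconf hadoop.tmp.dir=/tmp/hive-$(whoami)   # or point the temp dir somewhere writable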

+2 votes

I submitted an MR job through Hive, but it failed when it ran stage-2. Why? It seems to be a permission problem, but I do not know which directory caused it:

Application application_1388730279827_0035 failed 1 times due to AM Container for appattempt_1388730279827_0035_000001 exited with exitCode: -1000 due to: EPERM: 
Operation not permitted at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method) at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581) at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:388) at 
org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1041) at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150) at 
org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:190) at 
org.apache.hadoop.fs.FileContext$4.next(FileContext.java:698) at 
org.apache.hadoop.fs.FileContext$4.next(FileContext.java:695) at 
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at 
org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:695) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:385) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:130) at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:861) .
Failing this attempt.. 
Failing the application.
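
The stack trace comes from the NodeManager's container localizer creating directories on the node's local disk, so the directories configured as yarn.nodemanager.local-dirs are a reasonable place to start; the path and user below are examples, not values from this cluster:

$ grep -A1 yarn.nodemanager.local-dirs $HADOOP_CONF_DIR/yarn-site.xml   # where localization writes
$ ls -ld /data/yarn/local                       # example path; check ownership on the failing node
$ sudo chown -R yarn:yarn /data/yarn/local      # assumes the NodeManager runs as user 'yarn'
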
+1 vote

I've been reading a lot of posts about needing to set a high ulimit for file descriptors in Hadoop, and I think it's probably the cause of many of the errors I've been having when trying to run queries on larger data sets in Hive. However, I'm really confused about how and where to set the limit, so I have a number of questions (a brief checking sketch follows the list):

  • How high is it recommended to set the ulimit?
  • What is the difference between soft and hard limits? Which one needs to be set to the value from question 1?
  • For which user(s) do I set the ulimit? If I am running the Hive query with my login, do I set my own ulimit to the high value?
  • Do I need to set this limit for these users on all the machines in the cluster? (we have one master node and 6 slave nodes)
  • Do I need to restart anything after configuring the ulimit?
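
As a small illustration for the first two questions, the current per-user limits can be checked and raised like this; the user name and value are placeholders, not recommendations:

$ ulimit -Sn        # current soft limit on open file descriptors for this shell
$ ulimit -Hn        # hard limit, the ceiling a non-root user can raise the soft limit to
$ # persistent change, typically one line per user in /etc/security/limits.conf:
$ #   hdfs   soft   nofile   65536
$ #   hdfs   hard   nofile   65536

The soft limit is what processes actually see and can be raised per session up to the hard limit; entries in limits.conf normally take effect at the next login, and long-running daemons pick them up only after a restart.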
...