How to install Snappy for Hadoop

+1 vote
1,881 views

I am running on Hadoop 1.0.4 and I would like to use Snappy for map output compression. I am adding the configurations:

configuration.setBoolean("mapred.compress.map.output", true);
configuration.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");

And I've added libsnappy.so.1 to $HADOOP_HOME/lib/native/Linux-amd64-64/.
Still, all map tasks fail with "native snappy library not available". Could anyone elaborate on how to install Snappy for Hadoop?

posted Jan 1, 2014 by Deepak Dasgupta


1 Answer

+2 votes

Did you build it for your platform? You can run "ldd" on the .so file to check whether the dependent libs are present. Also make sure you placed it in the right directory for your platform (Linux-amd64-64 or Linux-i386-32).
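
As a rough illustration of that check (a sketch only, reusing the $HADOOP_HOME/lib/native/Linux-amd64-64 path from the question):

$ cd $HADOOP_HOME/lib/native/Linux-amd64-64
$ ldd libsnappy.so.1      # every dependency should resolve; nothing should show as "not found"
$ file libsnappy.so.1     # should report a 64-bit ELF shared object for this platform directory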

answer Jan 1, 2014 by Ahmed Patel
I did everything in the link Ted mentioned, and the test actually works, but using Snappy for MapReduce map output compression still fails with "native snappy library not available".
Your natives should be on LD_LIBRARY_PATH or java.library.path for Hadoop to pick them up. You can try adding export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=" to hadoop-env.sh on the TaskTrackers (TTs) and on clients/gateways, restart the TTs, and give it another try. The reason it's working for HBase is that you are manually pointing HBASE_LIBRARY_PATH to the natives.
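For example (a sketch only, assuming the natives live in the $HADOOP_HOME/lib/native/Linux-amd64-64 directory mentioned in the question), the line in hadoop-env.sh could look like:

# in $HADOOP_HOME/conf/hadoop-env.sh on the TaskTrackers and clients/gateways
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64"

Restart the TaskTrackers afterwards so the new JVM option takes effect.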

My guess is they are in the wrong location.

Similar Questions
+8 votes

I'm trying to enable the Hadoop native library and the snappy library for compression in Hadoop 2.2.0, but I always end up with:

./hadoop/bin/hadoop checknative -a
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false

I compiled hadoop-2.2.0-src from scratch for x64 and put the resulting .so files in hadoop/lib/native/. I also compiled Snappy from scratch and put it there. In a different approach I installed Snappy via sudo apt-get and then symlinked the resulting .so to hadoop/lib/native/libsnappy.so, but still no luck.

What is going on here? Why won't Hadoop find my native libraries? Is there any log where I can check what went wrong during loading?
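
One way to see what happens at load time (an assumption on my side, not something stated in the question): the native-library loader logs at DEBUG level, so raising the console log level while re-running checknative usually shows which paths were tried and why loading failed:

$ export HADOOP_ROOT_LOGGER=DEBUG,console
$ ./hadoop/bin/hadoop checknative -a    # DEBUG output from NativeCodeLoader lists the paths it tried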

+1 vote

I am trying to use Ambari (Hortonworks) to install Hadoop. One step is to pre-configure DNS as described in the manual linked below: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_using_Ambari_book/content/ambari-chap1-5-5.html

Since I am only using an internal network, I am not sure how to configure fully.qualified.domain.name. My hostname -f only shows localhost, and /etc/sysconfig/network gives HOSTNAME=localhost.localdomain.

If anyone already has Hadoop running, what are the actual DNS requirements for Hadoop? Any suggestions would be appreciated.
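
As a hedged sketch (the addresses and names below are made up, not from the question), the usual approach on a purely internal network is to give every node a non-localhost FQDN in /etc/hosts and /etc/sysconfig/network so that hostname -f resolves to it:

# /etc/hosts on every node (example entries; adjust to your own network)
192.168.1.10  master1.hadoop.local  master1
192.168.1.11  node1.hadoop.local    node1

# /etc/sysconfig/network on the node itself
HOSTNAME=master1.hadoop.local

$ hostname -f    # should now print master1.hadoop.local instead of localhost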

+2 votes

I want to use Hive on Hadoop 2.2.0, so I executed the following steps:

$ tar -xzf hive-0.11.0.tar.gz
$ export HIVE_HOME=/home/software/hive
$ export PATH=${HIVE_HOME}/bin:${PATH}
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hive

The last command fails with:

Error creating temp dir in hadoop.tmp.dir file:/home/software/temp due to Permission denied

How can I make the Hive installation succeed?
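
A minimal sketch of one possible fix (my assumption, based only on the error text): file:/home/software/temp is a local directory, so it has to exist and be writable by the user running hive; alternatively, point hadoop.tmp.dir at a directory that user owns.

$ mkdir -p /home/software/temp
$ sudo chown -R $(whoami) /home/software/temp   # or chmod it so the user running hive can write there
$ hive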

0 votes

I want to ask: what is the best way to implement a job that imports files into HDFS?

I have an external system offering data through a REST API. My goal is to have a job running in Hadoop that periodically (maybe started by cron?) checks the REST API for new data.

It would also be nice if this job could run on multiple data nodes. But unlike all the MapReduce examples I found, my job looks for new or changed data on an external interface and compares it with the data already stored.

This is a conceptual example of the job:

  1. The job asks the REST API if there are new files.
  2. If so, the job imports the first file in the list.
  3. It checks whether the file already exists:
     • if not, the job imports the file;
     • if yes, the job compares the data with the data already stored and, if it has changed, updates the file.
  4. If more files exist, the job continues with step 2; otherwise it ends.

Can anybody give me a little help on how to start (it's the first job I have written)?
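
Just to make the flow above concrete, here is a minimal single-node shell sketch that could be started by cron (the REST URL, file names, and HDFS target directory are placeholders I made up, and it assumes the Hadoop 2.x hdfs dfs commands; running it distributed would be a later step, e.g. with Oozie or a MapReduce job):

#!/bin/bash
# poll_import.sh - fetch new files from a (hypothetical) REST API and store them in HDFS
API_URL="http://example.com/api/files"   # placeholder endpoint that lists available file names, one per line
HDFS_DIR="/data/imports"                 # placeholder target directory in HDFS

hdfs dfs -mkdir -p "$HDFS_DIR"

for f in $(curl -s "$API_URL"); do
    tmp=$(mktemp)
    curl -s "$API_URL/$f" -o "$tmp"      # download the file locally first

    if hdfs dfs -test -e "$HDFS_DIR/$f"; then
        # file already exists: replace it only if the content has changed
        if ! hdfs dfs -cat "$HDFS_DIR/$f" | cmp -s - "$tmp"; then
            hdfs dfs -rm -skipTrash "$HDFS_DIR/$f"
            hdfs dfs -put "$tmp" "$HDFS_DIR/$f"
        fi
    else
        # new file: import it
        hdfs dfs -put "$tmp" "$HDFS_DIR/$f"
    fi
    rm -f "$tmp"
done

Scheduling this script with a cron entry covers the "periodical" part; comparing and importing in parallel across data nodes would be the MapReduce/Oozie follow-up.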

...