What are the different types of tombstone markers in HBase for deletion?

+2 votes
383 views
posted Jul 26, 2017 by Karthick.c
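For context, the HBase reference guide distinguishes three tombstone types: a version delete marker (masks a single version of a column), a column delete marker (masks all versions of a column), and a family delete marker (masks all columns of a column family). Below is a minimal Java sketch of the client-side Delete calls that produce each marker; the table name 't1', row/family/qualifier names, and the timestamp are placeholders, and an HBase 1.x+ client API is assumed:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TombstoneDemo {
    public static void main(String[] args) throws IOException {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("t1"))) {   // 't1' is a placeholder
            byte[] row = Bytes.toBytes("r1");
            byte[] cf  = Bytes.toBytes("cf");
            byte[] q   = Bytes.toBytes("q");

            // Version delete marker: masks the single version of cf:q at this
            // timestamp (the value here is an arbitrary example).
            table.delete(new Delete(row).addColumn(cf, q, 1500000000000L));

            // Column delete marker: masks all versions of cf:q.
            table.delete(new Delete(row).addColumns(cf, q));

            // Family delete marker: masks every column in family cf.
            table.delete(new Delete(row).addFamily(cf));
        }
    }
}

Each call writes a marker cell rather than physically removing data; the masked cells and the tombstones themselves are only purged at major compaction.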


Similar Questions
+2 votes

How to migrate an HBase table from hbase-0.94 to hbase-0.98, when the two belong to different Hadoop clusters?

I exported data from the old cluster to the new cluster using the hadoop distcp command, as follows:

hadoop distcp -update -pb -skipcrccheck hftp://192.168.200.21:50070/user/root/ParsedData /user/root/

and then executed the HBase import command to load the data into hbase-0.98:

hbase -Dhbase.import.version=0.98.6 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData

The command executed successfully, but the 'ParsedData' table is always empty. Any suggestions?
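One thing worth checking against the HBase book: its cross-version import example sets hbase.import.version to the version that produced the export files (0.94 in this scenario), not the destination version, along the lines of:

hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData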

+1 vote

We just installed HDP 2.2 through Ambari. We were under the impression that in HDP 2.2 the default deployment mechanism for HBase/Accumulo is through Slider (i.e., that they are enabled for YARN by default). However, that does not seem to be the case. Can we choose to install HBase through Slider during HDP installation via Ambari? In other words, is there a customization option that we are missing?

If Slider is not the default mechanism for HBase on HDP 2.2, why not?

+1 vote

I have a roughly 5 GB file where each row is a key-value pair. I would like to use it as a "hashmap" against another large set of files. From searching around, one way to do it would be to turn it into a dbm-style store such as BDB and put it into the distributed cache. Another is joining the data. A third is putting it into HBase and using it for lookups.

I'm more familiar with the first approach, so it seems simpler to me. However, I have read that using the distributed cache for files beyond a few megabytes is not recommended because the file is replicated across all the data nodes. That doesn't seem so bad to me: I pay the overhead once at the beginning of the job, and then each node gets a local copy, right? If I were to go with a join, wouldn't it increase the workload (more entries) and create the same network congestion issue? And wouldn't going with HBase mean making it a bottleneck?

What are the advantages and disadvantages of going with one solution over the others? What if, for example, that "hashmap" had to come from, say, a 40 GB file? How would my options change, and at what point does each option make sense?
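For a concrete picture of the first option, here is a minimal map-side sketch; the file name lookup.txt and the tab-separated row format are assumptions, not from the post. It loads the cached file into an in-memory HashMap in setup() and probes it once per input record:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheLookupMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context ctx) throws IOException {
        // "lookup.txt" is the symlink created for the cached file in the
        // task's working directory; the name is a placeholder.
        try (BufferedReader in = new BufferedReader(new FileReader("lookup.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t", 2);   // assumes tab-separated key/value rows
                if (kv.length == 2) {
                    lookup.put(kv[0], kv[1]);
                }
            }
        }
    }

    @Override
    protected void map(LongWritable offset, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] kv = value.toString().split("\t", 2);
        if (kv.length == 2) {
            String match = lookup.get(kv[0]);
            if (match != null) {                     // emit only rows that hit the "hashmap"
                ctx.write(new Text(kv[0]), new Text(kv[1] + "\t" + match));
            }
        }
    }
}

On the driver side the file would be registered with job.addCacheFile(new URI("/user/root/lookup.txt#lookup.txt")); the #lookup.txt fragment creates the local symlink the mapper opens. The obvious caveat, which the post already hints at, is that the whole file must fit in each task's heap, which is exactly what stops scaling from 5 GB toward 40 GB.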

...