What are the different types of tombstone markers in HBase for deletion?


posted Jul 26, 2017 by Karthick.c


Similar Questions

+1 vote

What are the different operational commands in HBase at record level and table level?

+3 votes

Explain the difference between HBase and Hive?

+2 votes

How to migrate an HBase table from hbase-0.94 to hbase-0.98 when the two belong to different Hadoop clusters?

I exported the data from the old cluster to the new cluster using the hadoop distcp command, as follows:

hadoop distcp -update pb -skipcrccheck hftp://192.168.200.21:50070/user/root/ParsedData /user/root/

I then ran the HBase Import command to load the data into hbase-0.98:

hbase -Dhbase.import.version=0.98.6 org.apache.hadoop.hbase.mapreduce.Import ParsedData /user/root/ParsedData

The command executed successfully, but the 'ParsedData' table is always empty. Any suggestions?

+1 vote

Using Slider as a default mechanism for HBase on HDP 2.2

We just installed HDP 2.2 through Ambari. We were under the impression that in HDP 2.2 the default deployment mechanism for HBase/Accumulo is Slider (i.e., they are enabled by default for YARN). However, that does not seem to be the case. Can we choose to install HBase through Slider during HDP installation through Ambari? In other words, is there a customization option that we are missing?

If Slider is not the default mechanism for HBase on HDP 2.2, why not?

+1 vote

Advantage/disadvantage of dbm vs join vs HBase

I have a file of roughly 5 GB in which each row is a key-value pair. I would like to use it as a "hashmap" against another large set of files. From searching around, one way to do this is to turn it into a dbm such as DBD and put it into the distributed cache. Another is to join the data. A third is to put it into HBase and use that for lookups.

I'm more familiar with the first approach, so it seems simpler to me. However, I have read that using the distributed cache for files beyond a few megabytes is not recommended because the file is replicated across all the data nodes. This doesn't seem that bad to me, because I pay this overhead just once at the beginning of the job and then each node has a local copy, right? If I went with a join, would it not increase the workload (more entries) and create the same network congestion issue? And wouldn't going with HBase make it a bottleneck?

What are the advantages and disadvantages of each solution compared with the others? What if, for example, the "hashmap" had to come from, say, a 40 GB file? How would my options change? At what point would each option make sense?
