A really short write-up on HBase and its features. For a longer version, check the open source tutorial.
Like BigTable and many relational databases, HBase also follows a three-level hierarchy
HBase Data Hierarchy
|Region Name||Relational Term||Functionality|
|ROOT||Catalog||Covers an entire installation|
|.meta||Database||Covers an entire database|
|.table||Tables / Partitions||Covers a partition of the table|
- Each directory is also stored in the same format as the regular tables.
- The common data structure is called a REGION
- This makes it easy to store and access the meta-data the same way.
- Each region is assigned to a unique Region Server, that handles both the scan and the update loads on the server.
- Of course, secondary Region Servers will take over when the Primary Region Server goes down.
- This approach ensures there is no data contention across machines
Finding the Region
To find the Region Server at which a particular key may be present, the following are the steps
- From Zookeeper, find the Server that contains the Root region
- From the Root Server, find out the Server that contains the Meta region
- From the Meta Server, find the Server that contains the Key Region
- From the Key Server, retrieve the value of the key
Note that the Root Region Server, the Meta Region Server and Key Region Server may be cached at the client. So, it may directly contact the Key Region Server directly in many cases.
The two points below help search effectively within a Key Region
- The records in the file are kept sorted, and hence can provide both sub-range queries and random queries efficiently.
- The Indexes are flat, and only on the key. This means at-most two disk blocks are accessed, before the data is read.
Finding the Value
Once a Region is located, the following steps help search for the value
- The Store-File contains the Index at the end of the file, which contains information on exactly which block stores the data.
- Within a block, binary search can be used to verify if the data exists or not.
Immediate Full Consistency
- The database follows “Immediate Consistency” on all of its writes.
- It also follows full ACID semantics
- This means after a Write is completed, any subsequent Read will read the latest value.
- For each region, there are Transaction Logs maintained on the underlying file system.
- This ensures the recovery in case the rest of the Write process fails.
- The database is designed, not as a B-Tree, but as a “Log-Store Managed Tree”
- A “Log Store Managed Tree” works as follows
Log Store Managed Technique
- Write the transaction log, so that the “D” portion of ACID is satisfied
- Next change the memory structures. At this point, the transaction is considered done.
- Periodically, we run a Merge pass, that
- Do an entire sequential scan of the entire sorted Region from the disk,
- Merge the memory changes
- Write back a new version of the Region to disk
- Update the Region server to use this new Region server, instead of the old one.
Why this technique works better
The scenario we have is one of “High Workloads” and a “Large dataset”. Under this situation, a sequential scan performs better than B-Tree writes
- B-Trees modify different sparse blocks, and the Disk seek time is wasted. At higher workloads, the disk arm may not be able to keep up with the pace of “Disk requests”.
- Even if we batch the Writes, because the “Data set” is vast, the disk blocks written are still sparse.
Since the Cell values are sparse, the cells are stored as tuples
|<row-key> <col-key> <value>|
Enhancing this to include column names and timestamps, the representation is
The key-value is therefore represented as
Efficient Representation of Data
As a result of the above representation, the keys can get very long
- One needs to be conscious of the size of the key, and keep it short
- If there is only one column, part of the data may be used as part of the key.
HBase requires a good key design, if the load needs to be distributed across multiple machines
- If the keys are timestamps or sequences, then, they get assigned to the same region, and this becomes a hot-spot
- To avoid this, the keys may be hashed.
- Alternately, the leading dimension should not be timestamped.