Distributed scans with HBase

HBase is by design a columnar store, that is optimized for random reads. You just ask for a row using rowId as an identifier and you get your data instantaneously. Performing a scan on part or whole table is a completely different thing. First of all, it is sequential. Meaning it is rather slow, because it doesn’t use all the RegionServers at the same time. It is implemented that way to realize the contract of Scan command - which has to return results sorted by key. So, how to do this efficiently? ...

December 10, 2013 · 3 min · Marcin Cylke

Simple HBase ORM

When dealing with data stored in HBase, you are quick to come to a conclusion, that it is extremaly inconvenient to reach to it via HBase native API. It is very verbose and you always need to convert between bytes and simple types - a pain. While I was working on a project of mine, I thought, why not to easy those pains and fetch real objects from HBase. And that’s how this simplistic, hackish ORM came to life. It is no match for projects like Kundera (a JPA compliant solution), or n-orm. Nevertheless, it just suits my needs :) ...

December 8, 2013 · 2 min · Marcin Cylke