
Geo-spatial search in HBase

Nowadays, geospatial querying has become part of almost every application that works with location coordinates.
A number of NoSQL databases support geospatial search out of the box, such as MongoDB, Couchbase, and Solr.

I'm going to write about bounding box queries over HBase. I use this kind of query to select the points located in the visible geographic area of the web client.

During my investigation, I realized that the more effective (and more complex) solutions use the Geohash or QuadKey approach. Those approaches require redesigning your data model, so I went with a simpler (and less effective) solution: the built-in HBase encoder OrderedBytes (hbase-common-0.98.4-hadoop2.jar).

The code examples below work with HBase version 0.98.4.

A bounding box query requires comparing the latitude and longitude coordinates of the points saved in HBase.
As you know, HBase keeps data in binary format and orders it lexicographically.
Since coordinates are of type Double and may be negative, the binary representation of a negative latitude or longitude is lexicographically bigger than that of the smallest positive value.


For example, the hexadecimal representation of

0.0 is \x00\x00\x00\x00\x00\x00\x00\x00 and
-1.0 is \xBF\xF0\x00\x00\x00\x00\x00\x00; the latter is bigger.
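A quick sanity check with HBase's Bytes utility illustrates the problem (a minimal sketch, not from the original post; the class name is only illustrative):

import org.apache.hadoop.hbase.util.Bytes;

public class RawDoubleOrderDemo {
    public static void main(String[] args) {
        // Raw IEEE-754 big-endian bytes, as produced by Bytes.toBytes(double)
        byte[] zero = Bytes.toBytes(0.0);      // \x00\x00\x00\x00\x00\x00\x00\x00
        byte[] minusOne = Bytes.toBytes(-1.0); // \xBF\xF0\x00\x00\x00\x00\x00\x00

        // Lexicographic comparison, the same ordering HBase uses:
        // -1.0 compares GREATER than 0.0, which breaks range filters
        System.out.println(Bytes.compareTo(minusOne, zero) > 0); // true
    }
}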


This prevents us from saving the coordinates as plain Doubles and using the standard HBase BinaryComparator inside a SingleColumnValueFilter.

To solve this problem I chose the OrderedBytes encoder, which guarantees a lexicographically ordered representation of the Double type.


Using OrderedBytes, the hexadecimal representation of

0.0 is \x00\x00\x15\x00\x00 and
-1.0 is \x00\x00\x12\xFD\x00; the latter is smaller, as expected!


This solution requires encoding/decoding the coordinates before saving to / after retrieving from HBase, so I implemented an EncodingUtil class using TestOrderedBytes as sample code.

import org.apache.hadoop.hbase.util.Order;
import org.apache.hadoop.hbase.util.OrderedBytes;
import org.apache.hadoop.hbase.util.PositionedByteRange;
import org.apache.hadoop.hbase.util.SimplePositionedByteRange;

public class EncodingUtil {

    public static byte[] encodeDouble(double valueToEncode, int length) {
        // Backing array with extra room for the OrderedBytes numeric header
        byte[] internal = new byte[length + 3];
        // Range starts at offset 1; position 1 inside it means encoding begins at array index 2,
        // which matches the two leading \x00 bytes in the hex dumps above
        PositionedByteRange byteRange = new SimplePositionedByteRange(internal, 1, length + 1);
        byteRange.setPosition(1);
        OrderedBytes.encodeNumeric(byteRange, valueToEncode, Order.ASCENDING);
        // getBytes() returns the whole backing array, leading bytes included
        return byteRange.getBytes();
    }

    public static double decodeDouble(byte[] valueToDecode, int length) {
        // Mirror of encodeDouble: skip the leading bytes and decode from index 2
        PositionedByteRange byteRange = new SimplePositionedByteRange(valueToDecode, 1, length + 1);
        byteRange.setPosition(1);
        return OrderedBytes.decodeNumericAsDouble(byteRange);
    }
}

The full EncodingUtil class code is available here.
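A small round-trip check of the helper (a hedged sketch; the demo class name is mine, and the length argument 9 matches the fixed size used in the filters below):

import org.apache.hadoop.hbase.util.Bytes;

public class EncodingUtilDemo {
    public static void main(String[] args) {
        // Round trip: encode a coordinate, then decode it back
        byte[] minusOne = EncodingUtil.encodeDouble(-1.0, 9);
        System.out.println(EncodingUtil.decodeDouble(minusOne, 9)); // -1.0

        // Ordering check: with OrderedBytes, -1.0 now sorts before 0.0
        byte[] zero = EncodingUtil.encodeDouble(0.0, 9);
        System.out.println(Bytes.compareTo(minusOne, zero) < 0); // true
    }
}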

I use a fixed size for the encoded latitude and longitude, so the filter combination looks like this:

allFilters.add(new SingleColumnValueFilter(cf, Bytes.toBytes(latColumnQualifier),
        CompareOp.GREATER, new BinaryComparator(EncodingUtil.encodeDouble(lowerLeftLat, 9))));

allFilters.add(new SingleColumnValueFilter(cf, Bytes.toBytes(latColumnQualifier),
        CompareOp.LESS, new BinaryComparator(EncodingUtil.encodeDouble(upperRightLat, 9))));

allFilters.add(new SingleColumnValueFilter(cf, Bytes.toBytes(lonColumnQualifier),
        CompareOp.GREATER, new BinaryComparator(EncodingUtil.encodeDouble(lowerLeftLong, 9))));

allFilters.add(new SingleColumnValueFilter(cf, Bytes.toBytes(lonColumnQualifier),
        CompareOp.LESS, new BinaryComparator(EncodingUtil.encodeDouble(upperRightLong, 9))));
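For completeness, here is a hedged sketch of how the filters could be combined and applied; allFilters, cf, and the qualifier names come from the snippets above, while table is assumed to be an already opened HTableInterface:

// AND all four range filters together and attach them to a Scan
Scan scan = new Scan();
scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, allFilters));

ResultScanner scanner = table.getScanner(scan);
try {
    for (Result result : scanner) {
        // Decode the OrderedBytes-encoded coordinates back to doubles
        double lat = EncodingUtil.decodeDouble(result.getValue(cf, Bytes.toBytes(latColumnQualifier)), 9);
        double lon = EncodingUtil.decodeDouble(result.getValue(cf, Bytes.toBytes(lonColumnQualifier)), 9);
        // the point (lat, lon) lies inside the bounding box
    }
} finally {
    scanner.close();
}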

I'm using this Scan in a provisioning module with a limited number of records saved in HBase, so I can live with the Scan's response latency; on the other hand, it doesn't require changing my data model.


EDIT: This method will give inaccurate results for large bounding boxes because it does not take into account that the surface of the Earth is curved rather than flat, so a bounding box never has the form of a true rectangle.

