In this post, I would like to share my impressions and experience prototyping the SparklineData/spark-druid-olap open-source framework. The main idea of the framework is to enable SQL access to a Druid index from Tableau Desktop and, along the way, to provide a single access-point API for querying both indexed and raw data. Since Druid 0.9 does not support SQL out of the box, the big advantage of the SparklineData framework is the ability to run SQL queries over data in Druid, which is very useful for end-to-end Tableau integration. The other aspect, using the same API to query raw data, turned out not to be useful in practice, at least from the Tableau perspective.

Running environment information:
- Hadoop cluster, Cloudera distribution 5.10.0
- Spark 1.6.1
- SparklineData release 0.4.1
- Druid 0.10.0-rc

SparklineData runs as part of the Spark Thrift server, which unfortunately is not supported by the default Cloudera distribution and requires recompiling Spark with Thrift support. A...
Nowadays, geospatial querying has become a part of almost every application that works with location coordinates. A number of NoSQL databases support geospatial search out of the box, such as MongoDB, Couchbase, Solr, etc. I'm going to write about bounding-box queries over HBase. I use this kind of query to select the points located in the visible geographic area of a web client. During my investigation, I realized that a more effective but also more complex solution is the Geohash or QuadKey approach. Since those approaches require redesigning your data model, I found a simpler (and less effective) solution: the built-in HBase encoder OrderedBytes (hbase-common-0.98.4-hadoop2.jar). The code example in this post works with HBase version 0.98.4. A bounding-box query requires comparing the latitude and longitude coordinates of the points saved in HBase. As you know, HBase keeps data in binary format and sorts it in lexicographical byte order. Because of coordinates ...
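To illustrate why plain IEEE-754 bytes don't sort correctly and how an order-preserving encoding fixes that, here is a conceptual sketch in Python (not the HBase Java API itself; OrderedBytes additionally prepends a type marker byte): flip the sign bit for non-negative doubles and invert all bits for negative ones, so the resulting bytes compare in the same order as the numbers.

```python
import struct

def encode_double(val):
    """Order-preserving big-endian encoding of a double: a sketch of
    the transformation applied by HBase's OrderedBytes for float64."""
    bits = struct.unpack(">Q", struct.pack(">d", val))[0]
    if bits & (1 << 63):
        bits ^= (1 << 64) - 1   # negative: invert all bits
    else:
        bits |= 1 << 63         # non-negative: set the sign bit
    return struct.pack(">Q", bits)

def decode_double(data):
    """Inverse of encode_double."""
    bits = struct.unpack(">Q", data)[0]
    if bits & (1 << 63):
        bits ^= 1 << 63         # value was non-negative
    else:
        bits ^= (1 << 64) - 1   # value was negative
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Lexicographic byte order now matches numeric order, which is what
# makes HBase range scans over encoded lat/lng row keys possible.
lats = [-90.0, -37.81, 0.0, 48.85, 90.0]
assert sorted(encode_double(v) for v in lats) == [encode_double(v) for v in lats]
assert all(decode_double(encode_double(v)) == v for v in lats)
```

With row keys built from values encoded this way, a bounding box translates into a simple start-row/stop-row scan, since byte comparison and numeric comparison agree.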