Google makes progress with big data search tool
After six months in preview, Google has finally released BigQuery, a big data tool that enables users to query and process large datasets on Google’s cloud.
Unlike Google’s earlier big data innovations such as the MapReduce framework, BigQuery is a product for which Google has commercial aspirations, and it adds value to the company’s Google Apps and Google Analytics offerings. For clients of “raw” infrastructure-as-a-service (IaaS) offerings such as Amazon’s Elastic MapReduce, BigQuery provides a much easier-to-use, higher-performance alternative.
But there are several caveats. As with any cloud service for big data, it is questionable whether customers will be willing to put their data in a public cloud, and while Google is renowned for web innovation, the visualization capabilities of BigQuery appear to be surprisingly arcane. The main benefit will be its ability to analyze and enhance the effectiveness of web interactions, making it highly complementary to Google Analytics. For the SQL developer, the tool will also serve as a way into big data querying and analytics, a field that is waiting for talent to join.
BigQuery is basically an IaaS solution that enables users to upload, query, and analyze large datasets in the Google cloud. Like any cloud-based solution, it allows users to escape the hassle and expense of deploying and maintaining a large on-premise processing infrastructure. Users can run SQL-like queries against the data through a web browser, the command line, a REST API, or Google Apps scripts.
Advanced SQL without the SQL
Google categorizes BigQuery as an “OLAP system”. In contrast to the company’s OLTP-focused Google Cloud SQL service, BigQuery does not offer full SQL syntax or table management features such as table indexes, updates, and deletes. It does provide an SQL-like front end, which makes it friendlier to the large base of enterprise SQL developers, and it delivers higher performance than is typically associated with Hadoop. BigQuery is Google’s take on an Advanced SQL platform, but without the SQL per se.
Initially, BigQuery is suited for querying and analyzing e-commerce data residing on Google’s platforms, such as user clicks, popularity of pages, Google AdWords success, and visitor tracking. It shines when users want to use it with e-commerce analytics tools that Google has already developed to perform statistical modeling for ad targeting and optimization. More viable use-cases could emerge as more organizations and developers experiment with BigQuery, and it is particularly important for Google to come up with use-cases internally. The solution needs to evolve if Google wants it to be a big data tool used with enterprise data residing outside Google’s platform.
Although BigQuery has a useful visualization feature, it is a far cry from the capabilities many other visualization tools offer for big data. Data exploration (letting a sea of data guide you to the problem or solution) is a key part of big data. Those other tools allow the user to explore data by drilling up, down, and around it through a couple of clicks on a graphical object such as a bar graph, something the current tool from Google does not offer. Instead, it is surprisingly static.
On the other hand, the drawbacks of Google’s own visualization tool might open the door for third-party suppliers to offer better and more dynamic data visualization applications sitting on top of BigQuery. Cloud BI provider Bime is already partnering with Google to do so, and other providers might do the same as the solution evolves. Google supports data access through a REST API, allowing anyone with knowledge of Java, Python, or any other popular programming language to develop a third-party BigQuery solution.
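To give a sense of what a third-party developer would be working with, the following is a minimal Python sketch of how an SQL-like query might be packaged as a REST request. It rests on several assumptions that are not stated in this article: the endpoint is taken to be the v2 synchronous query URL, the table is taken to be the public Shakespeare sample dataset, and the project ID and access token are placeholders. The helper name build_query_request is hypothetical. The sketch only builds the request and does not send it, since a real call needs a valid OAuth 2.0 token.

```python
import json
import urllib.request

# Assumption: the v2 REST endpoint for synchronous queries.
BIGQUERY_QUERY_URL = (
    "https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
)

# SQL-like query in BigQuery's original dialect. The bracketed table
# reference points at a public sample dataset (an assumption of this sketch).
SAMPLE_SQL = (
    "SELECT word, SUM(word_count) AS total "
    "FROM [publicdata:samples.shakespeare] "
    "GROUP BY word ORDER BY total DESC LIMIT 10"
)


def build_query_request(project_id, sql, access_token, max_results=10):
    """Build (but do not send) an HTTP POST request carrying the query.

    Hypothetical helper for illustration only.
    """
    body = {
        "query": sql,
        "useLegacySql": True,  # the SQL-like dialect rather than standard SQL
        "maxResults": max_results,
    }
    return urllib.request.Request(
        BIGQUERY_QUERY_URL.format(project_id=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholders: substitute a real project ID and OAuth 2.0 token before sending.
request = build_query_request("my-project-id", SAMPLE_SQL, "PLACEHOLDER_TOKEN")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen with valid credentials would return a JSON response, which a visualization layer could then parse and render.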
Advantages of the cloud
The biggest selling point for BigQuery is that it provides the usual advantages that cloud-based solutions offer. Google will let users query up to the first 100 GB of data without charge. If their querying takes them beyond that amount, payment is required. For users who need to store and query data above 100 GB and up to 2 TB, Google will charge 12 cents (€0.09) per GB per month for storage and 3.5 cents per GB processed, with a limit of 1,000 queries per day. This makes the solution affordable for individuals and smaller organizations that have datasets in the low terabytes.
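As a rough illustration of how these prices add up, here is a small Python sketch of a monthly cost estimate. It assumes that storage is billed on every GB stored and that the free 100 GB applies to data processed by queries in a month. The article does not spell out either detail, so treat both as assumptions rather than Google's billing rules. The function name estimate_monthly_cost is hypothetical.

```python
# Prices quoted in the article, in US dollars.
STORAGE_USD_PER_GB_MONTH = 0.12  # 12 cents per GB stored per month
QUERY_USD_PER_GB = 0.035         # 3.5 cents per GB processed
FREE_QUERY_GB = 100              # assumption: free allowance covers processed data


def estimate_monthly_cost(stored_gb, processed_gb):
    """Estimate one month's bill under the assumptions stated above."""
    storage_cost = stored_gb * STORAGE_USD_PER_GB_MONTH
    # Only the processed volume beyond the free allowance is billed.
    billable_gb = max(0, processed_gb - FREE_QUERY_GB)
    query_cost = billable_gb * QUERY_USD_PER_GB
    return round(storage_cost + query_cost, 2)


# Example: 500 GB stored, 1,000 GB processed by queries in the month.
print(estimate_monthly_cost(stored_gb=500, processed_gb=1000))
```

Under these assumptions, 500 GB stored costs $60 and the 900 billable GB processed cost $31.50, for a total of about $91.50 a month.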
The tool can also function as a way for SQL developers to get their feet wet querying large column-store databases with SQL-like scripting. Because the first 100 GB can be queried for free, the tool can work as a training ground for developers new to the big data field. Google’s goal is surely to build a user base large enough that many users will eventually query more than 100 GB and therefore make BigQuery a money-making product.
Fredrik Tunvall is an analyst in Ovum's Information Management Software group. For more information go to www.ovum.com/