Tuesday, April 08, 2008

Rumor: Google to Compete with SimpleDB and SSDS with Bigtable as a Web Service

Update 4/8/2008: It's no longer a rumor.

Google announced the new Google App Engine service last night at their second invitation-only, outdoor Campfire One event. The rumor that Bigtable would become accessible to developers as a highly scalable data store in the cloud was confirmed, and a demarcation line was established between free and paid services. What surprised most observers was support for custom Python Web applications as the front end to the Bigtable implementation on the Google File System (GFS).

Here's how the first post on the Google App Engine blog describes the development environment's features:

  • Dynamic webserving, with full support of common web technologies
  • Persistent storage (powered by Bigtable and GFS with queries, sorting, and transactions)
  • Automatic scaling and load balancing
  • Google APIs for authenticating users and sending email
  • Fully featured local development environment

By the time I signed up, the initial 10,000 invitations were taken, but the App Engine SDK includes a Cassini-like Web server for local development and lets you define data stores locally. I've downloaded the SDK and have it running the Greetings app (HelloWorld) with Python 2.5.2 under Windows Vista Ultimate.

I'm exploring the persistent storage features of the SDK in the docs now and will be comparing them to Amazon SimpleDB and SSDS in a forthcoming post. In the meantime, it's interesting to note that Google's data-access wrapper to Bigtable features transactions for entity updates, a rich collection of data types for properties (with Google Base overtones), an SQL-like query language (GQL), plus emulations of many:one associations with the Reference data type and one:many associations with the List data type.
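The many:one and one:many patterns mentioned above can be illustrated with a minimal in-memory sketch. This is plain Python, not the App Engine SDK: the Author and Greeting classes, their fields, and the latest helper are hypothetical names chosen for illustration, and the helper only approximates what an ordered, limited query in an SQL-like language would return.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Author:
    # Hypothetical entity kind; 'key' stands in for a datastore key.
    key: str
    nickname: str


@dataclass
class Greeting:
    # Hypothetical entity kind with a few typed properties.
    key: str
    content: str
    date: datetime
    # many:one -- many greetings can point at one author (the "Reference" role).
    author: Optional[Author] = None
    # one:many -- one greeting holds many tags (the "List" role).
    tags: List[str] = field(default_factory=list)


def latest(greetings, limit):
    """Approximate 'order by date descending, take the first N'."""
    return sorted(greetings, key=lambda g: g.date, reverse=True)[:limit]


alice = Author(key="a1", nickname="alice")
greetings = [
    Greeting("g1", "Hello", datetime(2008, 4, 8, 9, 0), alice, ["intro"]),
    Greeting("g2", "World", datetime(2008, 4, 8, 10, 0), alice, ["intro", "demo"]),
    Greeting("g3", "Again", datetime(2008, 4, 9, 8, 0), alice),
]
newest = latest(greetings, 2)
```

Here all three greetings share a single Author object, which is the many:one side, while each greeting carries its own list of tags, which is the one:many side.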

There's no question that the Google App Engine will compete with SSDS. The question is: Will Google enhance their Datastore API with a Web-facing front-end that doesn't require custom Python code to provision domains and perform CRUD operations?

Mary Jo Foley's Google App Engine: When will Microsoft field a competitor? post of April 8, 2008 discusses the Google App Engine with an SSDS bent.

Later Updates

Update 4/10/2008: App Engine World, which claims to be "Your one-stop resource for Google App Engine," has launched with the following announcement: "Coming soon! A complete community for Google App Engine Developers." Until then, the site offers a few links through DZone and a few other A-list sites. Amazingly, the site doesn't have an Atom or RSS feed.

The Official Google Data API Blog's Release the hounds: Support for App Engine and Contacts API post of April 8, 2008 announces:

The 1.0.12 release of the gdata-python-client has a couple of nifty new features which I thought are worth mentioning. First, is a new module which allows the Python library to be used on Google App Engine. Since the App Engine runs a sandboxed Python interpreter, HTTP calls must be made using one of the App Engine's APIs: urlfetch. After some refactoring of the library's code, I was able to write a drop-in replacement for making HTTP requests, so that the gdata.service module can be used on Google App Engine.

It looks to me like this will enable direct HTTP request/response operations against GAE's Datastore API. Confirmation by a GData expert would be appreciated.
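The "drop-in replacement" idea in the quoted announcement can be sketched as a client whose HTTP transport is passed in rather than hard-coded. This is only an illustration of the pattern, under stated assumptions: FeedService, default_fetch, and sandbox_fetch are hypothetical names, not the real gdata.service or urlfetch APIs, and the sandbox transport here simply returns canned data.

```python
import urllib.request


def default_fetch(url):
    """Default transport: a plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url) as response:
        return response.status, response.read()


class FeedService:
    """Hypothetical client; not the real gdata.service API."""

    def __init__(self, fetch=default_fetch):
        # The transport is injected, so a sandbox-approved fetcher
        # can replace the default without touching the rest of the class.
        self._fetch = fetch

    def get_feed(self, url):
        status, body = self._fetch(url)
        if status != 200:
            raise IOError("unexpected HTTP status %d" % status)
        return body.decode("utf-8")


def sandbox_fetch(url):
    """Stand-in for a platform-provided fetch call; returns canned data."""
    return 200, b"<feed/>"


# Swap in the sandbox transport; no network access happens here.
service = FeedService(fetch=sandbox_fetch)
feed = service.get_feed("http://example.com/feed")
```

Because the caller only depends on the (status, body) contract, the same client code runs unchanged whether the transport is the standard library or a platform-specific fetcher.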

Update 4/9/2008: Henry Work posted TechCrunch Labs: Our Experience Building And Launching An App On Google App Engine on 4/8/2008; it describes creating and uploading in four hours a simple Python application that accesses the Data Store API. You can try the app Henry built at appengine.crunchbase.com. The post also has a screen capture of GAE's dashboard.

Redmonk analyst Michael Coté's Your Data in the Cloud - URL-based computing, SimpleDB, Astoria, etc. post of December 23, 2007 echoed my observation in Amazon Announces Beta of SimpleDB Web Services in the Cloud (December 14, 2007) that the hosted version of ADO.NET Data Services (Astoria) represented an interesting model for a future Microsoft service. Michael also notes that:

[T]hese new technologies need to be completely self-service. If a developer has to ever talk with a human from the company or team offering the project, something has gone wrong. You can sling out all sorts of “complex problems need complex solutions” chaff, but the historic fact remains that the new, simple solution tends to win out versus the new, “complete but complex” solution. And of course, there’s some wiggle room with the difference between “complex solution” and “easy to use complex solution.”

The need to write complex Python apps to provision and conduct CRUD operations on "data in the cloud" will be a major issue with GAE.

Henry Blodget chimes in with Google, Amazon Lead Disruptive Cloud Computing Wave, Microsoft Again Behind Curve of April 9, 2008. He said that the Google App Engine was "not disruptive technology" in his Google Taking On Amazon With App Engine post, because:

Google isn't aiming at the heart of a competitor's business. Office is core to what Microsoft (MSFT) does, but Amazon's services are sidelights designed to take advantage of the company's prodigious resources -- when they work.

Update 4/7/2008: According to TechCrunch's Michael Arrington in his Major Google Announcement Monday Evening: Is It BigTable? post of 4/6/2008:

Google is hosting the second of their developer events, called “Campfire One,” on Monday (April 7) evening. Multiple people have forwarded email invitations to me for the event, where Google promises they’ll be “unveiling another exciting technology” to the developer community. ...

My guess is that Google will be announcing the launch of web services that will compete head on with those offered by Amazon and others. The anchor for these services, we hear, is their internal database system called BigTable. Google has definitely briefed press on the imminent launch of BigTable as a web service, although as we said last week we haven’t been contacted.

Original 4/5/2008 story starts here:

TechCrunch blogger Mark Hendrickson relies on "a source with knowledge of the launch" to report the following rumors in Source: Google To Launch BigTable As Web Service of April 4, 2008:

Google may be releasing BigTable, its internal database system, as a web service to compete with Amazon SimpleDB, according to a source with knowledge of the launch. There are also rumors that press is being pre-briefed on the product, although we haven’t been contacted by Google. ...

The decision to open up BigTable would seem to mark Google’s challenge to Amazon Web Services (AWS) suite, which also includes the Elastic Compute Cloud (EC2) for cloud processing power and Simple Storage Service (S3) for cloud storage.

If Google does indeed announce public access to BigTable next week, expect the company to follow up with cloud storage and processing solutions as well, since there are substantial synergies between the three.

Bigtable: A Distributed Storage System for Structured Data is a paper by nine Google employees that describes Bigtable in detail. (The paper was one of two Best Papers at the 7th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2006.)

An interesting fact from the paper's Table 2: Google Base runs on Bigtable and, when the paper was prepared in mid-2006, Google Base had 10 terabytes of data in 10 billion cells. Google Base, which is still in beta, became publicly available in November 2005 but is not as widely used as other free Google apps.

My conclusion, based on the Bigtable paper and SimpleDB documentation: Bigtable's architecture and implementation have more in common with Amazon's SimpleDB than either database has with SSDS:

  • Bigtable and SimpleDB aren't relational database management systems (RDBMSs); both resemble multidimensional indexed maps of attribute/value pairs. SSDS attempts to hide the fact that it's an RDBMS.
  • Bigtable and SimpleDB have latency, which results in eventual consistency, although it appears from Table 2 of the Bigtable paper that no-latency operation is an option; SSDS has no latency by design.
  • Bigtable values are an "uninterpreted array of bytes" and SimpleDB stores only strings; SSDS has string, number, datetime, binary and boolean datatypes.
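The contrast in the list above can be made concrete with a small in-memory sketch. It assumes the model described in the Bigtable paper, a map indexed by row key, column key, and timestamp whose values are raw bytes, and a SimpleDB-style store where each item holds attributes with one or more string values. All function names are hypothetical, and the zero-padding helper shows a commonly used workaround for sorting numbers stored as strings, not a feature of either service.

```python
# Bigtable-style cell map: (row key, column key, timestamp) -> raw bytes.
bigtable = {}


def bt_put(row, column, timestamp, value):
    """Store one cell version; values are uninterpreted bytes."""
    if not isinstance(value, bytes):
        raise TypeError("Bigtable-style values must be bytes")
    bigtable[(row, column, timestamp)] = value


def bt_latest(row, column):
    """Return the value with the highest timestamp, or None if absent."""
    versions = [(ts, v) for (r, c, ts), v in bigtable.items()
                if r == row and c == column]
    return max(versions, key=lambda pair: pair[0])[1] if versions else None


# SimpleDB-style store: item name -> attribute name -> set of strings.
simpledb = {}


def sdb_put(item, attribute, value):
    """Add one string value to a (possibly multi-valued) attribute."""
    if not isinstance(value, str):
        raise TypeError("SimpleDB-style values must be strings")
    simpledb.setdefault(item, {}).setdefault(attribute, set()).add(value)


def pad(number, width=6):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(number).zfill(width)


bt_put("com.cnn.www", "contents:", 1, b"<html>v1</html>")
bt_put("com.cnn.www", "contents:", 2, b"<html>v2</html>")
current = bt_latest("com.cnn.www", "contents:")

sdb_put("item1", "color", "red")
sdb_put("item1", "color", "blue")

unpadded = sorted(str(n) for n in [9, 10, 100])
padded = sorted(pad(n) for n in [9, 10, 100])
```

With string-only storage, the unpadded values sort lexicographically ("10" before "9"), which is why typed properties such as those in SSDS simplify range queries.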

Update 4/6/2008: This post became a Techmeme discussion item on April 5, 2008 and resulted in the blog's highest Saturday traffic in several months.

Dare Obasanjo's Amazon SimpleDB: The Good, the Bad and the Ugly post of December 21, 2007 provides a more detailed comparison of the structure of SimpleDB and Bigtable.

Dan Farber adds his 2 cents in Is Google bringing Bigtable out of the closet? of April 4, 2008.

Dave Winer Reads the Tea Leaves

Charging for Web services would be a new business model for Google, but Dave Winer's prescient mock interview with a pig at a Walnut Creek stoplight on March 29, 2008 went (in part) like this:

Pig: You know how Amazon has all those great web services.

Dave: Yes, ... I use them and they're great.

Pig: Well how would you like to get all those services and more, and get to run software in Google's cloud, just like all the people at Google do?

Dave: Yes, I would. ... How much would it cost?

Pig: That's the best part. ... For a guy like you, a blogger, with modest needs, it would be free.

The obvious implication is that the service wouldn't be free to all users.

Dave continued the next day in Why would Google Web Services cost $0?:

Google Web Services, or GWS, is the hypothetical competitor to Amazon Web Services that I wrote about yesterday.

The first question that comes up is how can they afford to give it away? That came up in yesterday's comments and the answer is important enough to deserve its own blog post. ...

Flipped around, I don't see why Amazon charges me to use AWS. I think I produce as much value for them as I use just by writing about it, but they haven't been willing to bend (not that I've asked them to). If there was no cost to it, I'd use their services for new things that I'm not willing to try as long as I have to pay. I know that because there are projects I've not attempted because the cost was prohibitive. ...

Dave went on to surmise that Google gives away its services to increase the value of acquisitions that already use its technology and concluded:

What's hard to believe is how much of a running start Microsoft, Yahoo, and Google have been willing to let Amazon have.

Google might start with a free service like Google Base, but I'd bet that they would establish a tiered tariff with only heavy hitters paying a SimpleDB-like fee. The approach would be similar to Google Enterprise's Google Apps Premier Edition, which costs $50/year for an account. Google claims to have 100,000 Google Apps customers.

Note: Dave said in a 4/4/2008 comment to the TechCrunch post:

Mark, pretty sure the price will be $0, with some ceiling that keeps other infrastructure vendors from reselling Google’s service.

An exciting development for sure!

Update 4/6/2008: GigaOm's Kevin Kelleher comments on Dave's post in How Google Can Eat Amazon’s Lunch of March 31, 2008. Dave responds in a comment, Matthew Ingram posts Google: Why not make the cloud free?, and links to James Hamilton's Third Party Service Platform From Google? post of March 30, 2008. (James is an architect on the Windows Live Platform team. His Microsoft Research Bio says "Before Windows Live, James was General Manager of the Microsoft Exchange Hosted Services team which was formed as part of the FrontBridge Technologies acquisition.")

Al the Folknologist analyzes Dave's post in Is there a silver lining in Google's Cloud? of April 1, 2008 with the assumption that Google would offer an EC2 clone with a Cloud Virtual Machine, which could "run any suitable Java byte code that has been built with the CVM toolkit." His New value, old money, familiar issue? post of the next day suggests that cloud-based business data storage is analogous to storing money in banks.

On the Other Hand ...

Kevin Burton said on December 14, 2007 (the day that Amazon announced SimpleDB) in Google vs Amazon in Open Infrastructure:

Last time I checked, Amazon’s bandwidth pricing was insane. It would literally cost us 3x more to host Spinn3r. Granted, we process a LOT of data (from 60-160Mbits per month) but when your startup is successful you don’t want to burn it all your AdSense revenues on bandwidth invoices courtesy of Amazon.

It looks to me as if there's something wrong with Kevin's data volume figure (I process more Mbits than that per month), but it's doubtful that Google's terms of service would let folks use BigTable in direct competition. (Spinn3r is a pay-to-play blog search API.)