Friday, November 05, 2010

What Happened to Secondary Indexes for Azure Tables?

• Update 11/5/2010: Ryan Dunn (@dunnry) heard the message, as he mentioned in Cloud Cover Episode 31 - Startup Tasks, Elevated Privileges, and Classic ASP at 00:04:15. Thanks to @BrentCodeMonkey for the heads up (see end of post).

Windows Azure Tables have a single, composite index on a PartitionKey + RowKey unique identifier (equivalent to a relational primary key), so all table data retrieved is sorted in this (ascending) order. One of the more popular requests since the first CTP of Windows Azure is the ability to specify additional (secondary) indexes, which would enable high-performance filtering by column values, instead of laborious row-scans with <, <=, ==, >=, or > operators.
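To make the cost difference concrete, here is a minimal in-memory sketch in Python. It does not use the Azure storage API; the `rows` list, the `City` column, and both helper functions are hypothetical stand-ins. It assumes only what the paragraph above states: rows are kept sorted by PartitionKey + RowKey, so a lookup on that key can seek directly, while a filter on any other column has to visit every row.

```python
from bisect import bisect_left

# Hypothetical stand-in for a table: rows kept sorted by (PartitionKey, RowKey).
rows = [
    {"PartitionKey": "orders", "RowKey": "0001", "City": "Oakland"},
    {"PartitionKey": "orders", "RowKey": "0002", "City": "Seattle"},
    {"PartitionKey": "orders", "RowKey": "0003", "City": "Oakland"},
    {"PartitionKey": "returns", "RowKey": "0001", "City": "Boston"},
]
keys = [(r["PartitionKey"], r["RowKey"]) for r in rows]


def point_lookup(partition_key, row_key):
    """Seek on the composite key with a binary search (no full scan)."""
    target = (partition_key, row_key)
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return rows[i]
    return None


def scan_by_column(column, value):
    """Filter on a non-key column: every row must be examined."""
    return [r for r in rows if r.get(column) == value]
```

A secondary index on `City` would let the second query seek the way the first one does, instead of touching every row.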

The Windows Azure Storage Team and its predecessors have promised secondary indexes for Azure Tables since the initial Azure Beta program in 2008 but haven’t delivered in the intervening two years.

As far as I’ve been able to determine, secondary indexes for Azure Tables weren’t even mentioned by the Windows Azure team at PDC 2010. I heard no mention of secondary indexes in Jai Haridas’ Windows Azure Storage Deep Dive session on Friday morning, and his slide deck was similarly bereft of secondary index mentions.

Here are Jai’s performance-related slides for Azure Tables Tips & Best Practices:




Mike Amundsen (@mamund) lamented the absence of secondary indexes in his Implementing a Simple Word Search Using Azure Table Storage article of 3/10/2009. The article is subtitled Exploring Schema-less Storage Patterns and describes tactics for word-searching tables of the Entity-Attribute-Value (EAV) model, as opposed to searching relational tables with SQL’s LIKE operator. Mike explains:

Implementing a word search in Azure Table Storage is not quite as easy. First we don't have an equivalent to the LIKE keyword at our disposal. Second, we don't have the ability to create secondary indexes on a table (or Entity collection in EAV-speak). That means we need to do the work ourselves. Essentially, we need to build a table to hold the links between search words and the actual Message records. Doing this will allow us to mimic the LIKE keyword from relation query languages. [Emphasis added.]
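The link table Mike describes can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not Mike's code and not the Azure storage API: the `messages` and `word_index` dictionaries stand in for two tables, and the names `add_message` and `search` are hypothetical. It matches whole words only, which is a narrower test than SQL's LIKE.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for two tables:
#   messages:   (PartitionKey, RowKey) -> message text
#   word_index: search word -> keys of the messages that contain it
messages = {}
word_index = defaultdict(set)


def add_message(partition_key, row_key, text):
    """Store the message and write one link row per distinct word."""
    key = (partition_key, row_key)
    messages[key] = text
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        word_index[word].add(key)


def search(word):
    """Look the word up in the link table; return matching message keys."""
    return sorted(word_index.get(word.lower(), set()))


add_message("inbox", "001", "Azure Tables lack secondary indexes")
add_message("inbox", "002", "Word search on Azure")
matches = search("azure")
```

The trade-off is the usual one for a hand-built index: every insert writes extra link rows, and the application has to keep those rows consistent when a message changes or is deleted.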

I (@rogerjenn) commented in the “Azure Blob, Table and Queue Services” section of Windows Azure and Cloud Computing Posts for 3/9/2009+:

The technique Mike explains is similar to that used by MapReduce applications over EAV data models, such as Google’s BigTable or Yahoo!’s Hadoop. Here’s a link to a recent Hadoop/MapReduce presentation by Don Brown of Twitpay.

Manuvir Das presented Windows Azure Present and Future at PDC 2009. Here’s one of his slides at 00:42:00 in the session:


Manuvir accompanied the preceding slide with the following narration:

“And finally on the tables, we’ve had a lot of feedback regarding secondary indices so that you can query a table many different ways, and that’s a facility that’s coming, too.”

Mike Wickstrand posted PDC ’09 & Your Great Windows Azure Ideas! on 11/23/2009, the first part of which appears below:

It was an amazing week at PDC ’09, chatting with a lot of customers, meeting old friends, hearing all of the cool stuff other Microsoft teams are delivering, and getting time to celebrate with the hard-working Windows Azure Team! Given my role on the Windows Azure Team, the highlight for me was digging in with customers on their reactions to our PDC ’09 announcements and also hearing about what customers want to see in Windows Azure in the future.

At PDC last week we announced that in the future we’ll support scenarios that ISV & enterprise customers in particular will appreciate. This includes solutions that will make it easier for customers to migrate existing applications and give customers greater control over cloud VMs. Specifically, we announced we will: give customers administrator privileges on cloud VMs, deliver facilities for user-driven construction and configuration and platform deployment of VM images, and provide remote terminal server access to cloud VMs. Additionally, it was announced that we will offer customer selectable geo-locations for replicas and secondary indices for Windows Azure Tables. If you want more information, a bit more is available here. … [Emphasis added.]


Mike referenced the preceding post in his response to a request to Support Secondary Indexes by Andy Britcliffe (@andybritcliffe) in the Windows Azure Feature Voting forum:

Amazon SimpleDB and Google App Engine’s database offer secondary indexes. It’s about time that Windows Azure Tables did the same.

• Update 11/5/2010: Ryan Dunn heard the message, as he mentioned in Cloud Cover Episode 31 - Startup Tasks, Elevated Privileges, and Classic ASP at 00:04:15:


Quoting Ryan at 00:04:15: “At the top, almost everything in the top 80% of the requested features was checkboxed. As soon as we get secondary indices (Roger Jennings), then we’ll have the whole thing.”


manny said...

2012 already and still no news on secondary indexes. At least pointers in terms of when and how would be great...Would suck if we have to make a lot of changes to code or do premature optimizations (considering secondary indexes makes it soon)...