Saturday, December 06, 2008

Attempts to Cure SimpleDB’s Scrambled Attribute Disease

My “SimpleDB Drops Dead at ~1:45 PM PST on 12/4/2008” post of 12/4/2008 (updated 12/5/2008) described a problem I called SimpleDB’s Scrambled Attribute Disease (SAD). SAD’s symptom is an arbitrary but consistent order of attributes returned by repeated invocations of the same QueryWithAttributesRequest() method, followed by a sudden shift in the sequence of attributes to a new arbitrary but consistent order. The symptom occurs in homogeneous SimpleDB domains to which all items have been added by an identical PutAttributesRequest() method.

Update 12/6/2008 7:00 AM PST: Clarification of issues and response from the SimpleDB tech support staff added.

An outbreak of the disease wreaks havoc on strongly typed generic List<T>s. These lists are generated by iterating the Items collection returned by invoking the QueryWithAttributesResponse() method and supplying queryResponse.Item[i].Attribute[j].Value expressions as the constructor's arguments, as illustrated in the code examples below. The lists commonly serve as the data source for databound .NET Web and Windows controls, as shown in the screen captures later in this post.

My expectation of attribute order consistency was based on the C# library’s use of a numeric index j for the Attribute[j] axis. However, Stefano@AWS squelched my expectation with the following response to my Receiving InternalErrors from SimpleDB thread in the Amazon SimpleDB (Beta) forum:

Indeed we do not provide any guarantee about the order of attributes returned by QueryWithAttributes.

Unlike traditional database systems, SimpleDB provides a sparse rather than a fixed schema. Because of that, different Put requests against the same domain can insert items with different sets of attributes.

And different Put requests can insert the same set of attributes but with a different order. For example, nothing prevents a user from issuing a Put where the first attribute is CustomerId and the second is CompanyName, and then a second Put where the first attribute is CompanyName and the second is CustomerId.

Because of that, the QueryWithAttributes API can't and won't provide any guarantees about returning attributes in the explicit order added by the Put operation(s) that inserted the data being returned.

To prevent other developers from being led down the primrose path to SAD, the C# library documentation for the QueryWithAttributesResponse.Item property and QueryWithAttributesResponse.WithItem() method should carry the following warning label:

There is no guarantee as to the order of attributes returned by this method’s/property’s list.

Similarly, adding this caveat to the QueryWithAttributesRequest.AttributeName property would prevent developers from spinning their wheels attempting to order attributes with a list:

The order of attributes retrieved in the QueryWithAttributesResponse.Item property is not affected by the order of attributes in this list.

Providing an overload that substitutes Item[i].Attribute[Name] for the numeric Item[i].Attribute[j] index would be appreciated.

Note: Extensive testing over a period of more than six months indicates that the SAD malady doesn’t appear to be present in Microsoft’s SQL [Server] Data Services, whose entities have characteristics that are quite similar to SimpleDB items.

This post contains additional details about the problem and the initial method of curing it, in response to a request from Stefano@AWS in the same forum thread.

POST Header and Body to Add a Single Customer Item to the Customers Domain

Executing the C# library’s PutAttributesRequest() method for the first Northwind Customer item (entity) sends the following HTTP POST header and body:

The number (#) in the Attribute.#.Name and Attribute.#.Value parameters of the POST query string explicitly specifies the ordinal sequence of the attributes for the item.
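The Attribute.#.Name and Attribute.#.Value numbering described above can be sketched as follows. This is only an illustration of how the ordinal ends up in the POST body, not the C# library's implementation: the class and method names are hypothetical, the starting index of 1 is an assumption, and the other request parameters (action, domain, item name, signature and so on) are omitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the SimpleDB C# library.
public static class PutAttributesSketch
{
    // Numbers each name/value pair in the order supplied.
    // firstIndex = 1 is an assumption about the numbering base.
    public static List<KeyValuePair<string, string>> BuildAttributeParameters(
        IList<KeyValuePair<string, string>> attributes, int firstIndex = 1)
    {
        var parameters = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < attributes.Count; i++)
        {
            int n = firstIndex + i;
            parameters.Add(new KeyValuePair<string, string>(
                "Attribute." + n + ".Name", attributes[i].Key));
            parameters.Add(new KeyValuePair<string, string>(
                "Attribute." + n + ".Value", attributes[i].Value));
        }
        return parameters;
    }

    // Joins the pairs into a URL-encoded form body fragment.
    public static string ToFormBody(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var parts = new List<string>();
        foreach (var p in parameters)
        {
            parts.Add(Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value));
        }
        return string.Join("&", parts);
    }
}
```

The ordinal here only fixes the order in which attributes are sent; as Stefano's response makes clear, it says nothing about the order in which they come back.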

POST Header and Body for a QueryWithAttributes Request

However, invoking the QueryWithAttributesRequest() method doesn’t return attributes in the explicit order added by the POST operation, as illustrated by this example, which contains the elements for the first and last items of a request for the first 12 Customer items:

The sequence of attributes in the preceding items is clearly scrambled.

Note: The query is for the first 12 items, not the first 20 as shown above.

Mapping Attributes from the POST Body to List<Customer> Members

The sequence of attributes returned by the QueryWithAttributesRequest() method, as reflected in the <Item> elements, appears in the After Breakdown column of the table below. Breakdown refers to the 12/4/2008 interruption in SimpleDB’s operation as reported in the “SimpleDB Drops Dead at ~1:45 PM PST on 12/4/2008” post of 12/4/2008 (updated 12/5/2008).

| Ordinal | Required Sequence | Before Breakdown | After Breakdown |
|---|---|---|---|
| 0 | CustomerID | CompanyName | CustomerID |
| 1 | CompanyName | CustomerID | ContactTitle |
| 2 | ContactName | ContactTitle | Address |
| 3 | ContactTitle | Address | CompanyName |
| 4 | Address | Region | Region |
| 5 | City | Country | Country |
| 6 | Region | Fax | PostalCode |
| 7 | PostalCode | PostalCode | Fax |
| 8 | Country | Phone | Phone |
| 9 | Phone | City | City |
| 10 | Fax | ContactName | ContactName |

Before Breakdown Code and Screen Capture

Before Breakdown required the following code to map the sequence of attributes to obtain the Required Sequence of fields:

Note that the first item in the page (ALFKI) had a different entity mapping, as shown in this Before Breakdown screen capture:

It’s not known if the single scrambled item problem is repeatable.
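As a rough illustration of this kind of positional mapping (not the post's original listing), the sketch below hard-codes the Before Breakdown column of the table as an index map. The class, method, and field names are hypothetical; the indices are read directly from the table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; illustrates fixed positional mapping only.
public static class PositionalMappingSketch
{
    // Field order expected by the Customer constructor (Required Sequence column).
    public static readonly string[] RequiredSequence =
    {
        "CustomerID", "CompanyName", "ContactName", "ContactTitle", "Address",
        "City", "Region", "PostalCode", "Country", "Phone", "Fax"
    };

    // BeforeBreakdownIndex[k] is the ordinal at which RequiredSequence[k]
    // was returned, taken from the Before Breakdown column of the table.
    public static readonly int[] BeforeBreakdownIndex =
    {
        1, 0, 10, 2, 3, 9, 4, 7, 5, 8, 6
    };

    // Picks values out of the returned attribute list in the required order.
    public static string[] Reorder(IList<string> returnedValues, int[] indexMap)
    {
        var ordered = new string[indexMap.Length];
        for (int k = 0; k < indexMap.Length; k++)
        {
            ordered[k] = returnedValues[indexMap[k]];
        }
        return ordered;
    }
}
```

A map like this is only valid while the service keeps returning attributes in the same sequence; any change in that sequence requires a new set of indices.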

After Breakdown Screen Capture and Code Modifications

After SimpleDB resumed operation, regenerating the list with the preceding mapping code resulted in this mess:

All columns except City and Country were incorrectly mapped by the Before Breakdown code.

The After Breakdown items required the mapping code to be modified as follows:

Here’s the trial version of SimpleDB Explorer’s idea of the After Breakdown attribute sequence, which has no relationship to either of the preceding examples:

The Scrambled Attribute Disease Workaround

The workaround to handle arbitrary attribute sequences dynamically involves determining the index (ordinal) of the incoming attribute to assign in sequence to the List<Customer> member. The following C# method provides the required mapping in conjunction with the SimpleDB C# library:

Notice the failed attempt to force sequential return of attributes with the attribs list, which had no effect on the problem. It’s probably safer to execute the loop for each list item to avoid errors from the occasional malformed item.

Update 12/6/2008 8:00 AM: It’s obvious from Stefano’s response that the test should be made for each item, so the if (i == 0) conditional has been removed from the code. Performance for paged return of 12 items isn’t affected materially by the change.

Note: The QueryExpression in the preceding code isn’t valid. It was in limbo while this post was being written.

An overload that substitutes Attribute.Name for the numeric Attribute[index] would eliminate the multiple-loop thrashing.
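Until such an overload exists, a small helper can approximate name-based access by scanning each item's attributes. The sketch below is a minimal illustration under stated assumptions: AttributeStandIn is a hypothetical stand-in for the library's attribute type (only Name and Value are modeled), and the helper names are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's attribute type.
public class AttributeStandIn
{
    public string Name { get; set; }
    public string Value { get; set; }
}

// Hypothetical helper; not part of the SimpleDB C# library.
public static class NameLookupSketch
{
    // Linear scan for the first attribute with the given name;
    // returns null when the item has no such attribute.
    public static string ValueByName(IList<AttributeStandIn> attributes, string name)
    {
        for (int j = 0; j < attributes.Count; j++)
        {
            if (attributes[j].Name == name)
            {
                return attributes[j].Value;
            }
        }
        return null;
    }

    // Runs the lookup once per required field, for a single item,
    // so the result never depends on the order the service chose.
    public static string[] InRequiredOrder(
        IList<AttributeStandIn> attributes, string[] requiredNames)
    {
        var values = new string[requiredNames.Length];
        for (int k = 0; k < requiredNames.Length; k++)
        {
            values[k] = ValueByName(attributes, requiredNames[k]);
        }
        return values;
    }
}
```

Calling InRequiredOrder for every item, rather than only the first, matches the per-item test described above, at the cost of a nested scan for each item.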

Substituting a Dictionary for One of the Two Nested Loops

Update 12/6/2008 2:00 PM PST: bd_ posted a comment that suggested substituting a Dictionary collection for the two nested loops. Initializing a Dictionary<string, string> or Dictionary<string, object> with null values provides all required argument values to the C# List<T> constructor. (The VB version with optional parameters having default Nothing values would be simpler.)

The following code assumes that the target object’s properties that aren’t reference values are nullable:

For target classes with varying property types, Parse/TryParse the string values to the appropriate type in the constructor call.
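The dictionary approach and the TryParse step can be sketched as below. This is a minimal illustration, not the post's original listing: the class and method names are hypothetical, the returned attributes are modeled as plain name/value pairs rather than the library's types, and a multi-valued attribute is simplified to its last value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; not part of the SimpleDB C# library.
public static class DictionaryMappingSketch
{
    // Seeds every expected name with null, then overwrites with whatever
    // the query returned, in whatever order it arrived.
    public static Dictionary<string, string> ToDictionary(
        string[] expectedNames,
        IEnumerable<KeyValuePair<string, string>> returnedAttributes)
    {
        var values = new Dictionary<string, string>();
        foreach (string name in expectedNames)
        {
            values[name] = null; // guarantees a key for missing attributes
        }
        foreach (KeyValuePair<string, string> pair in returnedAttributes)
        {
            values[pair.Key] = pair.Value; // simplification: last value wins
        }
        return values;
    }

    // Converts a stored string to a nullable int for a typed property;
    // null or unparseable text yields null instead of throwing.
    public static int? ParseNullableInt(string text)
    {
        int parsed;
        return int.TryParse(text, NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out parsed)
            ? parsed
            : (int?)null;
    }
}
```

With the dictionary in hand, constructor arguments can be read by name, for example values["CustomerID"], values["CompanyName"], and so on, and an attribute that was never stored simply arrives as null.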

If you have any suggestions for a more efficient method of handling dynamically changing attribute sequences, please let me know in a comment to this post.

3 comments:

bd_ said...

Have you considered simply loading the attributes returned by SimpleDB into a Dictionary, then pulling out values in whatever order you like? This would be a lot simpler and possibly faster than your O(mn^2) deeply-nested for loop there. Moreover AWS confirms that the ordinals are not meant to have any correlation to the output at all: http://developer.amazonwebservices.com/connect/thread.jspa?threadID=26992&tstart=0

Roger Jennings (--rj) said...

@bd_,

Thanks for your suggestion. I've modified the code to use a dictionary, which also solves issues with missing attributes on inserts.

AT said...

It won't solve the problem you've listed here, but if you're doing a lot of .NET work with SimpleDB you may be interested in Simple Savant, a new C# interface that I've just open-sourced.