Monday, October 15, 2007

GetOriginalEntityState() Loses EntitySet/EntityRef Data

Mike Taulty's LINQ to SQL and WCF - Sharing types, subverting the DataContext on the client side post of October 8, 2007 suggests adding an "unbound" or "connectionless" DataContext object to a Windows Communication Foundation (WCF) client for tracking LINQ to SQL data changes.

As mentioned in the "Mike Taulty Implements an Unbound DataContext in a WCF Client" topic of LINQ and Entity Framework Posts for 10/5/2007+, a DataContext object that's not bound to a database is similar in concept to Matt Warren's promised serializable "mini connectionless DataContext" that I discussed in my Changes Coming to LINQ to SQL post of May 15, 2007. This feature won't be present in LINQ to SQL v1, but source code for such an object, packaged as a class library for inclusion in detached projects, would be a useful alternative. Mike's approach appears to be useful as a workaround until the mini connectionless DataContext becomes available, but it requires an operable Table<TEntity>.GetOriginalEntityState(modifiedMember) method.

Matt Warren says the method won't be changed! See Update at end of post.

The primary advantage of the client-side DataContext approach for a service-oriented LINQ to SQL implementation is a dramatic decrease in the number of WCF messages transmitted after editing multiple objects in a session. The client sends a List<T> of objects to be added, modified, or removed instead of a message for each individual object, which nicely implements Martin Fowler's Unit of Work pattern. The disadvantage is that the client uses only a small part of the heavyweight DataContext's resources, which primarily translate LINQ queries to SQL statements.
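The message-count argument can be illustrated with a small model. The following Python sketch is language-neutral and doesn't use the WCF or LINQ to SQL APIs; ChangeBatch, FakeService, their method names, and the sample change strings are all hypothetical stand-ins, and the "service" does nothing except count the messages it receives.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeBatch:
    """Hypothetical unit of work: pending changes grouped by kind."""
    added: list = field(default_factory=list)
    modified: list = field(default_factory=list)
    removed: list = field(default_factory=list)

    def count(self):
        return len(self.added) + len(self.modified) + len(self.removed)


class FakeService:
    """Stand-in for a remote service; it only counts messages and changes."""

    def __init__(self):
        self.messages = 0
        self.applied = 0

    def apply_one(self, kind, entity):
        # One message per changed object (the chatty approach).
        self.messages += 1
        self.applied += 1

    def apply_batch(self, batch):
        # One message for the whole editing session (the batched approach).
        self.messages += 1
        self.applied += batch.count()


def send_individually(service, batch):
    """Send each pending change as its own message."""
    for kind in ("added", "modified", "removed"):
        for entity in getattr(batch, kind):
            service.apply_one(kind, entity)


# Made-up sample changes from one editing session.
batch = ChangeBatch(
    added=["order 1", "order 2"],
    modified=["customer A"],
    removed=["order 3"],
)

chatty = FakeService()
send_individually(chatty, batch)

batched = FakeService()
batched.apply_batch(batch)

print(chatty.messages, batched.messages)
```

Both services apply the same four changes, but the per-object approach needs four messages where the batched approach needs one. The gap widens with the number of objects edited in a session.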

You can't serialize the DataContext itself, but you can serialize the original entity state and its changes into List<TEntity> objects, store them in an ASP.NET Web page's ViewState, and restore them after postbacks, as described by Julie Lerman for the Entity Framework in her Entity Framework object graphs and viewstate post of October 5, 2007.

Evidence of the GetOriginalEntityState() Problem and Related Bugs

While testing a slightly more complex implementation of Mike's Client.exe program that eager-loads the Customer.Orders and Order.Order_Details (EntitySet) properties, I discovered that the Table<TEntity>.GetOriginalEntityState(modifiedMember) method applied to DataContext.Customers returned Customer objects with Customer.Orders.Count = 0. The Table<TEntity>.GetModifiedMembers(ch => ch.ModifiedEntities) method returns correct Customer.Orders.Count values, so the change tracker interprets the Order objects as added and generates the T-SQL statements to add new Orders, as shown in the screen capture below.
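The symptom can be modeled without LINQ to SQL at all. The following Python sketch is a language-neutral illustration, not the LINQ to SQL change tracker: the Customer and Order classes, the helper function names, and the sample values are hypothetical. It assumes a snapshot function that copies only scalar data fields and a naive comparison that treats any child object missing from the snapshot as new.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int


@dataclass
class Customer:
    customer_id: str
    company_name: str
    orders: list = field(default_factory=list)  # plays the role of an EntitySet


def original_state_data_fields_only(customer):
    """Model of a snapshot that copies scalar fields but not associations."""
    return Customer(customer.customer_id, customer.company_name)


def naive_added_orders(original, current):
    """Treat every current order that is missing from the original as new."""
    known = {order.order_id for order in original.orders}
    return [order for order in current.orders if order.order_id not in known]


# A customer loaded together with its three existing orders (made-up data).
current = Customer("CUST1", "Example Market", [Order(1), Order(2), Order(3)])
original = original_state_data_fields_only(current)

# The only real edit is to a scalar field.
current.company_name = "Example Mart"

spurious = naive_added_orders(original, current)
print(len(original.orders), len(current.orders), len(spurious))
```

Because the snapshot has no orders, all three existing orders look like additions even though only the company name changed. That is the same shape as the spurious INSERT statements described above.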

A search on GetOriginalEntityState from Connect for Visual Studio's Search for a Duplicate Issue page returns no hits.

The problem has symptoms similar to those I reported in my Eager Loading Appears to Cause LINQ to SQL Entity Table Problems post of August 22, 2007 and Clarification of the Object Tracking Problem with LINQ to SQL's Eager Loading Feature item of August 24, 2007. However, the following fix, which Alex Turner proposed on 9/20/2007 in my bug report 294781, LINQ to SQL Objects Eager Loaded with LoadOptions Aren't Recognized by Object Tracker (of 8/23/2007, status changed to fixed on 9/4/2007), doesn't address the failure of Table<TEntity>.GetOriginalEntityState(modifiedMember) to include EntitySet or EntityRef members:

Thanks for reporting this issue you've encountered with Visual Studio 2008! We looked into the issue and it was exactly as you described in your blog: when re-executing an eager-loaded query, we would send the top-level query again (the 1), but then after getting back the list of n objects that matched, send out n more queries to fetch their details, even if we already had them in the cache! In fact, when our Beta 2 logic went to materialize the returned objects from the redundant n queries, it would check then, see it already had the objects, and throw the data away. We've promoted that check to occur before dispatching the eager-loaded child queries and now when the query is re-executed, only the single top-level query is sent (in the table on your blog, the Eager Subsequent row would now look like the Deferred Subsequent row, all 1's).

Thanks again for the report!

Alex Turner
Program Manager
Visual C#

Checking for existence of objects in the local cache before issuing T-SQL SELECT batches won't solve this problem.

The problem might be a side effect of, or responsible for, the issue described in bug report 295402: Incorrect Update Behavior - Attaching an Entity after Calling a Relationship Property of 8/27/2007. This report was closed as [to be] Fixed on 9/5/2007. For more details on that bug report and a sample of its effect on LINQ to SQL applications, see the "Fighting the Attached Associated Objects Bug" topic of LINQ to SQL and Entity Framework XML Serialization Issues with WCF - Part 1 of 10/9/2007.

The ClientSideContext Class and Initial WCF Test Harness

The unbound DataContext instance (ClientSideContext) provides Attach(), AttachAll(), Add(), AddAll(), Remove() and RemoveAll() methods for Table<TEntity> implementations of the ITable interface and stores the original and changed entity values. You apply changes to the attached entities directly. The Table<TEntity>.GetModifiedMembers() method contains references to added, removed, and modified members.

Invoking Mike's GetInserted(), GetModified() and GetDeleted() methods supplies modified argument values to the InsertEntity(modified, original) and UpdateEntity(original, modified) service methods. The Table<TEntity>.GetOriginalEntityState() method provides original argument values to the preceding two service methods, plus the DeleteEntity(original, modified) method. (DeleteEntity() requires both arguments because you must attach the entity before you remove it.)

Mike's sample code deals with the DataContext.Customers entity only and sets DataContext.DeferredLoadingEnabled = False on the client and DataContext.ObjectTrackingEnabled = False on the server when retrieving objects from the persistence store. Mike's test environment doesn't populate the Customer.Orders and Order.Order_Details (EntitySet) properties, and unidirectional binding prevents serializing the Order.Customer and Order_Details.Order (EntityRef) property values. The problem doesn't manifest until you eager-load the associated EntitySets.

Here's a screen capture of the first test harness showing a typical spurious INSERT operation for Order objects:

Even with only the seven UK orders processed by Mike's original client code, it doesn't take very many executions to add enough orders to exceed the client's default maxReceivedMessageSize value and throw Bad Request (HTTP: 400) exceptions.

Second Test Harness

To verify that the problem wasn't related to WCF or HTTP transport issues, I modified the code of an earlier test harness for conventional client/server LINQ to SQL operation. Here's the result of making a small change (Market -> Mart) to the GREAL Customer and (6 -> 7) to its last Order object and clicking Submit, which has test code added to its Click event handler:

This test harness, which uses a standard Northwind database instance, is included with the new bug report.

The number of late (post-Beta 2) blocking bugs related to associations indicates to me that LINQ to SQL's automated test regimen doesn't include, or lacks depth in, tests with objects from one or more related entities.

The Connect bug report is Feedback 304732: GetOriginalEntityState Doesn't Include EntitySet or Entity Ref Data for Related Entities of 10/15/2007.

Workaround

The obvious workaround is to use a timestamp or last-modified datetime column for consistency control. If you can't modify the underlying database tables, the alternative is to write code that maintains initial entity state information in a local List<TEntity>, as Julie Lerman proposed for the Entity Framework in her XML Serializing Entity Framework entities with their children for SOA post of October 2, 2007.
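The second workaround, keeping your own copy of the initial entity state, can be sketched as follows. This is a minimal Python model under stated assumptions, not Julie Lerman's code or a LINQ to SQL API: OriginalStateStore, diff_orders, the entity classes, and the sample data are hypothetical, and the store uses a dictionary keyed by customer ID instead of a plain list to keep lookups simple.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    quantity: int


@dataclass
class Customer:
    customer_id: str
    orders: list = field(default_factory=list)


class OriginalStateStore:
    """Hypothetical client-side store of deep-copied original entities."""

    def __init__(self):
        self._originals = {}

    def remember(self, customer):
        # Deep copy so that later edits can't touch the saved child objects.
        self._originals[customer.customer_id] = copy.deepcopy(customer)

    def original_of(self, customer):
        return self._originals[customer.customer_id]


def diff_orders(original, current):
    """Classify the current orders as added, modified, or removed."""
    before = {order.order_id: order for order in original.orders}
    after = {order.order_id: order for order in current.orders}
    added = [o for key, o in after.items() if key not in before]
    removed = [o for key, o in before.items() if key not in after]
    modified = [
        o for key, o in after.items() if key in before and o != before[key]
    ]
    return added, modified, removed


store = OriginalStateStore()
customer = Customer("CUST1", [Order(1, 6), Order(2, 10)])
store.remember(customer)  # capture the initial state right after loading

customer.orders[1].quantity = 7       # edit an existing order
customer.orders.append(Order(3, 1))   # add a genuinely new order

added, modified, removed = diff_orders(store.original_of(customer), customer)
print(added, modified, removed)
```

Because the saved copy includes the child orders, the comparison reports only order 3 as added and order 2 as modified, and leaves the untouched order 1 alone. The cost is that the client holds two copies of every object graph it edits.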

Update October 16, 2007: No Fix in Store for This Problem

Matt Warren added the following comment to the "LINQ to SQL's GetOriginalEntityState() Method Has Serious Bug" topic of my LINQ and Entity Framework Posts for 10/15+ post:

Roger, its not possible for GetOriginalEntityState to return copies of entities with EntityRef's and EntitySet's set. These associations are normally bi-directional, so only one entity instance can ever refer to another without duplicating the entire graph. The purpose of GetOriginalEntityState is to give back a copy of the entity with only the data fields set.

The "ITable.GetOriginalEntityState Method" help topic states that the method "Retrieves original values." The help topic states further in its remarks section:

Note the following:

  • The entity argument must be non-null. Otherwise, a null argument exception is thrown.

  • In the case of the strongly typed (TEntity) method: the type must be mapped and must be an entity type. That is, it must have object id information in its mapping. Otherwise, a wrong type exception is thrown.

There is nothing in the method's name or its help topic that would lead one to believe that its purpose "is to give back a copy of the entity with only the data fields set." Further, the corresponding modified members include EntitySet copies as property values. I see no reason to treat original and modified members' EntitySets differently. (Serialized associations are unidirectional.)

The workaround doesn't appear to be pretty. Hopefully, fixing the two related bugs discussed earlier will mitigate the excessive number of T-SQL queries executed by the service.
