Wednesday, August 22, 2007

Eager Loading Appears to Cause LINQ to SQL Entity Table Problems

Frans Bouma, David Hayden, Howard Richards and I have encountered an undesirable feature of LINQ to SQL having eager loading enabled with the DataContext.LoadOptions property value set to a collection of DataLoadOptions.LoadWith expressions. Using the Northwind Customers, Orders, and Order_Details tables as the source for test entity sets with 1:1 and 1:many associations, we all appear to believe the following DataContext.LoadOptions property value (or its C# equivalent) should load the Customer, Order, and Order_Detail entity sets with a single query, or at most a few queries:

Dim dcNwind = New NorthwindDataContext
Dim dlOptions As New DataLoadOptions
dlOptions.LoadWith(Of Customer)(Function(c) c.Orders)
dlOptions.LoadWith(Of Order)(Function(o) o.Order_Details)
dcNwind.LoadOptions = dlOptions

For example, Dave's test shows Frans Bouma's LLBLGen Pro loads all  Customer, Order, and Order_Detail entity sets with three queries. Dave's LINQ to SQL test required 40+ queries and my tests show 92 for the original 91 customers, but we all appear to agree that LINQ to SQL Beta 2 requires Customers.Count + 1 queries. The large number of queries is undesirable but not a showstopper.

Howard Richards postulated a bug in this LINQ to SQL LoadOptions does not work for more than one level Feedback bug report:  LoadOptions doesn't populate EntitySets until the iterator encounters a request for a lower-level value and Microsoft finally acknowledged it's a bug.

I don't find this to be an exact description of the real problem. The following iterator populates the associated EntitySets with 14 queries for 13 U.S. customers:

Dim qryTest = From c In dcNwind.Customers _
              Where c.Country = "USA" _
              Select c

For Each c In qryTest
    sbResult.Append(String.Format("Customer: {0}", _
        c.CompanyName) & vbCrLf)
Next c

The Real Problem

I believe what Howard encountered was failure of the entity table/object tracker/identity management service to recognize the prior existence of the objects in memory, which caused the query pipeline to requery the database on successive LINQ query executions. This occurs despite the following statement in the "Object Identity" section LINQ to SQL: .NET Language-Integrated Query for Relational Data whitepaper:

Of course, if the object requested by the query is easily identifiable by its primary key as one already retrieved no query is executed at all. The identity table acts as a cache storing all previously retrieved objects.

All three objects meet the preceding requirement and, as to be demonstrated shortly, behave almost as expected with multiple queries when deferred loading is enabled. (The preceding quotation makes the single initial query unexpected.) Object tracking is enabled by default (DataContext.ObjectTrackingEnabled = True.)

The LINQ to SQL Eager Versus Deferred Loading Test Harness project (EagerLoadingTestHarnessVB.sln) executes a set of queries with various DataContext.LoadOptions and DataContext.DeferredLoadingEnabled property value settings (see Figure 1.) Marking Deferred Loading Enabled, Set LoadOptions, Employee with Order, Shipper with Order or Project with Detail create a new DataContext object. Marking Show Query Result and clicking any of the buttons uses the same (current) DataContext object. Mark Show Query Result to replace the text box's T-SQL batch with the iterator's string output.

The following table reports the number of queries for each of three different iterators (the LINQ query is identical in all cases):  


Get US Customers Get Orders Get Order_Details
Deferred First 1 14 123
Deferred Subsequent 1 1 1
Eager First 14 14 14
Eager Subsequent 14 14 14

Requerying the database and reloading the objects into memory for each execution of every query against a DataContext.Table<TEntity> more than offsets any database load reduction from the initial eager loading. This problem, if confirmed by Microsoft and/or others, is a showstopper.

You can run your own tests by downloading from the SkyDrive beta site. Right click the Zip file, choose Properties, click Unblock to eliminate the Untrusted security warning, and then unzip the files with Use Folder Names marked to a test folder. The NorthwindConnectionString in the app.config file expects to find a Northwind database in a local SQL Server 2005 Express instance (Data Source=.\SQLEXPRESS) and doesn't attach a Northwnd.mdf file. Make the necessary connection string modifications for your Northwind database setup and double-click EagerLoadingTestHarnessVB.sln.

Please leave comments with questions or observations.

Update 8/23/2007: Added as Connect bug 294781: LINQ to SQL Objects Eager Loaded with LoadOptions Aren't Recognized by Object Tracker, plus minor edits, updates and corrections.

Update 8/24/2007: Microsoft was able to reproduce the bug. Added "The Real Problem" subhead for emphasis.

Update 8/24/2007: TopObjectInHiearchy.Count + 1 queries for eager loading an object graph with two 1:n associations is by design, according to Scott Guthrie, but that's a bad design. This problem could turn out to be a showstopper for many developers and/or projects. See Clarification of the Object Tracking Problem with LINQ to SQL's Eager Loading Feature.


Dody G. said...

There's no way Linq for SQL be usable for intermediate scenarios for web systems. 90 queries for web system will kill the application.