Monday, August 20, 2007

LINQ to SQL Query Execution with LoadOptions Specified

David Hayden, a Florida .NET developer and C# MVP, posted an update to his LINQ To SQL Debugger Visualizer in Visual Studio 2008 Has Limited Value - LINQ To SQL Tutorials item this morning. In LINQ To SQL Debugger Visualizer Revisited - Same Concerns Better UI :), David says:

In the previous post I mentioned some concerns I have with the LINQ To SQL Debugger Visualizer in that the query it displays is not the whole story. ...

Turns out I wasn't even using the Debugger Visualizer - DOH! Here is the real LINQ To SQL Debugger Visualizer, which is a separate download.

He goes on to say:

So, the debugger visualizer is much easier to work with than I previously mentioned in the earlier post, but it still suggests only 1 query will be executed when in actuality it will execute 40+ queries to the database using the following code in beta 2:

using (NorthwindDataContext = new NorthwindDataContext())
{           
    DataLoadOptions options = new DataLoadOptions();    
    options.LoadWith<Customer>(c => c.Orders);
    options.LoadWith<Order>(o => o.Order_Details);    
    var query = from c in context.Customers select c;
}

I talked more about that in the previous post mentioned above. A little deceptive if you ask me, but still useful in other ways. [Emphasis added.]

The problem is that David omitted this critical instruction prior to executing the LINQ query:

NorthwindDataContext.LoadOptions = options;

to attach the DataLoadOptions to the NorthwindDataContext.

Further, his last instruction specifies context instead of NorthwindDataContext from his using expression. 

Update: 8/20/2007 12:00 PM PDT

David added the missing instruction but didn't correct the DataContext name. (See his comment.)

David's query returns EntitySets of Order and Order_Detail objects for each Customer object. When corrected, the preceding query with an iterator executes this query to obtain the Customer objects:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], 
  [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], 
  [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]

It is the preceding query that appears in the Query Visualizer. It's not reasonable to expect the Query Visualizer to interrogate the DataContext.LoadOptions property value and iterate over the DataLoadOptions collection to see what objects are being eader-loaded. 

The SELECT query is followed by this query for each of the 91 Customer objects to retrieve the EntitySets, as you would expect:

exec sp_executesql N'SELECT [t0].[OrderID], [t0].[CustomerID], 
  [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], 
  [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], 
  [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], 
  [t0].[ShipPostalCode], [t0].[ShipCountry], [t1].[OrderID] AS [OrderID2], 
  [t1].[ProductID], [t1].[UnitPrice], [t1].[Quantity], [t1].[Discount], (
    SELECT COUNT(*)
    FROM [dbo].[Order Details] AS [t2]
    WHERE [t2].[OrderID] = [t0].[OrderID]
    ) AS [count]
    FROM [dbo].[Orders] AS [t0]
    LEFT OUTER JOIN [dbo].[Order Details] AS [t1] 
      ON [t1].[OrderID] = [t0].[OrderID]
    WHERE [t0].[CustomerID] = @x1
    ORDER BY [t0].[OrderID], [t1].[ProductID]',N'@x1 nchar(5)',@x1=N'ALFKI'

I find it hard to believe that any developer who deliberately adds code to eager-load 830 Order and 2155 Order_Detail objects would be deceived into believing that a simple SELECT query against the Customers table would retrieve 2,985 rows from two other tables in the database.

I'm also not sure why David mentioned 41+ queries in his post. It's obvious that 91 queries are required to hydrate the EntitySets for 91 Customer objects.

End Update

I tested Beta 2's updated DataContext.LoadOptions feature recently and found that it behaved as advertised with a similar query against the Orders entity set to prevent lazy loading of EntityRef objects:

public partial class NorthwindDataContext {
    partial void OnCreated() {
        DataLoadOptions dlOrder = new DataLoadOptions();
        dlOrder.LoadWith<Order>(o => o.Customer);
        dlOrder.LoadWith<Order>(o => o.Employee);
        dlOrder.LoadWith<Order>(o => o.Shipper);
        this.LoadOptions = dlOrder;
    }
}

(I was testing the LinqDataSource with a bound GridView, so I had to add the LoadOptions collection in the OnCreated event handler of a separate partial class file.)

With GridView.PageSize = 15, loading the first page generates two queries:

SELECT COUNT(*) AS [value]
FROM [dbo].[Orders] AS [t0]
LEFT OUTER JOIN [dbo].[Customers] AS [t1] 
    ON [t1].[CustomerID] = [t0].[CustomerID]
LEFT OUTER JOIN [dbo].[Employees] AS [t2] 
    ON [t2].[EmployeeID] = [t0].[EmployeeID]
LEFT OUTER JOIN [dbo].[Shippers] AS [t3] ON [t3].[ShipperID] = [t0].[ShipVia]

exec sp_executesql N'SELECT TOP 15 [t7].[OrderID], [t7].[CustomerID], 
  [t7].[EmployeeID], [t7].[OrderDate], [t7].[RequiredDate], 
  [t7].[ShippedDate], [t7].[ShipVia], [t7].[Freight], [t7].[ShipName], 
  [t7].[ShipAddress], [t7].[ShipCity], [t7].[ShipRegion], 
  [t7].[ShipPostalCode], [t7].[ShipCountry], [t7].[test], 
  [t7].[CustomerID2], [t7].[CompanyName], [t7].[ContactName], 
  [t7].[ContactTitle], [t7].[Address], [t7].[City], [t7].[Region], 
  [t7].[PostalCode], [t7].[Country], [t7].[Phone], [t7].[Fax], [t7].[test2],
  [t7].[EmployeeID2], [t7].[LastName], [t7].[FirstName], [t7].[Title], 
  [t7].[TitleOfCourtesy], [t7].[BirthDate], [t7].[HireDate], 
  [t7].[Address2], [t7].[City2], [t7].[Region2], [t7].[PostalCode2], 
  [t7].[Country2], [t7].[HomePhone], [t7].[Extension], [t7].[Notes], 
  [t7].[ReportsTo], [t7].[test3], [t7].[ShipperID], [t7].[CompanyName2], 
  [t7].[Phone2]
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY [t0].[OrderID], [t0].[CustomerID], 
    [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], 
    [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], 
    [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], 
    [t0].[ShipPostalCode], [t0].[ShipCountry], [t2].[test], 
    [t2].[CustomerID], [t2].[CompanyName], [t2].[ContactName], 
    [t2].[ContactTitle], [t2].[Address], [t2].[City], [t2].[Region], 
    [t2].[PostalCode], [t2].[Country], [t2].[Phone], [t2].[Fax], 
    [t4].[test], [t4].[EmployeeID], [t4].[LastName], [t4].[FirstName], 
    [t4].[Title], [t4].[TitleOfCourtesy], [t4].[BirthDate], [t4].[HireDate], 
    [t4].[Address], [t4].[City], [t4].[Region], [t4].[PostalCode], 
    [t4].[Country], [t4].[HomePhone], [t4].[Extension], [t4].[ReportsTo], 
    [t6].[test], [t6].[ShipperID], [t6].[CompanyName], [t6].[Phone]) 
  AS [ROW_NUMBER], 
    [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], 
    [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], 
    [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], 
    [t0].[ShipPostalCode], [t0].[ShipCountry], [t2].[test], 
    [t2].[CustomerID] AS [CustomerID2], [t2].[CompanyName], 
    [t2].[ContactName], [t2].[ContactTitle], [t2].[Address], [t2].[City], 
    [t2].[Region], [t2].[PostalCode], [t2].[Country], [t2].[Phone], 
    [t2].[Fax], [t4].[test] AS [test2], [t4].[EmployeeID] AS [EmployeeID2], 
    [t4].[LastName], [t4].[FirstName], [t4].[Title], [t4].[TitleOfCourtesy], 
    [t4].[BirthDate], [t4].[HireDate], [t4].[Address] AS [Address2], 
    [t4].[City] AS [City2], [t4].[Region] AS [Region2], 
    [t4].[PostalCode] AS [PostalCode2], [t4].[Country] AS [Country2], 
    [t4].[HomePhone], [t4].[Extension], [t4].[Notes], [t4].[ReportsTo], 
    [t6].[test] AS [test3], [t6].[ShipperID], 
    [t6].[CompanyName] AS [CompanyName2], [t6].[Phone] AS [Phone2]
  FROM [dbo].[Orders] AS [t0]
    LEFT OUTER JOIN (
      SELECT 1 AS [test], [t1].[CustomerID], [t1].[CompanyName], 
        [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], 
        [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], 
        [t1].[Phone], [t1].[Fax]
      FROM [dbo].[Customers] AS [t1]
        ) AS [t2] ON [t2].[CustomerID] = [t0].[CustomerID]
    LEFT OUTER JOIN (
      SELECT 1 AS [test], [t3].[EmployeeID], [t3].[LastName], 
        [t3].[FirstName], [t3].[Title], [t3].[TitleOfCourtesy], 
        [t3].[BirthDate], [t3].[HireDate], [t3].[Address], [t3].[City], 
        [t3].[Region], [t3].[PostalCode], [t3].[Country], [t3].[HomePhone], 
        [t3].[Extension], [t3].[Notes], [t3].[ReportsTo]
      FROM [dbo].[Employees] AS [t3]
        ) AS [t4] ON [t4].[EmployeeID] = [t0].[EmployeeID]
    LEFT OUTER JOIN (
        SELECT 1 AS [test], [t5].[ShipperID], [t5].[CompanyName], 
          [t5].[Phone]
        FROM [dbo].[Shippers] AS [t5]
        ) AS [t6] ON [t6].[ShipperID] = [t0].[ShipVia]
    ) AS [t7]
WHERE [t7].[ROW_NUMBER] > @p0',N'@p0 int',@p0=15

I probably wouldn't have gone through the trouble to post this rejoinder if:

  1. David hadn't stated that 40+ queries were required to execute his query.
  2. The bogus issue he raised was the result of a missing instruction not expecting the Query Visualizer to evaluate the DataContext.LoadOptions property value for eager-loading instructions. 
  3. He hadn't used the term "deceptive" in describing the Query Visualizer's failure to detect them.
  4. He had comments enabled on his blog.
  5. The ADO.NET team hadn't gone through an extreme effort to build the required T-SQL batch in the query pipeline's expression tree.

5 comments:

Anonymous said...

I did include the LoadOptions when running the code, but I forgot to include it in the post. It is there now.

Your query is totally different, so of course you won't get 40+ queries.

The key is that the visualizer does not show the additional queries it runs to do the prefetching.

Dave

Roger Jennings (--rj) said...

The "context" object remains missing because the using expression creates a new "NorthwindDataContext" object.

See my update to the post.

-rj

Anonymous said...

Roger: Dave is correct, the LoadOptions totally fails to create an efficient set of queries. A graph to fetch eagerly (prefetched) has to have 1 query per node, tops. You could optimize it out with a join between a graph path if all relations on that path are 1(fk):1(pk) or m:1 so the join would result in no duplicates, however that's pretty limiting because you then have to implement custom logic to handle each of the cases, which can be pretty daunting and it's not said that the join is more efficient than the 1 query per node part. (for example if you add limitation per node, filters per node etc. it's pretty obvious you need 1 query per node)

Anyway, please check these links, and you'll learn that the LoadOptions is pretty much broken:
http://codebetter.com/blogs/david.hayden/archive/2007/08/06/linq-to-sql-query-tuning-appears-to-break-down-in-more-advanced-scenarios.aspx

It simply doesn't work as you'd expect: you would think there would be at most 3 queries but there are many queries due to the lazy loading. The MSDN forums thread describes why: one of the Linq designers explains there that it's by design.

Which IMHO is pretty lame, as it takes roughly 100-150 lines of code to implement basic prefetch path code with 1 query per node + merging.

Roger Jennings (--rj) said...

I wasn't arguing the merits of LINQ to SQL's lazy/eager-loading technology with Dave, I objected to his characterization of the SQL Visualizer as "deceptive."

It appears to me that eager-loading uses a single query with joins for 1:1 associations (EntityRefs) and one query with joins per node for 1:many associations (EntitySets). I'll be conducting a few tests in the next day or two and see what Profiler reports. See the queries that I copied to the post.

Anonymous said...

"...you'll learn that the LoadOptions is pretty much broken...
It simply doesn't work as you'd expect..."

@Frans: Maybe your expectations are broken, but it does work as advertised. You can only specify load options from one table to the other, nowhere can you specify a complete path. It _is_ by design, and you need to adjust your expectations.

The correct way to criticize is to say that LINQ2SQL is missing capabilities to specify preload options for a complete path. But I'm reading/interpreting that you (and others) see this as a bug, which is factually incorrect.

PS. It's version 1.0, it's not going to have everything you need, even if this is Microsoft. It's going to have the most important features to allow _many_ people to enjoy the benefits. Based on the right suggestions, version 2 might have what _you_ need, but pointing fingers is not motivating for them. You wouldn't have appreciated that either when your framework was still young.