Saturday, June 14, 2008

Matthieu Mezil Illustrates Use of AsEnumerable() That Mystifies Me

Matthieu’s AsEnumerable: not only to use unsupported methods post of June 13, 2008 discusses the differences in the results of a pair of complex LINQ to SQL Entities queries whose primary difference is use of AsEnumerable() to solve a problem obtaining Order instances with OrderDates in 1998 having French customers together with their Customer instances.

Updated 6/15/2008: See tests with Entity Framework below.

I would have used a LINQ to SQL or LINQ to Entities query with the results shown here in LINQPad to retrieve the Order/Customer instances with LINQ to SQL:

The preceding query generated the following T-SQL batch:

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry], [t2].[test], [t2].[CustomerID] AS [CustomerID2], [t2].[CompanyName], [t2].[ContactName], [t2].[ContactTitle], [t2].[Address], [t2].[City], [t2].[Region], [t2].[PostalCode], [t2].[Country], [t2].[Phone], [t2].[Fax]
FROM [Orders] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
    FROM [Customers] AS [t1]
    ) AS [t2] ON [t2].[CustomerID] = [t0].[CustomerID]
WHERE (DATEPART(Year, [t0].[OrderDate]) >= @p0) AND ([t2].[Country] = @p1)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1998]
-- @p1: Input NVarChar (Size = 6; Prec = 0; Scale = 0) [France]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30428.1

That batch returned the following result in SQL Server Management Studio:

which looked correct to me.

I wasn’t able to execute Matthieu’s queries in LINQPad, but I did execute his two T-SQL queries in SQL Server Mangement Studio.

Here’s the first result, which doesn’t apply the Country = ‘France’ constraint:

And here’s the second batch’s result set from the query that added AsEnumerable():

Which is a LEFT OUTER JOIN and doesn’t look to me like what Matthieu said he was looking for.

I’ve asked Matthieu to clarify his objective for me.

Update: 6/15/2008: In a reply to my comment on Matthieu’s post, he says:

My post wasn't clear so I change it.

I think the difference is that you do a LINQ to SQL query and I do a LINQ To Entities query.

To see if there were differences using an Entity Data Model, I created a simple WinForm app with a Northwind ObjectContext.

My LINQ to Entities query:

    using (NorthwindEntities ctxNwind = new NorthwindEntities())
    {
        var query = from o in ctxNwind.Orders
                    where o.OrderDate.Value.Year >= 1998 &&
                        o.Customers.Country == "France"
                    select new 
                    {
                        OrderID = o.OrderID, 
                        OrderDate = o.OrderDate,
                        CustomerID = o.Customers.CustomerID,
                        Country = o.Customers.Country 
                    };
        dgvQueryResult.DataSource = query.ToList();
    }

gave the expected result: 23 rows of orders with customer data in a DataGridView. 17 of the 23 rows are shown here:

I then ran Matthieu’s LINQ to Entities queries with and without AsEnumerable():

    using (NorthwindEntities ctxNwind = new NorthwindEntities())
    {
        var query = from oc in
                       from o in ctxNwind.Orders
                        where o.OrderDate.HasValue && 
                            o.OrderDate.Value.Year >= 1998
                        select new { Order = o, Customer = ctxNwind.Customers
                            .Where(c => c.CustomerID == o.Customers.CustomerID &&
                                c.Country == "FRANCE").FirstOrDefault() }
                    select new
                    {
                        OrderID = oc.Order.OrderID,
                        OrderDate = oc.Order.OrderDate,
                        CustomerID = oc.Customer.CustomerID,
                        Country = oc.Customer.Country
                    };
        dgvQueryResult.DataSource = query.ToList();
    }

and

    using (NorthwindEntities ctxNwind = new NorthwindEntities())
    {
        var query = from oc in
                       (from o in ctxNwind.Orders
                        where o.OrderDate.HasValue && 
                            o.OrderDate.Value.Year >= 1998
                        select new { Order = o, Customer = ctxNwind.Customers
                            .Where(c => c.CustomerID == o.Customers.CustomerID &&
                                c.Country == "FRANCE").FirstOrDefault() })
                            .AsEnumerable()
                    select new
                    {
                        OrderID = oc.Order.OrderID,
                        OrderDate = oc.Order.OrderDate,
                        CustomerID = oc.Customer.CustomerID,
                        Country = oc.Customer.Country
                    };
        dgvQueryResult.DataSource = query.ToList();
    }

Both of the preceding two queries returned the same data as shown in the last T-SQL example above: A 270-row outer join with 23 rows containing customer data. Adding the AsEnumerable() Standard Query Operator does nothing that I can determine.

My MezilEntitiesQueriesTest project is available for download here. The project expects to find the Northwind sample database attached to a local SQL Server 2005 Express instance (localhost\SQLEXPRESS).

2nd Update 6/15/2008 1545 PDT: The Light Dawns (sort of) … Bad Things Are Happening Here

Matthieu’s comment clarifies an invisible and probably unwanted effect of adding AsEnumerable() to the first query: My outer select clause returns scalar values from Customers and Orders. His outer select oc.Order clause returns only Orders entities and makes no request for Customer-related data.

Although Matthieu says near the beginning of this post, we “need to load into our context the right orders and customers.” Later, he says “the SQL request gets the orders we want and the customers we want,” the truth is, he doesn’t need or want the customers. If he had wanted the customers, he would have added oc.Customer[.Property] to his select clause.

LINQ to Entities’ (or LINQ to SQL’s) deferred execution implementation doesn’t write an extraordinary amount of T-SQL to return data that isn’t used. Adding AsEnumerable() caused immediate execution of Matthieu’s inner select clause, which includes a requirement for Customers. As a result, a very large number of cycles were wasted.

What’s worse, however, is that adding AsEnumerable() forced local filtering of the 270-row OUTER JOIN result set to include only 23 Customers+Orders instances in the query output.

Don’t try this at work. Also, it’s a good idea to execute the T-SQL batches of complex queries in Query Analyzer (or whatever) now and then to see if your LINQ to SQL or LINQ to Entities query is returning more rows that you’re using.

6 comments:

Anonymous said...

Hi Roger,
I understand the problem: your select takes CustomerID and CustomerCountry. To have what I want to show in my post, you should return only orders (select oc.Order) and so ignore Customer informations in last select.
Matthieu

Roger Jennings (--rj) said...

Use of AsEnumerable() in this case unleashes bad mojo. See Update #2.

--rj

Anonymous said...

"he doesn’t need or want the customers." In fact, I want the order customer only if he leaves in France. But yes I don't want directly the French customers.

Anonymous said...

"What’s worse, however, is that adding AsEnumerable() forced local filtering of the 270-row OUTER JOIN result set to include only 23 Customers+Orders instances in the query output."

Yes and no. In my case, I add neither a where clause after my AsEnumerable nor a restrictive select.

Anonymous said...

Moreover, I am not agree with you when you say: "Adding AsEnumerable() caused immediate execution". For example, if I only define my query (with AsEnumerable) and I never use it, I won't have any SQL request in my DB.

Anonymous said...

I try to clarify it here (http://msmvps.com/blogs/matthieu/archive/2008/06/16/clarifying-quot-asenumerable-not-only-to-use-unsupported-methods-quot.aspx)