Thursday, December 20, 2007

Controlling the Depth and Order of EntitySets for 1:Many Associations

Updated 12/23/2007: See the "Danny Simmons Answers My EntityCollection Depth and Serialized EntityReferences Issues" topic of the LINQ and Entity Framework Posts for 12/20/2007+ post for more details about the issues raised in this post's "Where's the Leash on EntitySet Depth in the Entity Framework?" Many thanks to Danny for his detailed, illustrated answers to the issues I raised.

Service-oriented architecture (SOA) with data-centric Web services having an object/relational mapping (O/RM) tool as the data access layer (DAL) requires the data transfer object (DTO) to eager-load EntitySets for 1:many associations. Lazy loading is proscribed because that would require the service consumer to exchange messages with the service for each associated entity or, perhaps, groups of entities.

Real-world data potentially involves business objects having 1:many associations with EntitySets that could contain thousands or even millions of rows: for example, all Order entities associated with a Customer or Salesperson entity for a large enterprise. Obviously, just serializing an entity having an EntitySet property containing several thousand entities could bring the service to its knees. It's obvious that a practical O/RM tool must provide a practical method to control the number (depth) of sub-entities in the EntitySet returned with the base entity. If fewer than all associated entities are returned, the tool must (in most cases) enable ordering the set before applying the filter.

The classic example of the requirement for ordering the members of an EntitySet property is returning orders for a customer or set of related customers to a customer service agent. To minimize retries, the business policy might be to return the last five orders for customers on the first request, which corresponds to a T-SQL SELECT TOP(5) * FROM Orders WHERE CustomerID = 'RATTC' ORDER BY OrderID DESC query.

Update 12/21/2007: Preceding query completed by adding ORDER BY OrderID DESC and other minor clarifications and additions.

The LINQ to SQL team anticipated this requirement by providing the DataLoadOptions.AssociateWith() method, which accepts a lambda expression as its argument. For example

dlOptions.AssociateWith<Customer>(Customer c =>  _
    c.Orders.OrderByDescending(Order o => o.OrderID).Take(5))

or

dlOptions.AssociateWith(Of Customer)(Function(c As Customer) _
    c.Orders.OrderByDescending(Function(o As Order) o.OrderID).Take(5))

added to the DataContext.LoadOptions collection generates the following T-SQL statement:

SELECT TOP (5) [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] = ((
    SELECT [t2].[CustomerID]
    FROM (
        SELECT TOP (1) [t1].[CustomerID]
        FROM [dbo].[Customers] AS [t1]
        WHERE [t1].[CustomerID] = @p0
        ) AS [t2]
    ))
ORDER BY [t0].[OrderID] DESC

which returns the last five Orders rows for the Customers row with CustomerID =[=] 'RATTC'. The preceding query won't win any performance awards and each Customer entity requires a separate query, but it's certainly a win over returning and throwing away thousands of Order and Order_Detail entities.

Alternatively, you can overload some methods in the code-generated DataContext classes with a partial class file and substitute a stored procedure (with optional parameters) for the lambda expression.

You get even a bigger win if you need to limit members of lookup list collections, such as  Product EntityRefs for eager-loaded Order_Detail entities only. This filter is especially important for organizations with a very large number of different products. However, serializing many:1 and 1:many associations creates cyclic relationships that binary serialization can handle but requires enabling the DataContractSerializer.PreserveObjectReferences property with a cumbersome shim/hack for XML serialization.

Unfortunately, there are no commonly observed standards for serializing and deserializing the ID/IDREF attributes in XML Infosets required to handle bidirectional references, so Microsoft's XML team purposely made it difficult to set DCS's PreserveObjectReferences property to true.

As mentioned in my Serializing Object Graphs Without and With References post of November 21, 2007:

An alternative to [the DataContractSerializer] (DCS) is the NetDataContractSerializer (NDCS), which can serialize graphs with 1:many and many:1 (bidirectional) associations by preserving association references in ID/IDREF pairs of Id and Ref values. NDCS delivers graph fidelity at the expense of interoperability because it requires clients to support CLR types. A factor in lack of interoperability with other vendors' Web service toolkits is the complexity of schemas generated by NDCS and Microsoft's patent that appears to cover NDCS's method of serializing/deserializing objects.

Controversy still seeths over the comparative benefits of serializing the object graph or maintaining interoperability by serializing a object tree and ignoring many:1 associations. It's possible to do this if you load the EntityRefs in a second step and have foreign key values with which to regenerate the EntityRef associations on the client. LINQ to SQL entities expose foreign key values; serializable EntityKeys in Entity Framework (EF) Beta 3 and later might make client-side regeneration of EntityReferences possible.

Where's the Leash on EntitySet Depth in the Entity Framework?

Danny Simmons brought up the issue of selective serialization of EF's association EntityRefs in his Question: Deep serialization of an object graph--how deep should it go? post of November 20, 2007. Here's an excerpt from his post:

1) The entire graph connected to the customer is returned every time. If you are building a stateless webservice, then you would likely construct a new ObjectContext instance each time the method is called, retrieve from the DB just those entities you want to return and then return them. In this scenario, returning the entire graph every time works just fine because the entire graph contains exactly what you want to return.

I had expected the ObjectQuery.Include() and ObjectContext.EntityCollection.Load() methods to provide an overload that delivered the equivalent to LINQ to SQL's DataLoadOptions.AssociateWith() method by Beta 3. However, a search of the Beta 3 documentation, such as Shaping Query Results (Entity Framework), the ADO.NET (Prerelease) forum, or anyone's blog (including Julie Lerman's). Perhaps I'm not using the correct search terms.

All sample code I've seen for the the ObjectQuery.Include() and ObjectContext.Entity.Load() method refers to "all" order line items, which is de rigueur, or a single salesperson's few orders. In all examples I've seen, including messages the forum, the argument is an quoted entity (table) name.

Disassembling ObjectQuery.Include() with Reflector gives the following:

public ObjectQuery<T> Include(string path)
{
    EntityUtil.CheckStringArgument(path, "path");
    ObjectQueryImplementation<T> queryImpl = this._impl.Clone();
    queryImpl.EnsureSpan().Include(path);
    return new ObjectQuery<T>(queryImpl);
}

Parameters: path

Dot-separated list of navigation properties that describe the query path in the graph that should be eagerly fetched. For collection types, the navigation property is lifted so it looks like a member of the collection.

Return Value: A new ObjectQuery<> with the defined query path.

The function's signature indicates to me that it's restricted to a particular entity that can be tracked along the navigation path from the base entity.

Note: Zlatko Michailov refers to ObjectQuery.Include() as an "exact-entity" operator in this answer in the forum:

Include() is an “exact-entity” operator of ObjectQuery<T>. (Reference: http://msdn2.microsoft.com/en-us/library/bb738708(VS.90).aspx) It denotes a navigation property that you want to dereference and materialize as part of your resulting entity collection. That’s why the “T” in the ObjectQuery<T> must be an entity type, i.e. it may not be DbDataRecord.

A Google search on "exact-entity" objectquery and "exact-entity" "entity framework" returns only the preceding forum answer as a valid hit.

NHibernate and the Castle Project use MultiQueries such as the following to specify the depth of EntitySets:

session.CreateMultiQuery()
.Add("from Blog b left join fetch b.Posts where b.Id = 1")
.Add("from Post p left join fetch p.Comments where p.Blog.Id = 1")
.List();

(From a comment to Oren Eini's Thinking about the Entity Framework post of May 16, 2007. There's a better example in Oren's Efficently loading deep object graphs post of June 20, 2007.)

It's my conclusion that without a workable (i.e., internal, dynamic SQL) means to order and limit the size (depth) of EntityCollections from associations that EF would be a non-starter as an enterprise-scale O/RM tool (a.k.a., "ObjectSpaces Reloaded"). That's why I'm questioning my search terms.

Unless EF has an answer for serializing EntityReferencess to associations or regenerating EntityReferencess from custom EntityCollection lookup collections on the consumer, EF will join LINQ to SQL as two Microsoft O/RM tools that have no "out-of-the-box multi-tier story" and thus aren't considered enterprise-ready. However, if Danny Simmons is successful in implementing his "General-purpose Container Object" functionality over EF, which he describes in his December 19, 2007 So they're hard, but what if I need them... post ("them" refers to Web services), the story might not be "out-of-the-box" but it would be a story. (The way I read this post is that all associated entities are in memory before serialization.)

Update 12/23/2007: Incorrect search terms weren't the problem for documentation for filtering associations' EntityCollections. As mentioned in the LINQ and Entity Framework Posts for 12/20/2007+ post:

A search on 'CreateSourceQuery' returns valid hits to online help and Danny's post only, so there doesn't appear to be any sample code for the method, which also has an EntityReference counterpart.

I'm surprised that other users haven't raised questions by now about filtering sorted EntityCollections.

Note: I'm still waiting for even a demo version of Matt Warren's serializable "mini connectionless DataContext" for LINQ to SQL (see the 6/18/2007 update to the "Simplifying Data Tracking in n-Tier Projects" topic of my Changes Coming to LINQ for SQL post of May 15, 2007. In the meantime, Mike Taulty's approach that I discussed in the GetOriginalEntityState() Loses EntitySet/EntityRef Data post of October 15, 2007 almost works for LINQ to SQL.

Comments anyone?

5 comments:

Mat said...

I've also been waiting since early 2006 for the general-tier story for EF. Maybe it situates as a backend/2-tier service like hibernate and its kin.

Cheers,
-Mat

Jonathan Pugh said...

"serializing many:1 and 1:many associations creates cyclic relationships that binary serialization can handle"

I tried passing an Entity object that has a 1:Many EntityCollection in it and after it passes through the WCF layer, the collection always has a count of zero. I am using Named Pipes which use binary serialization, so shouldn't it preserve the collection? The data is in the collection before it is returned in the OperationContract method, but afterwards it just disappears.

Any ideas?

Sydney said...

I know this post is pretty old and may be outdated, but you said: "Alternatively, you can overload some methods in the code-generated DataContext classes with a partial class file and substitute a stored procedure (with optional parameters) for the lambda expression."

I have a situation in which I am only allowed db access via sproc and have a heavily heirarchical data structure in an SOA environment. So, needless to say, I'd like to use DataLoadOptions to eager load child objects, but it seems to me that I need to somehow tell LINQ to use a particular sproc to load the child object/collection. The implication in your post that I might somehow be able to do this is exciting, but I haven't seen anything like this elsewhere. Is this possible?

Thanks.

--rj said...

@Sydney,

Substituting a parameterized stored procedure for the dynamic SQL generated by CreateSourceQuery() isn't likely to work.

However, retrieving the associated entities with a stored procedure and then attaching them to a stub probably would work. Danny Simmons' post referenced in this article has more details on attaching associated entities to stubs.

--rj

marcosb said...

So this article is old as bajesus, but it still seems to be the case that you can't do filtering of child collections.
Is that the case? I don't understand why the entity framework still doesn't have the equivalent of Linq to SQL's AssociateWith()