Thursday, October 04, 2007

LINQ and Entity Framework Posts for 10/2/2007+

Dino Esposito Discusses Ordered Sequences in LINQ and Relational Sets

Dino Esposito, one of the most respected authorities on AJAX in ASP.NET 3.5, writes in his LINQ Syntax and Linq-to-SQL post of October 4, 2007:

[I]t is key to note that LINQ operators are defined against an ordered sequence of elements. The T-SQL language (as well as any other SQL-based language) works instead with unordered sets of values.

It's my understanding from The .NET Standard Query Operators that:

The Standard Query Operators operate on sequences. Any object that implements the interface IEnumerable<T> for some type T is considered a sequence of that type.

which doesn't imply ordering. The preceding white paper goes on to say:

Operators in the OrderBy/ThenBy family of operators order a sequence according to one or more keys

to produce one of eight variations on the OrderedSequence<TSource> type.

The LINQ: .NET Language-Integrated Query white paper goes on to say:

To allow multiple sort criteria, both OrderBy and OrderByDescending return OrderedSequence<T> rather than the generic IEnumerable<T>. Two operators are defined only on OrderedSequence<T>, namely ThenBy and ThenByDescending which apply an additional (subordinate) sort criterion. ThenBy/ThenByDescending themselves return OrderedSequence<T>, allowing any number of ThenBy/ThenByDescending operators to be applied.

I'd say that although LINQ (and thus LINQ to SQL) processes collections sequentially, there's no guarantee of a particular order unless you apply the OrderBy/ThenBy operators to generate an OrderedSequence<TSource> collection.
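A minimal LINQ to Objects sketch of the distinction (the data and names here are mine; note that the shipped .NET 3.5 bits name the type IOrderedEnumerable<T> rather than the white paper's OrderedSequence<T>):

```csharp
using System;
using System.Linq;

class OrderingDemo
{
    static void Main()
    {
        var people = new[]
        {
            new { Last = "Smith", First = "Ann" },
            new { Last = "Jones", First = "Bob" },
            new { Last = "Smith", First = "Al" }
        };

        // OrderBy returns an ordered sequence, so ThenBy can apply a
        // subordinate sort key. Without these operators, LINQ guarantees
        // no particular order for the elements it enumerates.
        var sorted = people.OrderBy(p => p.Last)
                           .ThenBy(p => p.First);

        foreach (var p in sorted)
            Console.WriteLine("{0}, {1}", p.Last, p.First);
        // Jones, Bob
        // Smith, Al
        // Smith, Ann
    }
}
```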

Added: 10/4/2007

Frans Bouma Takes on .NET Source Code, Deferred Execution and Consuming Expression Trees

Swimming against the congratulatory tide surrounding future access to Microsoft's .NET Framework source code, Frans warns: Don't look at the sourcecode of .NET licensed under the 'Reference license' in his October 4, 2007 post. He's building a commercial software product, so he must be very cautious about examining others' source code. He also mentions that "Using [Lutz Roeder's Reflector] is technically breaking the EULA." However, I think the issue relates more to copyright statutes than to patent law. Software patents (ugh!) apply whether you look at the source code or not.

Update 10/7/2007: Mono's Miguel de Icaza appears to agree with Frans' position that developers of potentially competing products should avoid reviewing the source code. He says this about Mono contributors in his Microsoft Opens up the .NET Class Libraries Source Code post of October 3, 2007:

But like Rotor, the license under which this code is released is not open-source. People that are interested in continuing to contribute to Mono, or that are considering contributing to Mono's open source implementation of those class libraries should not look at this upcoming source code release.

In Deferred execution in Linq pitfall(s) of October 3, 2007, Frans finds it strange that local literals of private methods remain accessible when deferred execution occurs in the caller. I find that strange, too.
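A minimal LINQ to Objects sketch of the kind of capture Frans describes (the names are mine, not his code): a local of a private method survives in the query's closure and executes later, in the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    // Nothing executes here: the Where clause only captures the local
    // 'threshold' in a closure and returns the un-enumerated query.
    static IEnumerable<int> BigNumbers(int[] source)
    {
        int threshold = 100; // a local literal of a private method
        return source.Where(n => n > threshold);
    }

    static void Main()
    {
        IEnumerable<int> query = BigNumbers(new[] { 5, 150, 42, 300 });

        // Only now, when the caller enumerates the query, does the filter
        // run -- reading 'threshold' long after BigNumbers has returned.
        foreach (int n in query)
            Console.WriteLine(n); // prints 150, then 300
    }
}
```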

The "Creating LINQ Provider for LLBLGen Pro" series continues with October 3, 2007's Developing Linq to LLBLGen Pro, Day 5, which delves into the trials and tribulations of consuming expression trees, the non-determinism of expression trees, and the need for special-casing as a workaround. He justifiably laments the impossibility of proving that code with special-casing is "correct."

Tip: Be sure to read the comments in all three posts.

Kirupa Chinnathambi Compares LINQ and E4X

Kirupa's October 3, 2007 Anders Hejlsberg’s "A Lap Around LINQ" + What is LINQ? post provides a brief introduction to LINQ and a comparison of LINQ with the ECMAScript for XML (E4X) language in ActionScript 3. He says:

The main difference between E4X and LINQ is that E4X is limited primarily to XML, and the syntax is, in my view, less readable than LINQ's SQL-like syntax. E4X is great for accessing or creating data and filtering the data you access by attributes or any combination of query-like conditions you set. What sets LINQ apart though, is that it not only allows you to access your data, but it also allows you to manipulate your data in more ways.

Of course, comparing E4X, which deals only with XML, with LINQ, which handles a wider variety of data sources, isn't entirely fair, since E4X, short for ECMAScript for XML, is designed (as its name implies) only for XML.

Added: 10/4/2007

Chris Buckett Continues His Quest for a Mock Entity Framework Provider

His Update on developing a “mock” entity framework provider post of October 3, 2007 outlines the issues he faces in emulating, in memory, the database side of unit tests for Entity Framework code. He concludes:

Once I have a working sample project, I’ll probably chuck it up on codeplex (if David Sceppa’s happy for me to, as it will use a large part of the sample entity framework code - I’ll have to check).

I'm sure David Sceppa won't object.

Mike Taulty on Deleting Entities and Cascading Deletions

Mike analyzes in his LINQ to Entities - Deleting post of October 3, 2007 some mysteries of the Entity Framework's entity deletion techniques:

  • You must use a LINQ query to return the objects you want to delete. (As with Entity SQL, there's no DML syntax, such as an ObjectContext.DeleteEntities() method.)
  • You must load dependent (order and line item) entities you want to delete when deleting a parent (customer) entity. If you don't want to load the entities to delete, you must enable cascading deletions in the database. 
  • If you enable cascading deletions in the database, you must also add <OnDelete Action="Cascade"> to the <End Role="Customer"></End> node of the Conceptual Model's <Association> element.
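In code, the pattern Mike describes looks roughly like this sketch against a hypothetical Northwind-style ObjectContext (the context and navigation-property names are my assumptions, not Mike's code):

```csharp
// Sketch only: NorthwindEntities, Customers, Orders and OrderDetails
// are hypothetical names for a Northwind-style Entity Data Model.
using (var context = new NorthwindEntities())
{
    // 1. There's no DML syntax, so query for the entity to delete.
    var customer = (from c in context.Customers
                    where c.CustomerID == "ALFKI"
                    select c).First();

    // 2. Unless the database handles cascading deletions, explicitly
    //    load the dependent entities so they can be deleted too.
    customer.Orders.Load();
    foreach (var order in customer.Orders)
        order.OrderDetails.Load();

    // 3. With <OnDelete Action="Cascade"> on the association's Customer
    //    end, deleting the parent marks loaded dependents for deletion.
    context.DeleteObject(customer);
    context.SaveChanges();
}
```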

LINQ to SQL also exhibits strangeness with cascading deletions, as I noted in my Cascade Deletion Problem with LINQ to SQL Beta 2 of September 5, 2007.

Julie Lerman Proposes a Remote Facade Pattern for Serializing Entities

Julie's October 2, 2007 XML Serializing Entity Framework entities with their children for SOA post describes a technique that's similar to Martin Fowler's Remote Facade pattern, which "[p]rovides a coarse-grained facade on fine-grained objects to improve efficiency over a network." As Fowler observes:

[A]ny object that's intended to be used as a remote object needs a coarse-grained interface that minimizes the number of calls needed to get something done. Not only does this affect your method calls, it also affects your objects. Rather than ask for an order and its order lines individually, you need to access and update the order and order lines in a single call. This affects your entire object structure. You give up the clear intention and fine-grained control you get with small objects and small methods. Programming becomes more difficult and your productivity slows.

Julie creates a coarse-grained Remote Facade structure based on nested List<T> generic collections. She concludes:

The OrderwithDetails class that implements the facade will need these features:

  1. A property to contain List(Of Order)
  2. A property to contain List(Of OrderDetails)
    1. If we are going deeper then this property should be List(Of OrderDetailswithChildren)
  3. Instantiation code needs to take in the original object and construct the lists
  4. A method that knows how to deserialize the object and return the real object, not the lists of it's parts
  5. If this is going to be used in a [WCF] service, the class needs to be marked as a DataContract and the properties should be marked as DataMembers.
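A minimal sketch of such a facade class, following Julie's outline (the Order/OrderDetails stubs and the member bodies are mine, not her code):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Stand-in entity stubs for the sketch.
public class Order
{
    public int OrderID;
    public List<OrderDetails> OrderDetails = new List<OrderDetails>();
}
public class OrderDetails { public int ProductID; }

// 5. Marked as a DataContract so a WCF service can serialize it.
[DataContract]
public class OrderwithDetails
{
    [DataMember] public List<Order> Orders;
    [DataMember] public List<OrderDetails> Details;

    // 3. Instantiation takes the original object graph apart into lists,
    //    because the entity won't serialize with its children attached.
    public OrderwithDetails(Order order)
    {
        Orders = new List<Order> { order };
        Details = new List<OrderDetails>(order.OrderDetails);
    }

    // 4. ... and this method reassembles the real object from its parts.
    public Order ToOrder()
    {
        Order order = Orders[0];
        order.OrderDetails.Clear();
        order.OrderDetails.AddRange(Details);
        return order;
    }
}
```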

Forcing developers to jump through these kinds of hoops to use the Entity Framework (EF) in a service-oriented environment indicates to me a strange lack of foresight by the EF architects.

This is undoubtedly the reason (or at least one of the reasons) that we haven't seen the definition of "support for n-tier architectures and ... non-trivial, loosely-coupled sample projects with WCF" that I requested from Mike Pizzo in item #10 of my Defining the Direction of LINQ to Entities/EDM post of May 29, 2007. (See the "Julie Lerman Demonstrates Detached Updates with Entity Framework" topic in LINQ and Entity Framework Posts for 9/28/2007+).

Luke Hoban Gets the Most Complex LINQ Query Award (and PLINQ Compatibility)

Luke Hoban, a program manager on the C# compiler team, is a fan of ray-tracing graphics programming. He wrote a C++ ray-tracer in high school and ported the app to Scheme in college. When he joined the C# team three years ago, he ported the (C++, I assume) app to C#, and recently he updated it to C# 3.0 (see his A Ray Tracer in C#3.0 post of April 3, 2007).

The C# 3.0 version had LINQ islands in it but his current version, which he reproduces in his Taking LINQ to Objects to Extremes: A fully LINQified RayTracer post of October 1, 2007, "captures the complete raytracing algorithm" in a single, 60-line LINQ to Objects query. The source code is available for downloading.

The obvious advantage of moving from "conventional" C# 3.0 constructs to a single LINQ query is Parallel LINQ (PLINQ, which is part of the ParallelFX or PFX program) compatibility for simplified parallel processing on multi-core machines. There's a substantial Microsoft team at work on PLINQ and the October 2007 issue of MSDN Magazine carries a "Running Queries On Multi-Core Processors" story about PLINQ by Joe Duffy and Ed Essey.
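The appeal is easy to sketch. This assumes the AsParallel() entry point that Duffy and Essey describe; the forthcoming CTP's final API surface may differ:

```csharp
using System;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();

        // An ordinary LINQ to Objects query ...
        long sequential = numbers.Where(n => n % 3 == 0)
                                 .Sum(n => (long)n);

        // ... and the same query fanned out across cores: AsParallel()
        // is the only change needed to opt the query into PLINQ.
        long parallel = numbers.AsParallel()
                               .Where(n => n % 3 == 0)
                               .Sum(n => (long)n);

        Console.WriteLine(sequential == parallel); // True: same result
    }
}
```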

Joe plans to release a PFX CTP in 2007. Watch his blog for details.

Daan Leijen and Judd Hall describe the more complex forthcoming Task Parallel Library (TPL) in an "Optimize Managed Code For Multi-Core Machines" article in the same issue. The C# 3.0 ray-tracer version upgraded with TPL will be a sample application in the PFX CTP.

It will be interesting to compare the performance of the TPL and PLINQ samples with that of their predecessors on dual- and quad-core machines.

My PLINQ Gathers Momentum post of December 8, 2006 (last updated 7/30/2007) includes many links to early PLINQ articles.