Monday, January 21, 2008

Coming to the Entity Framework: A Serializable EntityBag for n-Tier Deployment

Danny Simmons explained in his earlier "Why are data-centric web services so hard anyway?" and "So they're hard, but what if I need them..." posts of December 20, 2007 that serializing object graphs with associations isn't a walk in the park. The LINQ to SQL team abandoned their efforts to create a serializable "mini connectionless DataContext," which led to Dinesh Kulkarni's LINQ to SQL Has No "Out-of-the-Box Multi-Tier Story" admission on October 15, 2007.

Updated 1/21/2008 and 1/22/2008: Additions/clarifications in [].

Part of the LINQ to SQL team's problem was the desire for Web service interoperability, which (in my opinion, at least) isn't in the cards for object/relational mapping (O/RM) tools. This observation is especially true for those O/RMs that support [value-based rather than timestamp] change tracking [that's independent of the domain's business objects.]

So I'm glad to see Danny taking the pragmatic approach to Web-service enabling the Entity Framework, as described in his "EntityBag Part I – Goals" article of January 20, 2008. He says the following about service interoperability:

While I like the simplicity of [EntityBag] interaction, it is super important to keep in mind the restrictions imposed by this approach. First off, there’s the fact that this requires us to run .Net and the EF on the client—in fact it requires that the code for your object model be available on the client, so it is certainly not interoperable with Java or something like that.

If interoperability is the Holy Grail of Web services, why do typed DataSets remain one of the most common objects serialized by .NET Web services? According to Scott Hanselman, "Returning DataSets from WebServices is the Spawn of Satan and Represents All That Is Truly Evil in the World." (Scott posted this diatribe on June 1, 2004, three years before he joined Microsoft in July 2007.) An obvious answer is "because you can."

Another issue is lack of adherence to (or support for) the Web service contract's terms and conditions:

Secondly, because we are sending back and forth the entire ObjectContext, the interface of the web methods imposes no real contract on the kind of data that will travel over the wire. The retrieval method in our example is called GetRoomAndExits, but there’s absolutely no guarantee that the method might not return additional data or even that it will return a room and exits at all. This is even scarier for the update method where the client sends back an EntityBag which can contain any arbitrary set of changes and they are just blindly persisted to the database.

The lack of a service contract or its enforcement doesn't appear to deter the crowd supporting RESTful Web services, including ADO.NET Data Services.

To qualify as an enterprise-level O/RM tool, the Entity Framework must provide out-of-the-box n-tier support for WCF Web services and remoting, even if it's an add-in to v1.0. Save interoperability and contract concerns for v2.0+.

I look forward to the promised "future posts" in which "we can dig into the implementation of EntityBag."

Backstory: I discussed problems with serializing object graphs that contain cyclic references created by combinations of EntitySet and EntityRef(erence) associations in my "Controlling the Depth and Order of EntitySets for 1:Many Associations" post of December 20, 2007 (updated 12/23/2007), "Serializing Object Graphs Without and With References" of November 21, 2007 (updated 12/12/2007), and "Serializing Cyclic LINQ to SQL References with WCF" of October 30, 2007.

Update 1/21/2008: Frans Bouma's comment of 1/21/2008 takes Microsoft and me to task: Microsoft for not incorporating change data in the business object itself and me for not being precise regarding the difficulty of managing value-based concurrency conflicts without changing the business object's structure. Both the Entity Framework (EF) and LINQ to SQL minimize encroaching on "persistence ignorance" and plain old CLR objects (POCO). See Danny's "EF Persistence Ignorance Recap" of September 26, 2007 and my "Persistence Ignorance Is Bliss, but Is It Missing from the Entity Framework?" article of March 14, 2007 (updated 4/24/2007 and earlier).

Update 1/22/2008: In answer to a comment about EntityBag Part I – Goals of January 20, 2008, Danny says on the same day:

The EF won't have something like this built-in for v1, but we are looking hard at the topic for future releases.  I'm not 100% certain we'll do this, since there are some serious issues when it comes to interoperability, etc. (as I've noted).  I believe we made some major mistakes with the DataSet in this regard, and I don't want to repeat them.


Anonymous said...

"which (in my opinion, at least) isn't in the cards for object/relational management (O/RM) tools. This observation is especially true for those O/RMs that support change tracking."
Every O/R mapper performs change tracking one way or the other.

It's not that hard to write XML input/output code for object graphs with change tracking. Yes, it does require some additional elements in the end XML, but that's the price to pay for having graphs sent over the wire. I think it took me 2 days or so to write the core code.

So it's a bit weird that MS can't come up with this code themselves, as it's not that hard. Basically what you do is implement IXmlSerializable and write the entity's attribute values as: (<> replaced with [] )
[... change info stuff here]

Very simple. Non-.NET code can consume the XML as-is and can skip any '_' prefixed elements for example. Deserializing it into a graph WITH change tracking is also easy: again implement IXmlSerializable and restore the graph from the XML and you're done. Our code can do that very fast with xmlreader/writer instances, and the change tracking info is packed into bitarrays and small elements, so it hardly takes any extra space compared to the rest of the XML.
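
The idea in the paragraph above can be illustrated with a minimal, language-neutral sketch in Python rather than .NET. It is not the commenter's actual format, which was elided above: the class name `TrackedEntity`, the `_changes` and `_original` element names, and the helper `plain_fields` are all hypothetical, and original values are stored as plain text rather than packed into bit arrays. It only shows an entity that keeps its own change tracking, writes it under '_'-prefixed elements, and restores it on the other side, while a consumer that skips those elements still sees ordinary field values.

```python
import xml.etree.ElementTree as ET


class TrackedEntity:
    """Hypothetical entity that keeps change tracking inside the object."""

    def __init__(self, entity_name, **fields):
        self.entity_name = entity_name
        self.current = dict(fields)   # current field values
        self.original = {}            # field name -> value before first change

    def set(self, field, value):
        if field not in self.current:
            raise KeyError(field)
        if self.current[field] == value:
            return
        # Keep only the value the field had before its first modification.
        self.original.setdefault(field, self.current[field])
        self.current[field] = value

    def to_xml(self):
        root = ET.Element(self.entity_name)
        for name, value in self.current.items():
            ET.SubElement(root, name).text = str(value)
        # Tracking data goes under '_'-prefixed elements (names are assumptions).
        changes = ET.SubElement(root, "_changes")
        for name, old in self.original.items():
            ET.SubElement(changes, "_original", field=name).text = str(old)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text):
        root = ET.fromstring(text)
        entity = cls(root.tag)
        for child in root:
            if not child.tag.startswith("_"):
                entity.current[child.tag] = child.text
        changes = root.find("_changes")
        if changes is not None:
            for item in changes:
                entity.original[item.get("field")] = item.text
        return entity


def plain_fields(text):
    """What a consumer that ignores '_'-prefixed elements would read."""
    return {c.tag: c.text for c in ET.fromstring(text) if not c.tag.startswith("_")}
```

In this sketch all values come back as strings after a round trip; a real implementation would need type information from the entity's mapping.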

There's indeed one little problem for the people who follow the classic central context system approach: the change tracking info isn't stored inside the entities in these systems.

And this is precisely why it's a flawed approach. Change tracking belongs inside the object, not outside the object. With any other object, one would design it inside the object, why would it now all of a sudden be different? For the precious poco award? :)

Roger Jennings (--rj) said...


See my update re your comment added to the post.

How do you handle cyclic references with your serialization? The only way that I've seen it done with XML is to use the NetDataContractSerializer that MSFT doesn't appear to like (despite having a U.S. Patent on its approach).
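
For readers unfamiliar with the problem, here is a minimal sketch, in Python rather than .NET, of one common way to keep cyclic references intact in XML: give each object an id the first time it is written and emit only a reference on later visits. The `Node` class and the `id`, `ref`, and `name` attribute names are hypothetical and are not the wire format of NetDataContractSerializer or of any product mentioned here.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical graph node; links may point back to an ancestor."""

    def __init__(self, name):
        self.name = name
        self.links = []


def to_xml(root_node):
    ids = {}  # id(object) -> serial number assigned on first visit

    def write(node, parent):
        elem = ET.SubElement(parent, "Node")
        key = id(node)
        if key in ids:
            # Already written: emit a reference instead of recursing again.
            elem.set("ref", str(ids[key]))
            return
        ids[key] = len(ids) + 1
        elem.set("id", str(ids[key]))
        elem.set("name", node.name)
        for target in node.links:
            write(target, elem)

    graph = ET.Element("Graph")
    write(root_node, graph)
    return ET.tostring(graph, encoding="unicode")


def from_xml(text):
    by_id = {}  # serial number (as text) -> rebuilt Node

    def read(elem):
        ref = elem.get("ref")
        if ref is not None:
            return by_id[ref]
        node = Node(elem.get("name"))
        # Register before reading children so a cycle can resolve to this node.
        by_id[elem.get("id")] = node
        for child in elem:
            node.links.append(read(child))
        return node

    return read(ET.fromstring(text)[0])
```

A room with an exit that points back to the room then serializes as one nested element plus a single `ref`, and deserializing restores the cycle as a reference to the same object. The sketch recurses once per level of nesting, so very deep graphs would need an iterative traversal.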