Wednesday, November 21, 2007

Serializing Object Graphs Without and With References

Updated: 11/25/2007 and 12/12/2007 (Added C# class definitions, plus C# and VB object initialization code. See below.)

Windows Communication Foundation (WCF) defaults to the DataContractSerializer (DCS) for serializing services' object graph payloads to well-formed XML Infosets. DCS is capable of serializing 1:many associations of an object graph, such as Customers -> Orders -> OrderDetails, into a hierarchical XML Infoset, which has a structure that any WS-* SOAP client should be able to deserialize correctly.

An alternative to DCS is the NetDataContractSerializer (NDCS), which can serialize graphs with 1:many and many:1 (bidirectional) associations by preserving association references in ID/IDREF-style pairs of Id and Ref values. NDCS delivers graph fidelity at the expense of interoperability because it requires clients to support CLR types. Two further obstacles to interoperability with other vendors' Web service toolkits are the complexity of the schemas generated by NDCS and Microsoft's patent, which appears to cover NDCS's method of serializing and deserializing objects.

Note: The Entity Framework (EF) maps associations to Navigation Properties.

The online help topics for DCS and NDCS use trivial classes with two or three members; no help topics that I've seen demonstrate serializing generic lists. To verify that DCS and NDCS behave as expected with object graphs of moderate to high complexity generated from relational databases, I used my LINQ In-Memory Object Generation (LIMOG) utility. LIMOG autogenerates C# and VB classes designed for serialization with DCS and NDCS. LIMOG also creates object initializers to generate customized List<T> or List(Of T) instances for serialization and deserialization. The three default data sources for classes are Northwind, AdventureWorks, and AdventureWorksLT (Lite) sample databases. Click here to download the AdventureWorks (OLTP)'s entity-relationship diagram in HTML or Visio format; click here to open an HTML entity-relationship diagram of AdventureWorksLT.

Note: The original impetus for creating the LIMOG app was to generate mock objects that could substitute for entities generated by LINQ to SQL and the Entity Framework and thus speed unit testing. Using mock objects also frees the unit test from infrastructure dependencies, which conforms to the separation of concerns principle of software design. Serializing files generated by LIMOG can provide payload size and complexity estimates for WS-* Web services.

Here's a screen capture of the LIMOG utility after generating the VB code for the Order class and a List(Of Order) instance with 100 members:

Note: Marking the Cyclic References check box adds the [DataMember] attribute to members representing many:1 associations.

Serializing Partial Object Graphs with DCS

DCS uses opt-in serialization: it serializes only classes decorated with a [DataContract] attribute and members decorated with a [DataMember] attribute. These two attributes have properties that let you customize the resulting document, such as specifying object and member names and member sequences. (DCS sequences members alphabetically by default.)
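As a minimal sketch of the opt-in model, the following VB fragment uses a hypothetical three-member Customer class (it is not the LIMOG-generated code). Name and Order are properties of the two attributes; the member names and values are assumptions chosen for illustration.

```vb
Imports System.Runtime.Serialization

' Hypothetical class for illustration only; not the LIMOG-generated code.
<DataContract(Name:="Customer")> Public Class Customer
    ' Order:=1 places CustomerID ahead of CompanyName,
    ' overriding the default alphabetical sequence.
    <DataMember(Name:="CustomerID", Order:=1)> Public CustomerID As String

    <DataMember(Name:="CompanyName", Order:=2)> Public CompanyName As String

    ' No <DataMember()> attribute, so DCS leaves this member out of the document.
    Public InternalNotes As String
End Class
```

Members that don't specify Order are written first, in alphabetical order, followed by the ordered members.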

By default, DCS doesn't preserve references, so it creates duplicates of objects that normally would share a common pointer to the same object in memory (i.e., ReferenceEquals = True). This means that the default DCS implementation can't handle cyclic references (cycles) created by bidirectional 1:many and many:1 relationships. I raised this issue in my LINQ to SQL and Entity Framework XML Serialization Issues with WCF - Part 1 post of October 9, 2007.

DCS offers a PreserveObjectReferences property to solve the preceding problem. However, there are no commonly observed standards for serializing and deserializing the ID/IDREF attributes in XML Infosets required to handle bidirectional references, so Microsoft's XML team purposely made it difficult to set DCS's PreserveObjectReferences property to true.
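PreserveObjectReferences is a read-only property, so outside of WCF the value is supplied through the longest DataContractSerializer constructor overload. The following is a minimal sketch assuming the .NET Framework 3.0/3.5 overload; the module name, method name, and fileName parameter are hypothetical.

```vb
Imports System.IO
Imports System.Runtime.Serialization

Module PreserveReferencesSketch
    ' Writes any data-contract graph with object references preserved.
    ' Constructor arguments, in order: root type, known types (none),
    ' maxItemsInObjectGraph, ignoreExtensionDataObject,
    ' preserveObjectReferences, data contract surrogate (none).
    Sub WriteWithReferences(Of T)(ByVal graph As T, ByVal fileName As String)
        Dim dcs = New DataContractSerializer( _
            GetType(T), _
            Nothing, _
            Integer.MaxValue, _
            False, _
            True, _
            Nothing)
        Using fs As New FileStream(fileName, FileMode.Create)
            dcs.WriteObject(fs, graph)
        End Using
    End Sub
End Module
```

A call such as WriteWithReferences(CustomerList, "NwindGraphRefs.xml") (the file name is an assumption) would write the list with Id and Ref attributes instead of duplicated elements, which is exactly the non-standard output that reduces interoperability.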

LINQ to SQL handles this problem by not applying the <DataMember()> attribute to many:1 properties, such as Order.Customer, Order.Employee, Order.Shipper, Order_Detail.Order or Order_Detail.Product. Thus the serializer doesn't follow the paths necessary to include the Products, Categories, Suppliers, Shippers, and Employees classes in the resulting document, which has this structure:

<ArrayOfCustomer xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Customer>
    ...
    <Orders>
      <Order>
        ...
        <Order_Details>
          <Order_Detail>
            ...
          </Order_Detail>
          ...
        </Order_Details>
      </Order>
      ...
    </Orders>
  ...
  </Customer>
</ArrayOfCustomer>

VB code to serialize the CustomerList tree is:

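' Requires Imports System.IO and System.Runtime.Serialization.
' The Using block closes the file even if serialization throws.
Using fs As New FileStream("NwindGraphDCS.xml", FileMode.Create)
    Dim dcs = New DataContractSerializer(GetType(List(Of Customer)))
    dcs.WriteObject(fs, CustomerList)
End Using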

and to deserialize it is:

' Requires Imports System.IO and System.Runtime.Serialization.
Using fs As New FileStream("NwindGraphDCS.xml", FileMode.Open, FileAccess.Read)
    Dim dcs = New DataContractSerializer(GetType(List(Of Customer)))
    CustomerList = CType(dcs.ReadObject(fs), List(Of Customer))
End Using

Click here to open a 39.1 kB text file with autogenerated VB code for a set of Northwind classes suitable for serialization with DCS. (The class code is autogenerated from database metadata by a custom VB project, not by LINQ to SQL.) Notice that the [DataMember] attribute is missing from the last one to three members of several classes.

Added 12/13/2007: The C# version of the DCS-formatted class is here and the NDCS-formatted class is here.

Click here to open the 125-KB, 5,012-line XML file generated from 100 Order and related Customer and Order_Detail objects by a test harness that executes the preceding functions.

Added 12/13/2007: The C# version of the three functions that generate the three-tier object graph is here and the VB version is here.

Click here to open its 69-line, easily read XSD schema generated by Visual Studio 2008's schema inference engine. (Schema files are renamed with an xml extension so they open in IE.)

EF Beta 2 doesn't serialize many:1 or 1:many associations. It's up to the programmer to write code to download EntityCollections<T>, which correspond to LINQ to SQL's EntitySets. If the client application depends upon many:1 associations, for example to replace surrogate (usually numeric) foreign key values with readable names, it must request the missing EntityReference. EF hides foreign key values whether or not they are surrogate keys, which makes reconstituting many:1 associations unnecessarily difficult. 

In either case, you must regenerate the associations with code. The entity classes contain a single read-only property, MemberName As EntityCollection(Of EntityType) for 1:many associations. Many:1 associations generate a read-write MemberName As EntityType property and a MemberName As EntityReference(Of EntityType).

All three properties are decorated with <XmlIgnoreAttribute(), SoapIgnoreAttribute()> attributes and lack <DataContract()> or <DataMember()> attributes, so WCF doesn't serialize them. Therefore, SOAP clients don't receive the information required for a SOAP request to obtain the entities.

LINQ to Objects simplifies writing the code for client-side reconstructions of associations. However, the code is very "chatty" and thus isn't well suited to service-oriented architecture.

Serializing Complete Object Graphs with NDCS

XML Infosets are well-suited to representing acyclic tree data structures, such as the preceding Customer -> Order -> Order_Detail example, but have serious problems with representing edge-labeled directed graphs that result from the cyclic references (cycles) introduced by adding many:1 relationships. Cyclic references lead to endless loops and an XML document of infinite size. NDCS handles serialization of graphs with cycles by assigning an integer ID value (Id attribute) to each unique element when it's first encountered and then substituting a REF[erence] (Ref attribute) as a pointer to the element when it's encountered again.

Code to serialize the CustomerList graph with NDCS is similar to that for DCS except that DCS specifies the generic class name in the constructor while NDCS's XML document contains the CLR type declaration (System.Collections.Generic.List`1[[NwindObjects.MainForm+Customer, NwindObjects, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]]). Here's the NDCS serialization code:

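' Requires Imports System.IO and System.Runtime.Serialization.
' NDCS takes no type argument; it writes CLR type names into the XML.
Using fs As New FileStream("NwindGraphDCS.xml", FileMode.Create)
    Dim ndcs = New NetDataContractSerializer()
    ndcs.Serialize(fs, CustomerList)
End Using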

Deserialization code also is similar to that for DCS:

' Requires Imports System.IO and System.Runtime.Serialization.
Using fs As New FileStream("NwindGraphDCS.xml", FileMode.Open, FileAccess.Read)
    Dim ndcs = New NetDataContractSerializer()
    CustomerList = CType(ndcs.Deserialize(fs), List(Of Customer))
End Using
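To check that a cycle actually survives the round trip, a small in-memory test is easier to reason about than the full Northwind graph. The following sketch assumes a hypothetical two-class graph (Customer -> Orders -> Customer) in which the many:1 member carries a <DataMember()> attribute; the class and module names and the sample values are illustrative and are not the LIMOG-generated code.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Runtime.Serialization

' Hypothetical classes that form a cycle: Customer -> Orders -> Customer.
<DataContract()> Public Class Customer
    <DataMember()> Public CustomerID As String
    <DataMember()> Public Orders As New List(Of Order)
End Class

<DataContract()> Public Class Order
    <DataMember()> Public OrderID As Integer
    ' many:1 association back to the parent, which closes the cycle.
    <DataMember()> Public Customer As Customer
End Class

Module CycleRoundTrip
    Sub Main()
        Dim cust As New Customer()
        cust.CustomerID = "ALFKI"
        Dim ord As New Order()
        ord.OrderID = 10643
        ord.Customer = cust
        cust.Orders.Add(ord)

        Dim ndcs = New NetDataContractSerializer()
        Using ms As New MemoryStream()
            ndcs.Serialize(ms, cust)
            ms.Position = 0
            Dim copy = CType(ndcs.Deserialize(ms), Customer)
            ' ReferenceEquals reports whether the child's Customer member
            ' points at the deserialized parent rather than at a duplicate.
            Console.WriteLine(Object.ReferenceEquals(copy, copy.Orders(0).Customer))
        End Using
    End Sub
End Module
```

Running the same graph through a default DataContractSerializer is a useful contrast, because the cycle is what the default DCS configuration can't handle.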

It's easier to view the serialized XML Infoset than to describe its structure, so click here to open the 410-kB, 15,299-line XML file with all 2,626 references preserved. The autogenerated 39.5 kB text file for classes designed for serialization with NDCS is here.

The primary and imported XML schemas inferred for the XML document are available here (203 KB, 7,136 lines) and here (600 bytes). Whitespace added by VS 2008 contributes 90% of the primary schema's original 2 MB file size as inferred; whitespace has been removed from the linked version.

Note: This is a very large schema for a SOAP WSDL file and it uses the xs:import feature that some Web service clients don't support in WSDL files.

Using NDCS for WCF Serialization

Unfortunately, it's not so simple to substitute NDCS for WCF's default DCS on either the service or client component. The Indigo team deliberately obfuscated the substitution process by requiring additional user-written code that enters the serialization and deserialization pipelines, removes the DataContractSerializerOperationBehavior object from each Operation, and creates and adds a NetDataContractSerializerOperationBehavior object in its place (see Configuring and Extending the Runtime With Behaviors and WCF's NetDataContractSerializer by Aaron Skonnard). Autogenerated service proxies also have problems with the service's CLR type declaration for generics (shown above).
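The substitution described above can be sketched in VB as follows. This is an illustration of the general pattern, not the code from the posts cited above: NetDataContractSerializerOperationBehavior is a user-written class rather than a framework type, and the SerializerSwap module and its method name are hypothetical.

```vb
Imports System.Collections.Generic
Imports System.Runtime.Serialization
Imports System.ServiceModel.Description
Imports System.Xml

' User-written behavior that hands WCF an NDCS instead of a DCS.
Public Class NetDataContractSerializerOperationBehavior
    Inherits DataContractSerializerOperationBehavior

    Public Sub New(ByVal operation As OperationDescription)
        MyBase.New(operation)
    End Sub

    Public Overrides Function CreateSerializer(ByVal targetType As Type, _
            ByVal name As String, ByVal ns As String, _
            ByVal knownTypes As IList(Of Type)) As XmlObjectSerializer
        Return New NetDataContractSerializer()
    End Function

    Public Overrides Function CreateSerializer(ByVal targetType As Type, _
            ByVal name As XmlDictionaryString, ByVal ns As XmlDictionaryString, _
            ByVal knownTypes As IList(Of Type)) As XmlObjectSerializer
        Return New NetDataContractSerializer()
    End Function
End Class

Public Module SerializerSwap
    ' Replaces the default DCS behavior on every operation of an endpoint.
    Public Sub UseNetDataContractSerializer(ByVal endpoint As ServiceEndpoint)
        For Each op As OperationDescription In endpoint.Contract.Operations
            Dim dcsBehavior = op.Behaviors.Find(Of DataContractSerializerOperationBehavior)()
            If dcsBehavior IsNot Nothing Then
                op.Behaviors.Remove(dcsBehavior)
            End If
            op.Behaviors.Add(New NetDataContractSerializerOperationBehavior(op))
        Next
    End Sub
End Module
```

The swap has to run on both sides: against each endpoint of the ServiceHost before it opens, and against the client's endpoint before the first call. Otherwise one side writes Id/Ref attributes and CLR type names that the other side's DCS doesn't expect.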

Microsoft's party line appears to be: If we can't serialize LINQ to SQL or EF objects in an interoperable manner we'll enforce unidirectional serialization by not serializing many:1 associations (LINQ to SQL) or not serializing any associations (EF). This policy is likely to result in EF sharing LINQ to SQL's failure to provide an "out-of-the-box multi-tier story."

Note: NDCS appears to serialize NHibernate's Bag objects, which provide custom collection types that implement IList<T>. Tim Scott and Greg Banister's Remoting Using WCF and NHibernate post of May 9, 2007 describes a unit test that serializes an NHibernate.Collection.Bag to a MemoryStream and deserializes it back to the same type. However, this test, which is the in-memory analog of the preceding code snippets, doesn't duplicate WCF serialization. There is nothing in the post that suggests that Bags contain cyclic references. (See Frans Bouma's comment of May 28, 2007 to this post.)

Note: The preceding section was added on November 26, 2007

Probable Patent Coverage of NDCS Serialization

The process by which the NDCS converts a cyclic edge-labeled directed graph to an acyclic edge-labeled directed tree appears to be covered by US Patent 20040239674 "Modeling graphs as XML information sets and describing graphs with XML schema" published on December 2, 2004 and assigned to Microsoft Corp. The inventors are luminaries of Microsoft's early XML Web services efforts: Tim Ewald, Don Box, Keith Ballinger, Martin Gudgin and Stefan Pharies. Here's the abstract of the patent:

Systems and methods for modeling graphs as XML information sets and describing them with XML schema. An edge labeled directed graph is converted to an edge labeled directed tree representing some of the edges directly and some of the edges indirectly. A graph is completely traversed such that all nodes are visited and all edges are traversed. Nodes are included by value initially and then by reference. A schema is provided that describes the structure of an XML tree tha[t] contains graph data.

The patent provides a detailed description of the serialization process with a simple XML Infoset example.

Obviously, intellectual property issues cloud interoperability of services that use NDCS serialization.

