Sunday, December 30, 2007

JSON vs. XML DataContract Serialization: Download Test Harness

Rick Strahl's DataContractJsonSerializer in .NET 3.5 post of December 29, 2007 describes the .NET Framework 3.5's new DataContractJsonSerializer (DCJS) class from the System.Runtime.Serialization.Json namespace, which contains the serializer and related objects for the lightweight JavaScript Object Notation (JSON) transport (RFC 4627). JSON is one of ADO.NET Data Services' two wire formats, as noted in the "Julie Lerman: Astoria to Become ADO.NET Data Services" topic of LINQ and Entity Framework Posts for 12/10/2007+. (The Astoria Team intends to add plain old XML (POX) format to v1.0 by RTW.)

Quick Summary: JSON serialization with DCJS requires about 25% to about 40% fewer bytes than the XML DataContractSerializer (DCS) class to represent moderately complex objects, such as Northwind Order objects and their associated Order_Detail objects, and takes about 25% longer than the DCS class to serialize and deserialize.

However, Windows Communication Foundation (WCF) serializes JSON streams to an internal XML Infoset wire format that requires about 50% more bytes than DCS to hydrate an object across a process boundary. Adding type="JSONDataType" attributes to each member's element contributes the additional bytes.

The added time to serialize a moderate-sized object collection and the dramatic increase in the size of WCF messages indicate that JSON serialization isn't as lightweight as one would expect from its simple syntax.

You can download the VB 9.0 code for the JsonSerializationWinVB.sln test harness project here. The project requires Visual Basic 2008 Express or Visual Studio 2008 Standard or higher but doesn't need a database connection. (Mock generic data objects are provided.)

The Test Harness

Rick's sample code uses simple, lightweight objects. I wanted to verify whether the DCJS class actually produces smaller payloads and performs as well as or better than .NET Fx 3.0's DataContractSerializer (DCS) class when the classes are more complex and contain a variety of data types, including Nullable<T> and generic collections. So I created the following two classes with the LINQ In-Memory Object Generation (LIMOG) utility that's described in my Serializing Object Graphs Without and With References post of November 21, 2007.

Here are C# versions of the class definitions for the Order and Order_Detail objects used in the tests:

[DataContract(Name = "Order", Namespace = "")] 
public class Order { 
    [DataMember(Name = "OrderID", Order = 1)] 
    public int OrderID { get; set; } 
    [DataMember(Name = "CustomerID", Order = 2)] 
    public string CustomerID { get; set; } 
    [DataMember(Name = "EmployeeID", Order = 3)] 
    public int? EmployeeID { get; set; } 
    [DataMember(Name = "OrderDate", Order = 4)] 
    public DateTime? OrderDate { get; set; } 
    [DataMember(Name = "RequiredDate", Order = 5)] 
    public DateTime? RequiredDate { get; set; } 
    [DataMember(Name = "ShippedDate", Order = 6)] 
    public DateTime? ShippedDate { get; set; } 
    [DataMember(Name = "ShipVia", Order = 7)] 
    public int? ShipVia { get; set; } 
    [DataMember(Name = "Freight", Order = 8)] 
    public decimal? Freight { get; set; } 
    [DataMember(Name = "ShipName", Order = 9)] 
    public string ShipName { get; set; } 
    [DataMember(Name = "ShipAddress", Order = 10)] 
    public string ShipAddress { get; set; } 
    [DataMember(Name = "ShipCity", Order = 11)] 
    public string ShipCity { get; set; } 
    [DataMember(Name = "ShipRegion", Order = 12)] 
    public string ShipRegion { get; set; } 
    [DataMember(Name = "ShipPostalCode", Order = 13)] 
    public string ShipPostalCode { get; set; } 
    [DataMember(Name = "ShipCountry", Order = 14)] 
    public string ShipCountry { get; set; } 
    [DataMember(Name = "Order_Details", Order = 15)] 
    public List<Order_Detail> Order_Details { get; set; } 
} 
[DataContract(Name = "Order_Detail", Namespace = "")]
public class Order_Detail {
    [DataMember(Name = "OrderID", Order = 1)]
    public int OrderID { get; set; }
    [DataMember(Name = "ProductID", Order = 2)]
    public int ProductID { get; set; }
    [DataMember(Name = "UnitPrice", Order = 3)]
    public decimal UnitPrice { get; set; }
    [DataMember(Name = "Quantity", Order = 4)]
    public short Quantity { get; set; }
    [DataMember(Name = "Discount", Order = 5)]
    public float Discount { get; set; }
}

LIMOG adds the Name, Namespace, and Order attribute/value pairs; DCJS processes Name and Order attribute values and ignores Namespace values. I wrote custom SQL commands to generate object initializers for Northwind's last 20 Orders rows with their associated Order Details and Customer rows. A few of the orders are missing Nullable<DateTime> ShippedDate values.
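To make the size comparison concrete, here is a minimal C# sketch that runs one object through both serializers and reports the payload sizes. MiniOrder is a hypothetical, trimmed-down stand-in for the Order class above, SerializedSize is an illustrative helper, and the property values are only sample data; none of this is taken from the test harness.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

// Hypothetical, trimmed-down stand-in for the Order class (assumption).
[DataContract(Name = "Order", Namespace = "")]
public class MiniOrder
{
    [DataMember(Name = "OrderID", Order = 1)]
    public int OrderID { get; set; }
    [DataMember(Name = "CustomerID", Order = 2)]
    public string CustomerID { get; set; }
    [DataMember(Name = "ShippedDate", Order = 3)]
    public DateTime? ShippedDate { get; set; }
}

public static class SizeComparison
{
    // Serializes the graph with the given serializer and returns the
    // number of bytes written. Both DCJS and DCS derive from
    // XmlObjectSerializer, so one helper covers both.
    public static long SerializedSize(XmlObjectSerializer serializer, object graph)
    {
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, graph);
            return ms.Length;
        }
    }

    public static void Main()
    {
        // Sample values only; ShippedDate is left null on purpose.
        var order = new MiniOrder { OrderID = 11077, CustomerID = "RATTC", ShippedDate = null };

        long jsonBytes = SerializedSize(new DataContractJsonSerializer(typeof(MiniOrder)), order);
        long xmlBytes = SerializedSize(new DataContractSerializer(typeof(MiniOrder)), order);

        Console.WriteLine("JSON (DCJS): {0} bytes", jsonBytes);
        Console.WriteLine("XML (DCS):   {0} bytes", xmlBytes);
    }
}
```

Measuring the stream length rather than the length of a decoded string keeps the figure in bytes even when the data contains non-ASCII characters.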

Following are the test harness's four operating modes.

1. JSON (DCJS) Serialization with Parent (Order) Objects Only 

Representative deserialized values appear below the column headers.

2. JSON (DCJS) Serialization with Parent and Child (Order_Detail) Objects

LINQ queries populate the 1:many Order_Details associations.

DateTime values serialize as "/Date(MillisecondsSince1970-01-01T00:00:00.000 +/- HHMM)/" strings, where HHMM is the offset from GMT in hours and minutes. (RFC 4627 doesn't specify a serialization format for date/time values, which has resulted in free-form encoding by various implementers. Stand-Alone JSON Serialization covers the current .NET implementation for ASP.NET AJAX services created in WCF.)
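The date format described above can be decoded by hand when a client has no DCJS available. The following C# sketch is one way to do it; JsonDateSketch and Parse are hypothetical names, and applying the HHMM suffix as a display offset is my own choice for illustration, not necessarily what DCJS does when it deserializes.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsonDateSketch
{
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Accepts "/Date(ms)/" and "/Date(ms+HHMM)/", with or without the
    // backslash escapes that precede the slashes on the wire.
    private static readonly Regex Pattern =
        new Regex(@"^\\?/Date\((-?\d+)([+-]\d{4})?\)\\?/$");

    public static DateTimeOffset Parse(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
            throw new FormatException("Not a /Date(...)/ string: " + text);

        // The number is milliseconds since 1970-01-01T00:00:00 UTC.
        long ms = long.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        var utc = new DateTimeOffset(Epoch.AddMilliseconds(ms));
        if (!m.Groups[2].Success)
            return utc;

        // Assumption: show the same instant at the stated offset.
        string off = m.Groups[2].Value;
        int sign = off[0] == '-' ? -1 : 1;
        int hours = int.Parse(off.Substring(1, 2), CultureInfo.InvariantCulture);
        int minutes = int.Parse(off.Substring(3, 2), CultureInfo.InvariantCulture);
        return utc.ToOffset(new TimeSpan(sign * hours, sign * minutes, 0));
    }

    public static void Main()
    {
        // prints 2007-12-30T00:00:00.0000000+00:00
        Console.WriteLine(Parse("/Date(1198972800000)/").ToString("o"));
        // prints 2007-12-29T16:00:00.0000000-08:00
        Console.WriteLine(Parse("/Date(1198972800000-0800)/").ToString("o"));
    }
}
```

Both sample strings denote the same instant; only the offset used for display differs.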

3. XML (DCS) Serialization with Parent (Order) Objects Only

4. XML (DCS) Serialization with Parent and Child (Order_Detail) Objects

XML Infoset Wire Format for WCF Implementations: Parent (Order) Objects

WCF implementations serialize JSON streams to an "internal" XML Infoset format defined by the Mapping Between JSON and XML white paper. The test harness uses the JsonReaderWriterFactory's CreateJsonReader method to instantiate an XmlDictionaryReader object whose ReadOuterXml method delivers the wire format as a string.

The semi-formatted text of the Infoset follows the report of its length below the deserialized data:

Note: I could find no documentation for the JsonReaderWriterFactory's CreateJsonReader method or creating the required XmlDictionaryReader object; I believe that this project contains the only publicly available example. (Search Google for CreateJsonReader.) Here's an excerpt from the test harness code that's based in part on Rick Strahl's example:

Dim ser As DataContractJsonSerializer = _ 
   New DataContractJsonSerializer(GetType(List(Of Order))) 
Dim ms1 As MemoryStream = New MemoryStream() 
ser.WriteObject(ms1, lstOrders) 
Dim json As String = Encoding.UTF8.GetString(ms1.ToArray()) 
txtBytes.Text = json.Length().ToString() 
ms1.Close() 
Dim ms2 As MemoryStream = New MemoryStream(Encoding.UTF8.GetBytes(json)) 
' Deserialize into generic List 
ser = New DataContractJsonSerializer(GetType(List(Of Order))) 
Dim jsonOrders As List(Of Order) = _ 
    TryCast(ser.ReadObject(ms2), List(Of Order)) 
' Rewind ms2 in case ReadObject advanced its position, then 
' serialize the JSON MemoryStream to the WCF XML wire format 
ms2.Position = 0 
Dim xdrJson As XmlDictionaryReader = _ 
    JsonReaderWriterFactory.CreateJsonReader(ms2, _ 
    XmlDictionaryReaderQuotas.Max) 
xdrJson.Read() 
Dim xml As String = xdrJson.ReadOuterXml() 
xdrJson.Close() 
ms2.Close() 
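For readers who prefer C#, here is a minimal sketch of the same CreateJsonReader technique applied to a short literal JSON string instead of the serialized order list. JsonInfosetSketch and ToInfosetXml are hypothetical names, and the JSON text is sample data.

```csharp
using System;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;

public static class JsonInfosetSketch
{
    // Returns the XML Infoset view that the JSON reader exposes
    // for the supplied JSON text.
    public static string ToInfosetXml(string json)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(json);
        using (XmlDictionaryReader reader =
            JsonReaderWriterFactory.CreateJsonReader(buffer, XmlDictionaryReaderQuotas.Max))
        {
            reader.MoveToContent();       // position on the root element
            return reader.ReadOuterXml(); // the element and all its children
        }
    }

    public static void Main()
    {
        // Sample JSON only, shaped like a three-member Order.
        string json = "{\"OrderID\":11077,\"CustomerID\":\"RATTC\",\"ShippedDate\":null}";
        Console.WriteLine(ToInfosetXml(json));
    }
}
```

The output should show a root element whose members each carry a type attribute (object, number, string, null and so on), which is where the extra bytes in the Infoset representation come from.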

XML Infoset Wire Format for WCF Implementations: Parent and Child (Order_Detail) Objects

As Rick mentioned in a comment to his post:

@Roger - awesome work checking out the perf considerations. I suppose it's to be expected that JSON is slower since the parsing of the JSON objects is necessarily a bit more complex than parsing the more XML so I'm not terribly surprised that it's slower than XML.

OTOH, you're not likely to use this on two way WCF communications, but only for AJAX/REST scenarios coming from a browser most likely and in that scenario there's not much choice since JSON is so much easier to use on the client than XML.

Atom10FeedFormatter<TSyndicationFeed> and Atom10ItemFormatter<TSyndicationItem> generic objects (for classes derived from SyndicationFeed) are likely to be used for two-way WCF communications with ADO.NET data services. Magnus Mårtensson shows you in his Create your own Syndication Feeds with .NET Framework 3.5 post of November 22, 2007 how to use these two objects (and their RSS 2.0 equivalents) to create and serialize SyndicationFeeds. Guy Burstein offers a similar demonstration in his How To: Create a RSS Syndication Service with WCF post of December 3, 2007.

Added: 1/1/2008

Size of Serialized Messages and Serialize/Deserialize Execution Times

Following is a summary table of the data collected for the four operating modes. Message sizes don't include whitespace. Execution times are the average of five tests conducted by reopening the form for each execution (to eliminate the effects of data caching).

            Parent Only                           Parent and Child
Serializer  Time, s  Size, Bytes  XML, Bytes      Time, s  Size, Bytes  XML, Bytes
DCJS        0.416    7,559        14,942          0.467    13,253       31,253
DCS         0.352    9,955        N/A             0.412    21,444       N/A

The preceding data was collected on a Gateway S-5200D with a dual-core Pentium 2.6 GHz processor running Windows Vista Premium as a virtual client with 1 GB RAM assigned under Virtual Server 2005 R2.
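The relative differences can be worked out directly from the table. The C# sketch below does the arithmetic with the tabulated values; TableArithmetic, PercentChange and Report are hypothetical helper names introduced only for this illustration.

```csharp
using System;
using System.Globalization;

public static class TableArithmetic
{
    // Percentage by which 'value' differs from 'baseline'
    // (positive means larger or slower than the baseline).
    public static double PercentChange(double baseline, double value)
    {
        return (value - baseline) / baseline * 100.0;
    }

    // Prints DCJS and Infoset figures relative to DCS for one operating mode.
    public static void Report(string mode, double dcjsTime, double dcsTime,
                              double dcjsBytes, double dcsBytes, double infosetBytes)
    {
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "{0}: JSON size {1:F1}%, JSON time {2:F1}%, Infoset size {3:F1}% relative to DCS",
            mode,
            PercentChange(dcsBytes, dcjsBytes),
            PercentChange(dcsTime, dcjsTime),
            PercentChange(dcsBytes, infosetBytes)));
    }

    public static void Main()
    {
        // Values copied from the summary table above.
        Report("Parent only", 0.416, 0.352, 7559, 9955, 14942);
        Report("Parent and child", 0.467, 0.412, 13253, 21444, 31253);
    }
}
```

On these averages the JSON payloads come out roughly 24% and 38% smaller than the DCS payloads, and the Infoset representations roughly 50% and 46% larger. The time penalty works out to roughly 18% and 13%, somewhat below the "about 25%" quoted in the Quick Summary.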

Download the VB 9.0 source code for the JsonSerializationWinVB.sln test harness project here and give it a try on your development machine.

Updated 12/31/2007: Minor edits and additions
