Friday, March 09, 2007

Streaming with LINQ to XML Series

The XML Team has started a multipart blog series on the topic of Streaming with LINQ to XML and the XStreamingElement class. The objective of streaming is to enable forward-only processing of a large input or output data source, such as log files or serialized relational data, in order to avoid loading an entire document into memory. The XmlReader pull parser and the XmlWriter class currently provide the .NET Framework's streaming implementation.

Streaming with LINQ to XML - Part 1 describes the XML Team's efforts to develop a streaming API for LINQ to XML:

The LINQ to XML design team has spent a lot of effort over the last year or so wrestling with alternative ways to provide streaming functionality in a way that is consistent with the API's overall philosophy. The May 2006 CTP included support for streaming output (but not input) via an XStreamingElement class that essentially allowed XML-like trees of IEnumerable<T> instances that could be lazily consumed while being saved as XML text. This was removed from later CTPs to "keep the slate clean" while incompatible options were considered.
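As a rough illustration of the streaming-output idea in the quote above, here is a minimal C# sketch. It assumes the XStreamingElement class as it exists in the shipped System.Xml.Linq namespace; the May 2006 CTP version may have differed in detail. The ReadEntries method and the element names are hypothetical, made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

static class StreamingOutputSketch
{
    // Hypothetical data source: yields entries lazily, one at a time.
    // In practice this could be rows from a database or lines from a file.
    static IEnumerable<string> ReadEntries()
    {
        for (int i = 1; i <= 3; i++)
            yield return "entry " + i;
    }

    static void Main()
    {
        // The query is not evaluated here. XStreamingElement keeps a
        // reference to the IEnumerable and iterates it only while Save
        // writes the output, so no full XElement tree is built up front.
        var log = new XStreamingElement("log",
            from entry in ReadEntries()
            select new XElement("item", entry));

        var writer = new StringWriter();
        log.Save(writer);
        Console.WriteLine(writer.ToString());
    }
}
```

Replacing XStreamingElement with XElement in this sketch would produce the same XML text, but the query would be evaluated eagerly when the tree is constructed rather than during Save.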

Following is the (abridged) outcome of the team's design deliberations:

After literally months of discussion and prototyping, we decided to address these requirements by a) putting XStreamingElement back in the supported API as it was in the May CTP; b) to NOT push any streaming input API into Orcas RTM, but to release one or more implementations of the ideas we’ve discussed as code samples that can be implemented on top of the public API.

We'll present some pretty straightforward code to handle the most basic use cases such as huge logfiles with a very flat XML structure. Maybe some of these examples can migrate into a "LINQ to XML Power Tools" library of some sort (hint, hint, MVPs?). Anyway, stay tuned for the details, and please let us know what works for you and what you still need to work with XML easily and efficiently.

Let's hope future code samples include VB 9.0 for a change.

Streaming with LINQ to XML - Part 2 delivers the "code to handle the most basic use cases such as huge logfiles with a very flat XML structure." The team's approach, using a 10-MB Wikipedia abstract.xml file for the example, is:

The key is to write a custom axis method that functions much like the built-in axes such as Elements(), Attributes(), etc. but operates over a specific type of XML data. An axis method typically returns a collection such as IEnumerable. In the example here, we read over the stream with the XmlReader's ReadFrom method, and return the collection by using yield return. This provides the deferred execution semantics necessary to make the custom axis method work well with huge data sources, but allows the application program to use ordinary LINQ to XML classes and methods to filter and transform the results.
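The custom axis approach described above might look something like the following C# sketch. This is not the XML Team's code; it is a minimal example assuming a flat feed/doc structure loosely modeled on the abstract.xml file, with a tiny inline sample standing in for the real data. The StreamElements name and its parameters are hypothetical. The element materialization uses the static XNode.ReadFrom method, which takes the XmlReader as its argument.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class StreamingAxisSketch
{
    // Hypothetical custom axis method: returns each element named
    // elementName as a small standalone XElement, one at a time.
    static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // XNode.ReadFrom consumes the whole element and leaves
                    // the reader on the node that follows it, so no extra
                    // Read() call is needed on this branch.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    static void Main()
    {
        // Made-up miniature input; a real run would pass a StreamReader
        // over a large file instead of a StringReader.
        string xml =
            "<feed>" +
            "<doc><title>Alpha</title><abstract>First</abstract></doc>" +
            "<doc><title>Beta</title><abstract>Second</abstract></doc>" +
            "</feed>";

        // Ordinary LINQ to XML query over the streamed elements.
        var titles =
            from doc in StreamElements(new StringReader(xml), "doc")
            where (string)doc.Element("title") != "Alpha"
            select (string)doc.Element("title");

        foreach (string title in titles)
            Console.WriteLine(title);
    }
}
```

Because StreamElements uses yield return, only one doc element is held in memory at a time, and nothing is read until the foreach loop starts pulling results through the query.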

(Still no VB code examples.)

Update: 3/25/2007: Added link to part 2, abstract and complaint.

Note: Ralf Lämmel, the "father of LINQ to XSD," is a LINQ to XML streaming enthusiast. (LINQ to XSD is a product of Erik Meijer's Tesla incubator group that's in the early preview stage.) The article cites his "API-based XML Streaming with FLWOR Power and Functional Updates" paper presented at XML 2006.