Wednesday, January 09, 2008

LINQ to XML: Grouping and Aggregation Gotchas, Part I

LINQ to XML is a candidate to replace XQuery for grouping elements, transforming Infosets and aggregating numerical data. It also can supplant XPath for navigation and XSLT for grouping and transformation. Aggregating elements or element groups and generating sum, average, minimium and maximum values is one of the most common uses for XQuery, XPath and XSLT.

A primary benefit of LINQ to XML is its more straightforward (at least to SQL aficianados) group ... by ... into syntax compared to XQuery's FLWOR (for, let, where, order by, return) and XPath iterator expressions, and inscrutable XSLT stylesheets.

Updated 1/9/2008: See below

Non-Trivial LINQ to XML Group By and Aggregation Examples Are Lacking

A serious deterrent to converting XQuery and XSLT adherents to LINQ to XML is the relative lack of published LINQ to XML sample queries that demonstrate proper use of grouping expressions, with or without numeric aggregation.

I can't find any grouping examples in the LINQ to XML-related chapters of current LINQ eBooks in my collection: LINQ for Visual VB 2005, Introducing LINQ, Pro LINQ C# 2008, or LINQ in Action.

Searches on "LINQ to XML" group by into and XLinq group by into returned only a few useful posts, such as:

The last two posts also included use of the Sum() function. There also were a few valid hits from MSDN blog posts but none on LINQ to XLM white papers, such as .NET Language-Integrated Query for XML Data of February 2007 (GroupJoin was mentioned without samples), Overview of LINQ to XML in Visual Basic, How to: Perform Grouped Joins (C# Programming Guide), Introduction to LINQ in Visual Basic, etc.

Update 1/9/2008: Julie Lerman explains the benefit of code samples for new software technologies in her LINQ - Learn by example post of January 7, 2008:

I have often likened learning LINQ as well as learning Entity SQL to learning T-SQL. Learning T-SQL has, for me, been an evolution. I still have to look up the Cast & Convert topic 99% of the time in the docs whenever I need to convert a datetime to a desired char format.

Even if there were 1001 examples, I know that there will always be something I want to do that I can't find an example for and will have to rely on gaining a better understanding of LINQ in order to accomplish it.

Errors in LINQ Project Sample Query Explorers Grouping Examples

The C# 3.0 and VB 9.0 LINQ Project Sample Query Explorer projects dated 11/8/2007 that shipped with Visual Studio's RTM bits provide only three examples under the Grouping Node:

  1. Group Orders by Customer (XLinq53): Returns Orders element groups resequenced by customer (not alphabetized)
  2. Group Customers by Country and City (XLinq54):  Returns a directory of ContactNames grouped by Country and City (not alphabetized) with Count of Orders
  3. Group Orders by Customer (XLinq55): Should return a new Infoset of order element groups for each customer that placed  25 (C#) or 15 (VB) or more orders

Despite the few grouping samples provided, both C# and VB Grouping sample 3 (XLinq55) versions fail to execute correctly; they return a single empty <Customers /> node due to improper application of the Element.Value property.

Group Orders By Customer (C#) Sample Query

Following are the original and fixed version of the sample query. The code failed to group the orders by customers because order.Element("CustomerID") returned an anonymous type instead of the string value of the element's content.

The fix simply inverts the application of the Value property, which returns the string value of the concatenated contents of an Element or Attribute:

namespace GroupOrdersByCustomerCS
{
    class Program
    {
        static void Main(string[] args)
        {
            // Public Sub XLinq55() LINQ to XML Sample Query (original version) fails 
            // The problem is that the group order by order.Element("CustomerID")
            // phrase is missing the .Value property. 
            // Key.Value elsewhere needs ".Value" removed.

            XDocument customers = XDocument.Load(dataPath + "nw_customers.xml");
            XDocument orders = XDocument.Load(dataPath + "nw_orders.xml");
            
            XElement custOrder = new XElement("CustomerOrders",
                from
                    order in orders.Descendants("Orders")
                group order by
                    order.Element("CustomerID") into cust_orders
                where
                    cust_orders.Count() > 25
                select
                    new XElement("Customer",
                        new XAttribute("CustomerID", cust_orders.Key.Value),
                        from customer in customers.Descendants("Customers")
                        where (string)customer.Attribute("CustomerID") == 
                            (string)cust_orders.Key.Value
                        select customer.Nodes(), cust_orders));
            
            Console.WriteLine(custOrder);
            Console.WriteLine("Above from original XLinq55 version\r\n");

            // Comment out the Console.WriteLine(custOrderFixed) 
            // statement below to view the preceding result
           
            XElement custOrderFixed = new XElement("CustomerOrders",
                from order in orders.Descendants("Orders")
                // .Value added below
                group order by order.Element("CustomerID").Value into cust_orders
                where cust_orders.Count() > 25
                select  new XElement("Customer",
                        // .Value removed below
                        new XAttribute("CustomerID", cust_orders.Key),
                        from
                            customer in customers.Descendants("Customers")
                        where
                            // .Value removed below
                            (string)customer.Attribute("CustomerID") == 
                            cust_orders.Key
                        select
                            customer.Nodes(), cust_orders));
            
            Console.WriteLine("Above from fixed XLinq55 version");
            Console.WriteLine(custOrderFixed);
            Console.ReadLine();
        }
    }
}

Note: The preceding problem isn't fixed in Charlie Calvert's newly uploaded set of C# samples available for download here.

Group Orders By Customer (VB) Sample Query

The same problem needs the same fix to the VB version, as shown here:

Sub Main()
    ' Public Sub XLinq55() LINQ to XML Sample Query (original version) fails
    ' The problem is that the Group By Key = order.<CustomerID> phrase 
    ' is missing the .Value property
    ' Key.Value elsewhere needs .Value removed
    ' Notice that this sample is missing from MSDN's LINQ To XML Samples - 
    ' Grouping page at 'http://msdn2.microsoft.com/en-us/bb738046.aspx 
    '(There is no C# equivalent to this page)

    Dim customers = XDocument.Load(dataPath + "nw_customers.xml")
    Dim orders = XDocument.Load(dataPath + "nw_orders.xml")

    Dim custOrder = _
        <CustomerOrders>
            <%= From order In orders...<Orders> _
                Group By Key = order.<CustomerID> _
                Into cust_orders = Group _
                Where cust_orders.Count() > 15 _
                Select <Customer CustomerID=<%= Key.Value %>>
                           <%= From customer In customers...<Customers> _
                               Where CStr(customer.@CustomerID) = Key.Value _
                               Select customer.Nodes() %>
                           <%= cust_orders %>
                       </Customer> %>
        </CustomerOrders>

    Console.WriteLine(custOrder)
    Console.WriteLine("Above from original XLinq55 version")
    Console.WriteLine(vbCrLf)

    // Comment out the Console.WriteLine(custOrderFixed) 
    // statement below to view the preceding result 
    Dim custOrderFixed = _
        <CustomerOrders>
            <%= From order In orders...<Orders> _
                Group By Key = order.<CustomerID>.Value _
                Into cust_orders = Group _
                Where cust_orders.Count() > 25 _
                Select <Customer CustomerID=<%= Key %>>
                           <%= From customer In customers...<Customers> _
                               Where CStr(customer.@CustomerID) = Key _
                               Select customer.Nodes() %>
                           <%= cust_orders %>
                       </Customer> %>
        </CustomerOrders>
    Console.WriteLine(custOrderFixed)
    Console.WriteLine("Above from corrected XLinq55 version")
    Console.ReadKey()
End Sub

As noted in the VB version's comments, it's interesting that the authors of the MSDN's LINQ To XML Samples - Grouping page removed the sample from the page rather than fix the problem.

Update 1/9/2008: An alternative to using the .Value function to return strings for implicit typing is enforcing explicit typing by specifying one of the 24 explicit conversion operators for the XElement object. Scott Hanselman's Improving LINQ Code Smell with Explicit and Implicit Conversion Operators of August 30, 2007 explains the syntax and several of my other sample in the forthcoming post use explicit type conversion with numeric aggregates.

The last three Orders groups of the correct XML output from the two examples is:

</CustomerOrders>
    <!-- ... -->
    <Orders>
      <CustomerID>SAVEA</CustomerID>
      <EmployeeID>7</EmployeeID>
      <OrderDate>1998-04-17T00:00:00</OrderDate>
      <RequiredDate>1998-05-15T00:00:00</RequiredDate>
      <ShipInfo ShippedDate="1998-04-27T00:00:00">
        <ShipVia>2</ShipVia>
        <Freight>830.75</Freight>
        <ShipName>Save-a-lot Markets</ShipName>
        <ShipAddress>187 Suffolk Ln.</ShipAddress>
        <ShipCity>Boise</ShipCity>
        <ShipRegion>ID</ShipRegion>
        <ShipPostalCode>83720</ShipPostalCode>
        <ShipCountry>USA</ShipCountry>
      </ShipInfo>
    </Orders>
    <Orders>
      <CustomerID>SAVEA</CustomerID>
      <EmployeeID>6</EmployeeID>
      <OrderDate>1998-04-17T00:00:00</OrderDate>
      <RequiredDate>1998-05-15T00:00:00</RequiredDate>
      <ShipInfo ShippedDate="1998-04-24T00:00:00">
        <ShipVia>2</ShipVia>
        <Freight>227.22</Freight>
        <ShipName>Save-a-lot Markets</ShipName>
        <ShipAddress>187 Suffolk Ln.</ShipAddress>
        <ShipCity>Boise</ShipCity>
        <ShipRegion>ID</ShipRegion>
        <ShipPostalCode>83720</ShipPostalCode>
        <ShipCountry>USA</ShipCountry>
      </ShipInfo>
    </Orders>
    <Orders>
      <CustomerID>SAVEA</CustomerID>
      <EmployeeID>1</EmployeeID>
      <OrderDate>1998-05-01T00:00:00</OrderDate>
      <RequiredDate>1998-05-29T00:00:00</RequiredDate>
      <ShipInfo ShippedDate="1998-05-04T00:00:00">
        <ShipVia>1</ShipVia>
        <Freight>30.09</Freight>
        <ShipName>Save-a-lot Markets</ShipName>
        <ShipAddress>187 Suffolk Ln.</ShipAddress>
        <ShipCity>Boise</ShipCity>
        <ShipRegion>ID</ShipRegion>
        <ShipPostalCode>83720</ShipPostalCode>
        <ShipCountry>USA</ShipCountry>
      </ShipInfo>
    </Orders>
  </Customer>
</CustomerOrders>

Part II will demonstrate use of aggregation operators in a post later this week.

Update 1/8/2008: Minor edits and colorization fixes. Added LINQ-related C# 3.0 and VB 9.0 keywords to the Snippet Editor plug-in Windows Live Writer and recompiled per instructions in Binglong's Extend the Code Snippet Plug-in for Windows Live Writer to support C++ and Java post.

Request: If any has a complete list of new C# 3.0 and VB 9.0 keywords, please leave a comment with it's URL. Thanks.

1 comments:

jHarris4104 said...

Thanks! I'm just learning LINQ/C##/.NET and noticed the bug in the XLinq55 sample.