Wednesday, January 23, 2008

Namespace Strangenesses in XML Infosets Transformed by LINQ to XML

The purpose of namespace prefixes is to provide abbreviations for global or group-level namespaces, which otherwise would bloat the already substantial overhead of XML Infosets. I've found that LINQ to XML doesn't handle namespace declarations as I expected when processing some semi-real-world documents.

Updated 1/23/2008: See end of post.

Bloating All Prefixed Elements with Duplicate Local Namespace Declarations

LINQ to XML works exclusively with expanded names (a namespace URI plus a local name) rather than prefixes, so relatively simple documents with a few namespaces can become unwieldy to store and difficult for humans to read once they're serialized. For example, this simple Atom 1.0-formatted source Infoset returned by an ADO.NET Data Services URL query is quite easy to read:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed xml:base="http://localhost:50539/Northwind.svc/" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns="http://www.w3.org/2005/Atom">
  <entry adsm:type="NorthwindModel.Orders">
    <id>http://localhost:50539/Northwind.svc/Orders(11077)</id>
    <updated />
    <title />
    <author>
      <name />
    </author>
    <link rel="edit" href="Orders(11077)" title="Orders" />
    <content type="application/xml">
      <ads:OrderID adsm:type="Int32">11077</ads:OrderID>
      <ads:OrderDate adsm:type="Nullable`1[System.DateTime]">2007-05-06T00:00:00</ads:OrderDate>
      <ads:RequiredDate adsm:type="Nullable`1[System.DateTime]">2007-06-03T00:00:00</ads:RequiredDate>
      <ads:ShippedDate adsm:type="Nullable`1[System.DateTime]" ads:null="true" />
      <ads:Freight adsm:type="Nullable`1[System.Decimal]">8.5300</ads:Freight>
      <ads:ShipName>Rattlesnake Canyon Grocery</ads:ShipName>
      <ads:ShipAddress>2817 Milton Dr.</ads:ShipAddress>
      <ads:ShipCity>Albuquerque</ads:ShipCity>
      <ads:ShipRegion>NM</ads:ShipRegion>
      <ads:ShipPostalCode>87110</ads:ShipPostalCode>
      <ads:ShipCountry>USA</ads:ShipCountry>
    </content>
    <link rel="related" title="Customers" href="Orders(11077)/Customers" type="application/atom+xml;type=entry" />
    <link rel="related" title="Employees" href="Orders(11077)/Employees" type="application/atom+xml;type=entry" />
    <link rel="related" title="Order_Details" href="Orders(11077)/Order_Details" type="application/atom+xml;type=feed" />
    <link rel="related" title="Shippers" href="Orders(11077)/Shippers" type="application/atom+xml;type=entry" />
  </entry>
  <!-- ... -->
</feed>
Applying a LINQ to XML query that returns only abbreviated <entry> groups for the USA in reverse OrderDate order results in this mess:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed>
  <entry>
    <content type="application/xml" xmlns="http://www.w3.org/2005/Atom">
      <ads:OrderID adsm:type="Int32" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">11077</ads:OrderID>
      <ads:OrderDate adsm:type="Nullable`1[System.DateTime]" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">2007-05-06T00:00:00</ads:OrderDate>
      <ads:RequiredDate adsm:type="Nullable`1[System.DateTime]" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">2007-06-03T00:00:00</ads:RequiredDate>
      <ads:ShippedDate adsm:type="Nullable`1[System.DateTime]" ads:null="true" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" />
      <ads:Freight adsm:type="Nullable`1[System.Decimal]" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">8.5300</ads:Freight>
      <ads:ShipName xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">Rattlesnake Canyon Grocery</ads:ShipName>
      <ads:ShipAddress xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">2817 Milton Dr.</ads:ShipAddress>
      <ads:ShipCity xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">Albuquerque</ads:ShipCity>
      <ads:ShipRegion xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">NM</ads:ShipRegion>
      <ads:ShipPostalCode xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">87110</ads:ShipPostalCode>
      <ads:ShipCountry xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb">USA</ads:ShipCountry>
    </content>
  </entry>
  <!-- ... -->
</feed>

You can remove the duplicate local namespace declarations with string manipulation, but doing so results in brittle code.
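A structural alternative to string manipulation can be sketched in C#. This is not the query used for this post; the tiny inline document, the class name, and the variable names are assumptions made up for illustration. The idea is to strip the namespace declaration attributes from every descendant and declare each prefix once on the root, then let the serializer resolve prefixes from the declarations that remain in scope.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class HoistNamespaceDeclarations
{
    static void Main()
    {
        XNamespace ads = "http://schemas.microsoft.com/ado/2007/08/dataweb";

        // Hypothetical stand-in for a filtered result whose children each
        // carry their own copy of the xmlns:ads declaration.
        XElement feed = XElement.Parse(
            @"<feed xmlns='http://www.w3.org/2005/Atom'>
                <entry>
                  <content type='application/xml'>
                    <ads:ShipCity xmlns:ads='http://schemas.microsoft.com/ado/2007/08/dataweb'>Albuquerque</ads:ShipCity>
                    <ads:ShipCountry xmlns:ads='http://schemas.microsoft.com/ado/2007/08/dataweb'>USA</ads:ShipCountry>
                  </content>
                </entry>
              </feed>");

        // Remove every namespace declaration attribute below the root.
        // Element names are stored as expanded names, so this does not
        // change which namespace any element belongs to.
        feed.Descendants()
            .Attributes()
            .Where(a => a.IsNamespaceDeclaration)
            .Remove();

        // Declare the prefix once, on the root element.
        feed.SetAttributeValue(XNamespace.Xmlns + "ads", ads.NamespaceName);

        Console.WriteLine(feed);
    }
}
```

With the declarations hoisted, the serialized children should pick up the ads prefix from the root instead of redeclaring it. Whether this is acceptable for a given document depends on no descendant deliberately rebinding a prefix, which this sketch doesn't check.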

Bloating Some Unprefixed Elements with Duplicate Group Namespace Declarations

An alternative is to transform, rather than filter, the document because the compiler is reported to cache the namespaces you add with code and remove them from the output. My Visual Basic 9.0 XML literal transform code is similar to the following abbreviated version (the three namespaces are imported with Imports directives, which aren't shown):

Private Sub TransformOrders()
    Dim xdOrders As XDocument = XDocument.Load(strDataPath & "Orders.xml", _
                                               LoadOptions.PreserveWhitespace)
    Dim xdDetails As XDocument = XDocument.Load(strDataPath & _
                           "Order_Details.xml", LoadOptions.PreserveWhitespace)

    Dim Orders As XDocument = _
    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb"
        xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata">
        <%= From o In xdOrders...<content> _
            Where o...<ads:ShipCountry>.Value = "USA" _
            Order By o...<ads:OrderDate>.Value Descending _
            Select New XElement( _
        <entry>
            <content type="application/xml">
                <Order>
                    <ads:OrderID adsm:type="Int32">
                        <%= o...<ads:OrderID>.Value %>
                    </ads:OrderID>
                    <!-- ... -->
                    <ads:ShipCountry>
                        <%= o...<ads:ShipCountry>.Value %>
                    </ads:ShipCountry>
                </Order>
            </content>
        </entry>) _
        %>
    </feed>
End Sub

However, the compiler doesn't get rid of all duplicate namespace declarations, as illustrated by the following output Infoset:

<feed xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns="http://www.w3.org/2005/Atom">
  <entry xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns="http://www.w3.org/2005/Atom">
    <content type="application/xml">
      <Order>
        <ads:OrderID adsm:type="Int32">11006</ads:OrderID>
        <ads:OrderDate adsm:type="Nullable`1[System.DateTime]">2007-04-07T00:00:00</ads:OrderDate>
        <ads:RequiredDate adsm:type="Nullable`1[System.DateTime]">2007-05-05T00:00:00</ads:RequiredDate>
        <ads:ShippedDate adsm:type="Nullable`1[System.DateTime]">2007-04-15T00:00:00</ads:ShippedDate>
        <ads:Freight adsm:type="Nullable`1[System.Decimal]">25.1900</ads:Freight>
        <ads:ShipName>Great Lakes Food Market</ads:ShipName>
        <ads:ShipAddress>2732 Baker Blvd.</ads:ShipAddress>
        <ads:ShipCity>Eugene</ads:ShipCity>
        <ads:ShipRegion>OR</ads:ShipRegion>
        <ads:ShipPostalCode>97403</ads:ShipPostalCode>
        <ads:ShipCountry>USA</ads:ShipCountry>
      </Order>
    </content>
  </entry>
  <!-- ... -->
</feed>

The <entry> element repeats all three namespace declarations that already appear on the <feed> element.

Bloating More Unprefixed Elements with Duplicate Group Namespace Declarations

Adding code to insert related Order_Detail elements, as shown below, results in duplicate namespace declarations in those elements, too.

Private Sub TransformOrderAndOrderDetails()
    Dim xdOrders As XDocument = XDocument.Load(strDataPath & "Orders.xml", _
                                               LoadOptions.PreserveWhitespace)
    Dim xdDetails As XDocument = XDocument.Load(strDataPath & _
                           "Order_Details.xml", LoadOptions.PreserveWhitespace)

    Dim Orders As XDocument = _
    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb"
        xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata">
        <%= From o In xdOrders...<content> _
            Where o...<ads:ShipCountry>.Value = "USA" _
            Order By o...<ads:OrderDate>.Value Descending _
            Select New XElement( _
        <entry>
            <content type="application/xml">
                <Order>
                    <ads:OrderID adsm:type="Int32">
                        <%= o...<ads:OrderID>.Value %>
                    </ads:OrderID>
                    <!-- ... -->
                    <ads:ShipCountry>
                        <%= o...<ads:ShipCountry>.Value %>
                    </ads:ShipCountry>
                    <Order_Details>
                    <%= From d In xdDetails...<content> _
                        Where d...<ads:OrderID>.Value = o...<ads:OrderID>.Value _
                        Select New XElement( _
                        <Order_Detail>
                            <ads:OrderID adsm:type="Int32">
                                <%= d...<ads:OrderID>.Value %>
                            </ads:OrderID>
                            <ads:ProductID adsm:type="Int32">
                                <%= d...<ads:ProductID>.Value %>
                            </ads:ProductID>
                            <ads:Quantity adsm:type="Short">
                                <%= d...<ads:Quantity>.Value %>
                            </ads:Quantity>
                            <ads:UnitPrice adsm:type="Decimal">
                                <%= d...<ads:UnitPrice>.Value %>
                            </ads:UnitPrice>
                            <ads:Discount adsm:type="Single">
                                <%= d...<ads:Discount>.Value %>
                            </ads:Discount>
                        </Order_Detail>) _
                    %>
                    </Order_Details>
                </Order>
            </content>
        </entry>) _
        %>
    </feed>
End Sub

Each Order_Detail group has its own duplicate namespace declarations:

<feed xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns="http://www.w3.org/2005/Atom">
  <entry xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns="http://www.w3.org/2005/Atom">
    <content type="application/xml">
      <Order>
        <ads:OrderID adsm:type="Int32">11006</ads:OrderID>
        <ads:OrderDate adsm:type="Nullable`1[System.DateTime]">2007-04-07T00:00:00</ads:OrderDate>
        <ads:RequiredDate adsm:type="Nullable`1[System.DateTime]">2007-05-05T00:00:00</ads:RequiredDate>
        <ads:ShippedDate adsm:type="Nullable`1[System.DateTime]">2007-04-15T00:00:00</ads:ShippedDate>
        <ads:Freight adsm:type="Nullable`1[System.Decimal]">25.1900</ads:Freight>
        <ads:ShipName>Great Lakes Food Market</ads:ShipName>
        <ads:ShipAddress>2732 Baker Blvd.</ads:ShipAddress>
        <ads:ShipCity>Eugene</ads:ShipCity>
        <ads:ShipRegion>OR</ads:ShipRegion>
        <ads:ShipPostalCode>97403</ads:ShipPostalCode>
        <ads:ShipCountry>USA</ads:ShipCountry>
        <Order_Details>
          <Order_Detail xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns="http://www.w3.org/2005/Atom">
            <ads:OrderID adsm:type="Int32">11006</ads:OrderID>
            <ads:ProductID adsm:type="Int32">1</ads:ProductID>
            <ads:Quantity adsm:type="Short">8</ads:Quantity>
            <ads:UnitPrice adsm:type="Decimal">18.0000</ads:UnitPrice>
            <ads:Discount adsm:type="Single">0</ads:Discount>
          </Order_Detail>
          <Order_Detail xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns="http://www.w3.org/2005/Atom">
            <ads:OrderID adsm:type="Int32">11006</ads:OrderID>
            <ads:ProductID adsm:type="Int32">29</ads:ProductID>
            <ads:Quantity adsm:type="Short">2</ads:Quantity>
            <ads:UnitPrice adsm:type="Decimal">123.7900</ads:UnitPrice>
            <ads:Discount adsm:type="Single">0.25</ads:Discount>
          </Order_Detail>
        </Order_Details>
      </Order>
    </content>
  </entry>
  <!-- ... -->
</feed>

Meaningless Empty Namespace Inserted with Functional Construction

C# 3.0 and VB 9.0 functional construction code doesn't repeat global namespace declarations, but it isn't immune to namespace strangeness either. This C# code (and its corresponding VB port) adds the empty namespace declaration shown below:

private void btnOrder_DetailsLookup_Click(object sender, EventArgs e)
{
    XDocument xdOrders = XDocument.Load(strDataPath + "Orders.xml", 
                                        LoadOptions.PreserveWhitespace);
    XDocument xdDetails = XDocument.Load(strDataPath + "Order_Details.xml", 
                                         LoadOptions.PreserveWhitespace);

    XNamespace atom = "http://www.w3.org/2005/Atom";
    XNamespace ads = "http://schemas.microsoft.com/ado/2007/08/dataweb";
    XNamespace adsm = "http://schemas.microsoft.com/ado/2007/08/dataweb/metadata";

    XDocument Orders = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), 
        new XElement(atom + "feed", 
            new XAttribute("xmlns", "http://www.w3.org/2005/Atom"), 
            new XAttribute(XNamespace.Xmlns + "ads", 
                    "http://schemas.microsoft.com/ado/2007/08/dataweb"), 
                new XAttribute(XNamespace.Xmlns + "adsm", 
                    "http://schemas.microsoft.com/ado/2007/08/dataweb/metadata"), 
                (from o in xdOrders.Descendants(atom + "content") 
                 where o.Element(ads + "ShipCountry").Value == "USA" 
                 orderby o.Element(ads + "OrderDate").Value descending 
                 select new XElement("entry", 
                     new XElement("content", new XAttribute("type", 
                         "application/xml"), 
                     new XElement("Order", new XElement(ads + "OrderID", 
                         o.Element(ads + "OrderID").Value, 
                         new XAttribute(adsm + "type", "int32")), 
                         // ...
                         new XElement(ads + "ShipCountry", 
                             o.Element(ads + "ShipCountry").Value), 
                         new XElement("Order_Details", 
                             from d in xdDetails.Descendants(atom + "content") 
                             where d.Element(ads + "OrderID").Value == 
                                 o.Element(ads +"OrderID").Value 
                             select new XElement("Order_Detail", 
                                 new XElement(ads + "OrderID", 
                                     d.Element(ads + "OrderID").Value, 
                                     new XAttribute(adsm + "type", 
                                         "System.Int32")), 
                                 new XElement(ads + "ProductID", 
                                     d.Element(ads + "ProductID").Value, 
                                     new XAttribute(adsm + "type", 
                                         "System.Int32")), 
                                 new XElement(ads + "Quantity", 
                                     d.Element(ads + "Quantity").Value, 
                                     new XAttribute(adsm + "type", 
                                         "System.Int16")), 
                                 new XElement(ads + "UnitPrice", 
                                     d.Element(ads + "UnitPrice").Value, 
                                     new XAttribute(adsm + "type", 
                                         "System.Decimal")), 
                                 new XElement(ads + "Discount", 
                                     d.Element(ads + "Discount").Value, 
                                     new XAttribute(adsm + "type", 
                                         "System.Single")))))))));
        }

The resulting Infoset's <entry> group has a spurious xmlns="" attribute:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:ads="http://schemas.microsoft.com/ado/2007/08/dataweb" xmlns:adsm="http://schemas.microsoft.com/ado/2007/08/dataweb/metadata">
  <entry xmlns="">
    <content type="application/xml">
      <Order>
        <ads:OrderID adsm:type="int32">11006</ads:OrderID>
        <ads:OrderDate adsm:type="Nullable`1[System.DateTime]">2007-04-07T00:00:00</ads:OrderDate>
        <ads:RequiredDate adsm:type="Nullable`1[System.DateTime]">2007-05-05T00:00:00</ads:RequiredDate>
        <ads:ShippedDate adsm:type="Nullable`1[System.DateTime]">2007-04-15T00:00:00</ads:ShippedDate>
        <ads:Freight adsm:type="Nullable`1[System.Decimal]">25.1900</ads:Freight>
        <ads:ShipName>Great Lakes Food Market</ads:ShipName>
        <ads:ShipAddress>2732 Baker Blvd.</ads:ShipAddress>
        <ads:ShipCity>Eugene</ads:ShipCity>
        <ads:ShipRegion>OR</ads:ShipRegion>
        <ads:ShipPostalCode>97403</ads:ShipPostalCode>
        <ads:ShipCountry>USA</ads:ShipCountry>
        <Order_Details>
          <Order_Detail>
            <ads:OrderID adsm:type="System.Int32">11006</ads:OrderID>
            <ads:ProductID adsm:type="System.Int32">1</ads:ProductID>
            <ads:Quantity adsm:type="System.Int16">8</ads:Quantity>
            <ads:UnitPrice adsm:type="System.Decimal">18.0000</ads:UnitPrice>
            <ads:Discount adsm:type="System.Single">0</ads:Discount>
          </Order_Detail>
          <Order_Detail>
            <ads:OrderID adsm:type="System.Int32">11006</ads:OrderID>
            <ads:ProductID adsm:type="System.Int32">29</ads:ProductID>
            <ads:Quantity adsm:type="System.Int16">2</ads:Quantity>
            <ads:UnitPrice adsm:type="System.Decimal">123.7900</ads:UnitPrice>
            <ads:Discount adsm:type="System.Single">0.25</ads:Discount>
          </Order_Detail>
        </Order_Details>
      </Order>
    </content>
  </entry>
  <!-- ... -->
</feed>

Update 1/23/2008: After further consideration, I've concluded that the spurious xmlns="" (empty namespace) declaration isn't benign. The source document at the top of this post has <entry> and <content> groups without a namespace declaration; these groups are within the Atom namespace. Therefore, there's no justification that I can see for removing them from the Atom namespace with xmlns="".

I originally wrote that the xmlns="" attribute probably is benign or even useful, because the inferred schema for the source document doesn't include Order, Order_Details, and Order_Detail groups. However, I didn't include the XML document, or reference or infer an XML schema, in the C# project because C# doesn't support IntelliSense for functional construction.

VB MVP Bill McCarthy, who has written extensively about LINQ to XML, suggested in his comment to this post:

Last one first: the xmlns="" is required because you have a default namespace at the root of the document, then you have entry, content, Order, Order_Details and Order_Detail all without a namespace (or more correctly the empty namespace). If they were meant to be in the default namespace, and you are using the explicit XElement constructors, you need to supply that namespace. VB makes it easy if you use XML literals.

Bill is correct that you must "supply that namespace." Thanks, Bill.
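Bill's point can be checked with a minimal sketch. The class name and the two-element document below are assumptions for illustration only, not code from the sample project. An element constructed from a bare string name is in no namespace, so under a parent that declares a default namespace the serializer has to emit xmlns="" to keep it out of that namespace.

```csharp
using System;
using System.Xml.Linq;

class EmptyNamespaceDemo
{
    static void Main()
    {
        XNamespace atom = "http://www.w3.org/2005/Atom";

        XElement feed = new XElement(atom + "feed",
            new XElement("entry"),          // bare name: no namespace
            new XElement(atom + "entry"));  // qualified: Atom namespace

        // Report which namespace each child element actually belongs to.
        foreach (XElement child in feed.Elements())
        {
            Console.WriteLine("{0}: in Atom namespace = {1}, in no namespace = {2}",
                child.Name.LocalName,
                child.Name.Namespace == atom,
                child.Name.Namespace == XNamespace.None);
        }

        // Only the child that is in no namespace should be serialized
        // with an xmlns="" declaration.
        Console.WriteLine(feed);
    }
}
```

Seen this way, the xmlns="" in the output above is the serializer faithfully reporting that the entry element was never placed in the Atom namespace.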

But the only way I can find to supply it in my sample code is to give each unprefixed element that precedes an element in a prefixed namespace an expanded name of the form {namespace-URI}local-name, as in the following snippets:

select new XElement("{http://www.w3.org/2005/Atom}entry",
    // The following doesn't solve the xmlns="" issue; it throws "The prefix '' cannot be redefined 
    // from '' to 'http://www.w3.org/2005/Atom' within the same start element tag."
    //new XAttribute("xmlns", "http://www.w3.org/2005/Atom"), 
    new XElement("{http://www.w3.org/2005/Atom}content", new XAttribute("type", "application/xml"),
    new XElement("{http://www.w3.org/2005/Atom}Order", new XElement(ads + "OrderID", o.Element(ads + 
        "OrderID").Value, new XAttribute(adsm + "type", "int32")), 

new XElement("{http://www.w3.org/2005/Atom}Order_Details", 
    from d in xdDetails.Descendants(atom + "content") 
    where d.Element(ads + "OrderID").Value == 
        o.Element(ads +"OrderID").Value
    select new XElement("{http://www.w3.org/2005/Atom}Order_Detail", 

It's interesting that attempts to add the default namespace declaration with an attribute throw the runtime exception noted in the first comment. I don't recall seeing any documentation about this peculiar expanded name syntax requirement.
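For what it's worth, the atom XNamespace variable already declared in the sample should produce the same expanded names as the {namespace}localname strings, because the sample already builds the feed element with atom + "feed". The following is a minimal sketch with a hypothetical class name and a cut-down element tree, not the full query:

```csharp
using System;
using System.Xml.Linq;

class ExpandedNameDemo
{
    static void Main()
    {
        XNamespace atom = "http://www.w3.org/2005/Atom";
        XNamespace ads = "http://schemas.microsoft.com/ado/2007/08/dataweb";

        // Both spellings resolve to the same XName.
        XName viaOperator = atom + "entry";
        XName viaString = "{http://www.w3.org/2005/Atom}entry";
        Console.WriteLine(viaOperator == viaString);

        // Qualify every unprefixed element with the atom namespace so that
        // none of them falls into the empty namespace.
        XElement feed = new XElement(atom + "feed",
            new XAttribute(XNamespace.Xmlns + "ads", ads.NamespaceName),
            new XElement(atom + "entry",
                new XElement(atom + "content",
                    new XAttribute("type", "application/xml"),
                    new XElement(ads + "OrderID", 11077))));

        Console.WriteLine(feed);
    }
}
```

Using the XNamespace variable keeps the URI in one place, which is less error-prone than repeating the literal string in every element name.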

Note: For the record, Order, Order_Details and Order_Detail elements aren't in the actual Atom or ads namespace. They were inserted into the document for demonstration purposes.

Bill's comment also includes the following observation about the problem with the VB literal implementation:

As to the second [to] last example, the VB code has a superfluous New XElement that is stuffing you up I think.

I don't think that either New XElement statement is superfluous. The first is required for each Order group and the second is required for each Order_Detail group in the enumeration.

So I don't believe the mystery is solved for VB's XML literal syntax.

1 comment:

Bill McCarthy said...

Hi Roger,

Last one first: the xmlns="" is required because you have a default namespace at the root of the document, then you have entry, content, Order, Order_Details and Order_Detail all without a namespace (or more correctly the empty namespace). If they were meant to be in the default namespace, and you are using the explicit XElement constructors, you need to supply that namespace. VB makes it easy if you use XML literals.
As to the second last example, the VB code has a superfluous New XElement that is stuffing you up I think.
And as to XML namespaces, and how this works, there's some stuff in VSM and on my blog that covers it a bit more.