Sunday, May 20, 2007

LINQ to DataSets Rehabilitates VB's Bang Operator

Visual Basic's bang (!) operator (a.k.a. pling and officially the dictionary lookup operator) seldom appears in VB.NET code. The purpose of the bang operator (alias the bang separator) is to specify members of string-keyed collections, such as fields of an ADO.COM Recordset, as in:

rsCusts!CustomerID = "BOGUS"
rsCusts!CompanyName = "Bogus Software, Inc."
rsCusts!ContactName = "Joe Bogus"

The preceding shaves a few keystrokes from "parens-and-quote" syntax, which relies on Fields being the default property of a Recordset:

rsCusts("CustomerID") = "BOGUS"
rsCusts("CompanyName") = "Bogus Software, Inc."
rsCusts("ContactName") = "Joe Bogus"

which, in turn, is shorthand for the fully-qualified syntax:

rsCusts.Fields("CustomerID") = "BOGUS"
rsCusts.Fields("CompanyName") = "Bogus Software, Inc."
rsCusts.Fields("ContactName") = "Joe Bogus"

One reason for bang's disappearance is that ! got a bum rap from VB.COM and VBA developers because amateur coders (especially Access rookies) used bang in place of dot (.) to specify properties of other objects, such as forms and controls. Doing this kills IntelliSense and compile-time checking. In fact, ! fell into such disrepute that many developers, such as this "VB Expert," believed that VB.NET didn't support the bang operator. Some folks claimed that performance suffered when ! was tokenized to the paren-quote syntax, although I never found anyone who reported testing the performance hit.

Paul Vick's July 15, 2003 Dictionary Lookup Operator post explains how the bang operator works and, in a comment, it's equivalent in C#—x!y : x["y"].

C# Joins and Associations in Typed and Untyped DataSets

LINQ to DataSets is most likely to be used in place of DataViews where joins between related tables are required. (DataViews don't support joins or projections (SELECT lists), so you need LINQ to DataSets or a custom DataView class, such as the JoinView from MSDN, to accommodate them.) I needed to verify the differences in the syntax for LINQ to DataSet queries over typed or untyped multi-table DataSets that use joins or navigate associations and include value types (specifically DateTime) with nulls. Erick Thompson wrote a Nulls - LINQ to DataSets Part 3 treatise in his LINQ to DataSet Documentation Series but didn't mention nullable datatypes in conjunction with the added DataRow.Field<T> collection.

Here's the C# test query for the typed DataSet with joins, which returns nine rows with values from each of four related Northwind tables, including one null ShippedDate value to test null handling:

var query1 = from c in dsNwindT.Customers
             join o in dsNwindT.Orders on c.CustomerID equals o.CustomerID
             join d in dsNwindT.Order_Details on o.OrderID equals d.OrderID
             join p in dsNwindT.Products on d.ProductID equals p.ProductID
             where p.ProductID == 2 && c.Country == "USA"
             orderby o.OrderID descending
             select new { c.CustomerID, c.CompanyName, o.OrderID,
                          ShippedDate = o.Field<DateTime?>("ShippedDate"),
                          p.ProductName, d.Quantity };

With the exception of the code to supply the ShippedDate value (emphasized), the preceding query statement is identical to the corresponding LINQ to SQL query (just replace the dsNwindT DataSet with a DataContext.

You lose strong typing of related DataRows when you navigate associations (ChildRows or ParentRows) between related DataTables instead of joining the DataTables. This requires FieldName = DataTable.Field<DataType>("FieldName") expressions for non-nullable and FieldName = DataTable.Field<NullableType>("FieldName") expressions for nullable fields for all but the topmost table of the hierarchy:

var query2 = from c in dsNwindT.Customers
             from o in c.GetChildRows("FK_Orders_Customers")
             from d in o.GetChildRows("FK_Order_Details_Orders")
             from p in d.GetParentRows("FK_Order_Details_Products")
             where p.Field<int>("ProductID") == 2 && c.Country == "USA"
             orderby o.Field<int>("OrderID") descending
             select new {
                 c.CustomerID, c.CompanyName,
                 OrderID = o.Field<int>("OrderID"),
                 ShippedDate = o.Field<DateTime?>("ShippedDate"),
                 ProductName = p.Field<string>("ProductName"),
                 Quantity = d.Field<short>("Quantity") };

Join syntax for untyped DataSets becomes quite cumbersome because of the need to apply the AsEnumerable() method to each DataTable and DataTable.Field<DataType>("FieldName") to the joined fields:

var query3 = from c in dsNwindU.Tables["Customers"].AsEnumerable()
             join o in dsNwindU.Tables["Orders"].AsEnumerable() on
                 c.Field<string>("CustomerID") equals
                 o.Field<string>("CustomerID")
             join d in dsNwindU.Tables["Order_Details"].AsEnumerable() on
                 o.Field<int>("OrderID") equals d.Field<int>("OrderID")
             join p in dsNwindU.Tables["Products"].AsEnumerable() on
                 d.Field<int>("ProductID") equals p.Field<int>("ProductID")
             where p.Field<int>("ProductID") == 2 &&
                 c.Field<string>("Country") == "USA"
             orderby o.Field<int>("OrderID") descending
             select new {
                 OrderID = o.Field<int>("OrderID"),
                 ShippedDate = o.Field<DateTime?>("ShippedDate"),
                 ProductName = p.Field<string>("ProductName"),
                 Quantity = d.Field<short>("Quantity") };

The added typing for untyped DataSets with associations instead of joins (emphasized) isn't that great:

var query4 = from c in dsNwindU.Tables["Customers"].AsEnumerable()
             from o in c.GetChildRows("FK_Orders_Customers")
             from d in o.GetChildRows("FK_Order_Details_Orders")
             from p in d.GetParentRows("FK_Order_Details_Products")
             where p.Field<int>("ProductID") == 2 &&
                c.Field<string>("Country") == "USA"
             orderby o.Field<int>("OrderID") descending
             select new {
                 CustomerID = c.Field<string>("CustomerID"),
                 CompanyName = c.Field<string>("CompanyName"),
                 OrderID = o.Field<int>("OrderID"),
                 ShippedDate = o.Field<DateTime?>("ShippedDate"),
                 ProductName = p.Field<string>("ProductName"),
                 Quantity = d.Field<short>("Quantity") };

But the added typing probably would trouble C# purists who substitute lambda functions for enhanced query syntax because doing so saves a few characters.

VB Associations in Typed and Untyped DataSets

When skimming the sample queries in the "LINQ over DataSet for VB Developers" whitepaper for the May 2006 LINQ preview, I noticed extensive use of the bang operator in DataTable!FieldName expressions taking the place of FieldName = DataTable.Field<DataType>("FieldName") and FieldName = DataTable.Field<NullableType>("FieldName") expressions. Strangely, there's no mention of "bang" or "dictionary" in the nine-page document.

Unfortunately, Orcas Beta 1's VB compiler doesn't implement the Join keyword and joins using SQL-89 equi-join syntax are limited to two tables. Thus, my tests with VB are limited to navigating associations.

Here's the VB version of C# query2, which returns objects rather than designated data types for fields other than CustomerID and CompanyName:

Dim Query2 = From c In dsNwindT.Customers, _
             o In c.GetChildRows("FK_Orders_Customers"), _
             d In o.GetChildRows("FK_Order_Details_Orders"), _
             p In d.GetParentRows("FK_Order_Details_Products") _
             Where CInt(p!ProductID) = 2 AndAlso c!Country.ToString = "USA" _
             Order By o!OrderID Descending _
             Select c.CustomerID, c.CompanyName, o!OrderID, _
                 o!ShippedDate, p!ProductName, d!Quantity

Notice that no special treatment of the ShippedDate field is required to handle null values. However, if you attempt to strongly type ShippedDate with a CType(o!ShippedDate, Nullable(Of DateTime)) expression, a runtime error occurs when iterating the row with the null value.

And here's the VB version of C# query4, which returns objects for all fields:

Dim Query4 = From c In dsNwindU.Tables("Customers"), _
             o In c.GetChildRows("FK_Orders_Customers"), _
             d In o.GetChildRows("FK_Order_Details_Orders"), _
             p In d.GetParentRows("FK_Order_Details_Products") _
             Where CInt(p!ProductID) = 2 AndAlso c!Country.ToString = "USA" _
             Order By o!OrderID Descending _
             Select c!CustomerID, c!CompanyName, o!OrderID, _
                 o!ShippedDate, p!ProductName, d!Quantity

Notice that the AsEnumerable() method is missing from the DataTable identifier in the first line. The C# query won't compile without it but the VB compiler doesn't care if it's present or not.

I believe it's a good bet that VB's Join syntax for untyped DataSets in Beta 2 will look like this, which is considerably more concise than that of the C# version:

Dim Query4 = From c In dsNwindT.Customers
             Join o In dsNwindT.Orders On c!CustomerID Equals o!CustomerID
             Join d In dsNwindT.Order_Details On o!OrderID Equals d!OrderID
             Join p in dsNwindT.Products On d!ProductID Equals p!ProductID
             Where CInt(p!ProductID) = 2 AndAlso c!Country.ToString = "USA"
             Order By o!OrderID descending
             Select c!CustomerID, c!CompanyName, o!OrderID, o!ShippedDate,
                 p!ProductName, d!Quantity

Update 5/18/2007: According to VB Team member Amanda Silver, VB's Join syntax uses the Equals operator (equals in C#) instead of the = operator to make explicit that the expression uses Object.Equals as the comparison operator.

Conclusion: C# purists castigate VB for its verbosity and some condemn VB code as prolix. VB's syntax for LINQ queries against joined or associated tables is equally or more concise than that of C#.

Update 5/24/2007: Paul Vick points out in his Javascript in VB ... post that "the Silverlight-based Javascript compiler that we released in Silverlight 1.1" as well as the VBx compiler for VB 10.0 is written in VB. Miguel de Caza mentions in his Compiler Lab: Second Day post:

Their new DLR-based Javascript compiler (that implements the current ECMA spec) was written by two developers in four months. Nandan Prabhu was at the conference and he explained that if they were to rewrite it today it would probably take three months for a programmer to implement as such a person would not have to keep up with the daily changes in the DLR [Dynamic Language Runtime].

It will be interesting to see how the final implementation of VB's Join keyword handles null value types in strongly typed projections.

Update 5/22/2007: You'll need to use the FieldName = DataTable.Field<DataType>("FieldName", DataRowVersion.Version) overload of the Field method in both VB and C# if you want to filter by a member of the DataRowVersion enumeration: Current, Default, Original or Proposed. (The "LINQ to DataSet for C# Developers" whitepaper describes the DataRowVersion overloads, but doesn't include a sample of ther use. "LINQ to DataSet for VB Developers" doesn't even mention these overloads.)

0 comments: