Saturday, July 28, 2007

Strange Change to VB Join and Group By Expressions in VS2008 Beta 2

Last week, I spent some time testing an assortment of C# and Visual Basic versions of LINQ query expressions having Join ... Into and Group By ... Into clauses. Until yesterday, I was using the Orcas June 2007 CTP, which finally re-enabled the GroupJoin and GroupBy standard query operators in VB projects.

Yesterday, I installed an instance of VS 2008 Beta 2 on a Virtual Server 2005 R2 SP2 VM running Windows Vista Premium as the host OS and moved my test code to the new VM. My collected C# queries compiled and ran as expected, but the VB versions wouldn't compile.

Change to Group Join Syntax

The problem for a Group Join query expression turned out to be the expressionList identifier co following the Into keyword, as shown here:

The error message was "Definition of method 'co' is not accessible in this context" and IntelliSense added method-style parentheses to the identifier. There is no indication in the February 2007 Overview of Visual Basic 9.0 white paper that the VB and C# [Group ]Join/join and Group By/groupby syntax differed by more than keyword capitalization and space stripping.

Note: The current (and almost final, 500-page) C# Language Specification Version 3.0 covers translation of C# query expression syntax to anonymous (lambda) method calls in section 7.15, starting on page 209.

Based on my previous experience with VB GroupJoin...Into and C# Join...Into expressions, I had added an identifier for an expected range variable, as in this C# equivalent that compiles and produces the expected result in both Orcas June 2007 CTP and VS 2008 Beta 2:


I went back to the VB version, put the caret on Group Join and pressed F1. Thankfully, context-sensitive help was working and I uncovered this interesting paragraph:

expressionList
Required. One or more expressions that identify how the groups of elements from the collection are aggregated. To identify a member name for the grouped results, use the Group keyword (<alias> = Group). You can also include aggregate functions to apply to the group.

I also took a look at the newly added Join examples in the VB Project LINQ Sample Query Explorer. The Group Join example used the Group keyword as the Into identifier. Here's the VS 2008 VB editor's enhanced IntelliSense list and tooltip for the Into keyword's <group alias> = assignment:

Note: Scott Guthrie's Nice VS 2008 Code Editing Improvements post covers most new VB and C# editor features.

Changes from co to Group in two places solved the problem, as shown here in the final VB query expression:


Note that Group is a keyword in this context.
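For readers who can't see the screen capture, here's a minimal sketch of a Group Join ... Into Group query. It is not the post's original listing: the Customer and Order classes and the sample data are hypothetical stand-ins, and the Select clause is left out so that each result exposes the range variable c and a member named Group.

```vb
Option Infer On
Imports System
Imports System.Linq

Module GroupJoinSketch
    ' Hypothetical stand-ins for the post's Customer and Order classes.
    Class Customer
        Public CustomerID As String
        Public Country As String
    End Class

    Class Order
        Public OrderID As Integer
        Public CustomerID As String
    End Class

    Sub Main()
        Dim CustomerList = New Customer() { _
            New Customer With {.CustomerID = "GREAL", .Country = "USA"}, _
            New Customer With {.CustomerID = "ALFKI", .Country = "Germany"}}
        Dim OrderList = New Order() { _
            New Order With {.OrderID = 10528, .CustomerID = "GREAL"}, _
            New Order With {.OrderID = 10589, .CustomerID = "GREAL"}}

        ' Group (not an arbitrary identifier such as co) follows Into.
        Dim query = From c In CustomerList _
                    Where c.Country = "USA" _
                    Group Join o In OrderList _
                    On c.CustomerID Equals o.CustomerID Into Group

        ' Each result carries the customer (c) and its matching orders (Group).
        For Each row In query
            Console.WriteLine(row.c.CustomerID)
            For Each ord In row.Group
                Console.WriteLine("  " & ord.OrderID.ToString())
            Next
        Next
    End Sub
End Module
```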

Here's the output in a simple test harness:

Update 7/30/2007: VB MVP Bill McCarthy reports in a comment to this post that VB also supports Group Join identifier1 ... Into identifier2 = Group and Group By's Group prefix1 By prefix1.identifier1 Into identifier2 = Group expressions.

Online help doesn't include this extended syntax in its Group Join or Group By topics.

Update 7/31/2007:  The runtime exception reported with the Group By implementation was my error. I successfully reused the Query11 variable for two forms of the Group Join query but using the Query16 variable for two forms of the Group By query caused a type conflict (see below).

Change to Group By Syntax

A similar problem occurred with a query expression containing a Group By ... Into clause, in this case with the c identifier. Here's the original VB version with the iterator code, because the iterator gets a significant change:


And here's the C# version which compiles and runs as expected in both VS versions:


The Group By select new {...} clause isn't optional for C# query expressions.

Note that the final version uses a StringBuilder object to populate the text box shown here:


Finally, here's the modified VB version that compiles and runs in VS 2008 Beta 2:
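For readers who can't see the screen capture, here's a minimal sketch of the general shape of a Group By ... Into Group query and an iterator over its result. It is not the original listing: the Customer class and the sample data are hypothetical stand-ins, and the output goes to the console rather than to a StringBuilder.

```vb
Option Infer On
Imports System
Imports System.Linq

Module GroupBySketch
    ' Hypothetical stand-in for the post's Customer class.
    Class Customer
        Public CustomerID As String
        Public Country As String
    End Class

    Sub Main()
        Dim CustomerList = New Customer() { _
            New Customer With {.CustomerID = "GREAL", .Country = "USA"}, _
            New Customer With {.CustomerID = "ALFKI", .Country = "Germany"}, _
            New Customer With {.CustomerID = "BLAUS", .Country = "Germany"}}

        ' The key member name (Country) is inferred from c.Country;
        ' the grouped customers are exposed through a member named Group.
        Dim query = From c In CustomerList _
                    Order By c.Country, c.CustomerID _
                    Group c By c.Country Into Group

        ' The iterator walks each key, then the customers in its Group member.
        For Each g In query
            Console.WriteLine(g.Country)
            For Each cust In g.Group
                Console.WriteLine("  " & cust.CustomerID)
            Next
        Next
    End Sub
End Module
```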


Update 7/30/2007: Here are the modifications that compiled but wouldn't run for me:

Query16 = From c In CustomerList _
          Order By c.Country, c.CustomerID _
          Group c By c.Country Into g = Group _
          Select New With {Country, g} 'Not optional

The runtime error message I received was:

System.InvalidCastException was unhandled
  Message="Unable to cast object of type '<SelectIterator>d__d`2[VB$AnonymousType_12`2[System.String,System.Collections.Generic.IEnumerable`1[QueryOperatorsVB.MainForm+Customer]],VB$AnonymousType_13`2[System.String,System.Collections.Generic.IEnumerable`1[QueryOperatorsVB.MainForm+Customer]]]' to type 'System.Collections.Generic.IEnumerable`1[VB$AnonymousType_11`2[System.String,System.Collections.Generic.IEnumerable`1

Update 7/31/2007: As noted in the preceding section, the cause of the problem was using the Query16 variable for two forms of the Group By query, which worked for Group Join but not Group By. Notice that Dim is missing in the preceding code. Adding Dim and giving the query a new name solved the problem, and made the Query16 statement optional.

The type conflict was between two anonymous types: the immutable one returned by the original version's Select Country, Group Join clause (or by the implied Select clause) and the mutable one returned by the Select New With {Country, g} clause. For more information on VB's new mutable anonymous types in VS 2008 Beta 2, see Paul Vick's Mutable and immutable anonymous types, and keys post of May 11, 2007.
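The difference is easy to see outside a query. The following is a minimal sketch, assuming VB 9.0 anonymous types with and without the Key modifier; the property names and values are made up for illustration.

```vb
Option Infer On
Imports System

Module AnonymousTypeSketch
    Sub Main()
        ' Without Key, every property is read/write (mutable).
        Dim mutableRow = New With {.Country = "USA", .Count = 13}
        mutableRow.Count = 14 ' Compiles: Count has a setter.

        ' With Key, the properties are ReadOnly (immutable).
        Dim keyedRow = New With {Key .Country = "USA", Key .Count = 13}
        ' keyedRow.Count = 14 ' Would not compile: Count is ReadOnly.

        ' Compare the compiler-generated types behind the two declarations.
        Console.WriteLine(mutableRow.GetType().Name)
        Console.WriteLine(keyedRow.GetType().Name)
        Console.WriteLine(mutableRow.GetType() Is keyedRow.GetType())
    End Sub
End Module
```

Two query results built on different anonymous types can't share one variable, which is consistent with the InvalidCastException shown above.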

Bill gave another sample of an even more flexible Group By syntax:

Query16 = From c In CustomerList _
          Order By c.Country, c.CustomerID _
          Group c By c = c.Country Into g = Group _
          Select c, g 'Optional

which resembles a C# lambda function (but isn't; the > is missing). In this case, you can assign an identifier to both the key and the group items.

Note: As you'd expect, LINQ queries over small collections of in-memory objects are fast. Seventeen sample queries execute against objects created from the Northwind Customers table (91 rows) in 124 milliseconds (when you append all text to a StringBuilder object and use it to populate the text box after executing the last query).

Conclusion

LINQ's objective is to provide a common set of Standard Query Operators for multiple data domains. That doesn't mean, however, that developers can expect a common query expression syntax for multiple .NET languages.

The first major departure between C# and VB LINQ features was VB's added explicit XML authoring capability. VB supports Group as a keyword but C# doesn't. It will be interesting to see how far the new dynamic languages—IronPython and IronRuby—will depart from the C# query expression translation specification.

Note: IronPython 2.0 will support LINQ to Objects (based on its presence in the System.Core assembly) and LINQ to XML is promised later for the DLR.

Another important VB discovery: ByVal is no longer required as a prefix to parameter names of VB lambda (inline) functions, which first arrived in the Orcas June 2007 CTP. ByVal's presence won't throw an exception but if you edit an expression containing it, ByVal disappears and the editor won't let you add it back.
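Here's a minimal sketch of lambda functions declared without ByVal. The names doubleIt and shout are made up for illustration.

```vb
Option Infer On
Imports System

Module LambdaSketch
    Sub Main()
        ' The lambda parameters carry no ByVal modifier.
        Dim doubleIt As Func(Of Integer, Integer) = Function(n As Integer) n * 2
        Dim shout = Function(s As String) s.ToUpper()

        Console.WriteLine(doubleIt(21))
        Console.WriteLine(shout("beta 2"))
    End Sub
End Module
```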

6 comments:

Bill McCarthy said...

Hi Roger,

In VB you can still use aliases for the group, eg:

From c in CustomerList _
Where c.Country ="USA" _
Group Join o In OrderList _
On c.CustomerId Equals o.CustomerId Into co = Group
....

The significant difference between VB and C# on this is more to do with the return type. C# returns a Grouping(Of TKey, TData) whereas VB just returns an IEnumerable(Of T) which is actually a nested IEnumerable.
I've been told the advantage of this is that in the future VB can use parallelism for this, whereas the C# approach can't.

--rj said...

Bill,

Thanks for the added info. Online help doesn't mention this syntax.

As noted in the updates to this post, 'Into identifier = Group' will compile for Group Join and Group By, but throws a runtime exception in the Group By implementation.

--rj

Bill McCarthy said...

Hey Roger,

Got your heads up via my blog. Thanks :)

I tested this using the "Sample Queries" project for Beta 2, with the following code:

Dim cList = GetCustomerList()
Dim query = From c In cList _
Order By c.Country, c.CustomerID _
Group c By c.Country Into g = Group _
Select New With {Country, g} 'Not optional

ObjectDumper.Write(query, 1)

This worked perfectly for me.

Bill McCarthy said...

btw: I also tested it this way which I kind of prefer:

Dim cList = GetCustomerList()
Dim query = From c In cList _
Order By c.Country, c.CustomerID _
Group c By cntry = c.Country Into g = Group _
Select cntry, g
ObjectDumper.Write(query, 1)

Bill McCarthy said...

oh, and the Select part is optional too :)

Bill McCarthy said...

This looks like I'm having a conversation with myself ;) But in reply to your last reply on my blog: when you use the New With { ..} syntax, the properties of that anonymous type are all mutable (Get and Set). If, however, you use the implied Select, or a simple Select such as Select c.Name, c.ID etc., the properties are ReadOnly.

So yes, you had two different anonymous type signatures. You could probably try using:

New With {Key Country, Key g}

or similar. The Key will make the field ReadOnly (immutable) so hopefully the signatures will match.