Wednesday, October 04, 2006

New Entity Data Model Graphic Designer Prototype

Microsoft's ADO.NET vNext team posted on MSDN the code for the first Entity Data Model (EDM) Designer Prototype CTP on September 27, 2006. Pablo Castro demoed this prototype in his September 1, 2006 "ADO.NET Entity Framework: What. How. Why." Channel 9 video. The video coincided with the release of the ADO.NET vNext August 2006 CTP, which is an EDM Designer prerequisite.

The accompanying Readme.htm and Overview.doc files make it clear that this EDM Designer release is indeed a prototype. The Designer has the capability to generate the initial storage schema (SSDL file), mapping layer (MSL file), conceptual layer schema (CSDL file), and C# or VB class code from an SQL Server database. The designer only manipulates the CSDL file. If you alter the data model at the CSDL level or change property values of Entity Types, Entity Sets, Associations, or Association Sets, you must manually edit the MSL file and, in some cases, the SSDL file.

Most EDM code examples use one or two entities from the AdventureWorks sample database and simple associations. Here's Visual Studio 2005 displaying a diagram of the complete Northwind sample database (Access version) with the Entity Container and Entity Type elements rearranged but before making changes to Entity Set/Type names:

Click images to display in full size (800 x 600). Click here for a Web page with more images.

Here's the complete diagram exported by the Designer to a JPEG file:

Notice that Associations replace Entity Type foreign key properties, such as Orders.CustomerID, Orders.EmployeeID, and Orders.ShipperID.

The Designer inherits limitations of the ADO.NET vNext August 2006 CTP: No support for SQL Server's text, ntext, or xml data types (Employees.Notes, Products.HomePage, and Categories.Description fields), or Associations for tables with composite primary keys (Line Items). Therefor, FK_Order_Details_Orders and FK_Order_Details_Products Association lines are missing in the preceding diagram and Order_Details entities aren't updatable.

Changing Entity Type, Entity Set, and Association Names

The prototype Designer doesn't offer an option to singularize plural table names as Entity Type names. Changing Entity Type, Entity Set, Association, and Association Set names in the designer is easy, but making the requisite edits to the MSL and SSDL files isn't a walk in the park. Search and replace works for most name changes, but the CSDL file often loses the 63 suffix that distinguishes the two Employees/Employee endpoints after editing other names. You'll also encounter issues with entity or property names that conflict with C# or VB reserved words, such as Order in VB.

Here's the JPEG diagram of the preceding diagram with Entity Type names singularized, the Orders Entity Set renamed to SalesOrders, and Order Details changed to LineItems.

Notice that cardinality of the Employee.EmployeeID:Employee.ReportsTo Association in the preceding screen capture was mistakenly changed from 0..1:0..* (one-to-many) to 0..1:0..1 (one-to-one with nulls allowed) manually. The cardinality was later returned to that generated from the database, but the values in the diagram didn't change to conform the updated setting.

Testing Associations, Deferred Loading, and MARS

After changing Association names in the schema and mapping files, it's a good idea to verify that the renamed Associations continue to correspond to table relationships. This requires a more sophisticated test harness than the C# console demo projects favored by the ADO.NET vNext team. You need a grid to view properties of multiple entities and their associated foreign key values or counts of related entities.

Fortunately, you can populate a DataGridView control directly by setting its DataSource property value to a ageneric List generated by applying the ToList() method to the Entity Set returned from a LINQ to Entities query:

Dim Orders = From so In edmNwind.SalesOrders _
  Select so _
  Order By so.OrderID Descending
Dim lstOrders As List(Of SalesOrder) = _
With dgvList
  .DataSource = lstOrders
  .Columns.Add("CustomerID", "CustomerID")
  .Columns.Add("EmployeeID", "EmployeeID")
  .Columns.Add("ShipVia", "ShipVia")
End With

Additional code populates the added columns with the foreign key values. Unfortunately, the position sequence of all but the added columns have no relationship to their ordinal value in the table or their sequence in the Entity Type definition. The ADO.NET vNext team intends to improve databinding with, perhaps, a ToBindingList() method or wrapper.

Associations have the GetEntityTypeName() method to return a single related entity and the GetEntityTypeNameRef() method to return a reference to an entity. Like LINQ to SQL (DLinq), the Designer "lazyloads" related entities; that is, the entities aren't retrieved unless or until your code requests them. DLinq has the Including() operator to return related instances such as SalesOrder.Customer or Product.Category. The EDM requires you to load related entities explicitly with a procedure such as:

Public Sub GetCustomerIDForSalesOrder(ByVal so As SalesOrder, _
    ByVal intRow As Integer)
  If Not FK_SalesOrders_Customers.GetCustomerRef(so).IsLoaded Then
  End If
  Dim Cust As Customer
  Cust = FK_SalesOrders_Customers.GetCustomer(so)
  If Cust IsNot Nothing Then
    dgvList.Rows(intRow).Cells("CustomerID").Value = Cust.CustomerID
  End If
  Cust = Nothing
End Sub
If you execute the preceding code within a For...Next loop to retrieve the SalesOrder (so) entities, you must enable SQL Server 2005's Multiple Active Resultsets (MARS) feature to open a second DataReader on the same database connection:
'Retreive foreign key values on multiple connections
For Each so In Orders
  GetCustomerIDForSalesOrder(so, intRow)
  GetEmployeeIDForSalesOrder(so, intRow)
  GetShipperIDForSalesOrder(so, intRow)
  intRow += 1
The alternative, which provides a very slight performance gain, is to iterate the List of SalesOrder instances three times to retrieve the foreign key values:
'Retreive foreign key values sequentially
For Each so In lstOrders
  GetCustomerIDForSalesOrder(so, intRow)
  intRow += 1
intRow = 0
For Each so In lstOrders
  GetEmployeeIDForSalesOrder(so, intRow)
  intRow += 1
intRow = 0
For Each so In lstOrders
  GetShipperIDForSalesOrder(so, intRow)
  intRow += 1

Here's the WinForm test harness's DataGridView displaying a few foreign key values in the last three columns:

Notice the relatively long time (80 seconds) required for the almost 2,500 T-SQL queries required to retrieve the three foreign key values for 830 SalesOrder entities. Once the related entities are loaded in memory, execution time decreases dramatically, as shown here:

Note: The design of the Employees table implements the adjacency technique for representing a hierarchy in a table of a relational database. Path and nested sets are alternative—but less performant—approaches. Anthony Bloesch's February 15, 2006 "Hierarchies (trees) in SQL Server 2005" post compares the three techniques and their performance in a 20,000-node implementation.

Cache Uniqueing Issues and Lost Values

EDM employes uniquing, the process by which Object/Relational Mapping (O/RM) tools maintain a single copy of each unique entity instance, such as a Customer, Order, or Product, regardless of how many times a query retrieves an instance. The test harness's 91 Customers placed 830 Orders, thus the (830 - 91) or 739 Customer instances retrieved from the database to obtain foreign-key value are duplicates. Similarly, one of nine employees receive credit for orders so 821 retrieved Employee instances are duplicates. Finally, there are three shipping companies, so 827 Shipper instances are duplicates. Adding counters for the Load method of Customer, Employee, Shipper, Categorie, and Supplier entity instances confirms the retrieval of duplicates. (These counters aren't don't appear in the test harness screen captures.)

Uniquing is implemented by an Identity Map. Martin Fowler states in Patterns of Application Architecture that an Identity Map: "Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them." Most O/RMs implement Identity Maps in cache, often as a hashtable of values calculated from primary keys. The fact that the missing ReportsTo values are duplicates of the two values returned from the cache indicates a problem with the uniqueing algorithm.

Note: A counter on Employee entity instance retrievals indicates that Employee instances aren't cached. The first time cached Order instances are traversed, the Employee retrieval instance counter increments by 839; additional traversals increment by 830.

Frans Bouma, developer of the LLBGen O/RM for .NET and Microsoft MVP, contends in his "Why a cache in an O/R mapper doesn't make it fetch data faster" blog post that caching doesn't necessarily make data retrieval more efficient. The time to look up the newly retreived object's ID in an Indentity Map that contains a large number of hashes might contribute to the apparent poor performance when initially filling the Orders grid.

Composite Keys, Surrogate Keys, Natural Keys, and Ruby on Rails

Bouma also observes that "As ... compound [composite] PKs can complicate the calculation process, it's not that straight forward to get the lookup process of entities very efficient."

Ruby on Rails' lack of of composite PK support by its O/RM tool caused Martin Fowler to raise the spectre of substituting an autoincrementing (identity) integer column as a surrogate primary key. Jonathan Ellis' July 12, 2006 "Single column primary keys should be enough for anybody" post and Josh Berkus' three-part series on surrogate vs. "Primary Keyvil, Part I," "Part II (Reasons to Use an Autonumber Surrogate Key)," and "Part III (Reasons to Use an Autonumber Surrogate Key, Evaluated)" make the case for natural keys.

Note: Kevin Hoffman, TheDotNetAddict of .Net Developer's Journal, posted on October 6, 2006, "ASP.NET vs Ruby on Rails : Round 3 (Object-Relational Modeling)," which compares ASP.NET vNext's Entity Framework with Rails' ActiveRecord as an O/R Modeling approach. This page links to other Kevin Hoffman posts, many of which relate to LINQ, ASP.NET vNext, and ADOEF. Microsoft's Blinq tool, which has capabilities similar to Rails' scaffolding feature, handles composite keys with no problem. Blinq 1.0 uses LINQ to SQL (DLinq), not ADOEF.

Natural keys, which are values in the table, prevent entering duplicate data. The Northwind Order Details table has a composite natural primary key that consists of the OrderID and ProductID columns. In addition to uniquely identifying a row, the primary key prevents data entry operators from entering rows that duplicate a product line item in an order. Hopefully, support for composite primary keys will arrive in ADO.NET vNext shortly.

For general background on caching, read "Caching Architecture Guide for .NET Framework Applications," by Avi Ben-Menahem and Ronen Ashkenazi (patterns & practices).

Franz Brouma's "Essay: The Database Model is the Domain Model" offers an approachable view of the the major issues that O/RM seeks to address.

A related "Objectify Data With ADO.NET vNext" article in the October 2006 issue of Visual Studio Magazine provides more information about EF, EDM, and the EDM Designer.

Updated: 10/8/2006, 10/11/2006, 10/30/2006

Technorati Tags: ADO.NET vNext, Entity Framework, Entity Data Model, Entity Data Model Designer, EDM Designer, LINQ, LINQ to Entities, LINQ to SQL, Entity SQL, eSQL, DLinq, VB 9.0, Orcas