Thursday, April 12, 2007

Entity Change Tracking in LINQ to SQL and the Entity Framework, Round 2

Entity change tracking is required to detect and resolve conflicts that can occur when more than one user updates an instance of a particular entity, such as a SalesOrder or MedicalRecord object, concurrently. Thus the common term for this problem, which occurs only with multi-user systems, is concurrency conflict.

There are two methods of handling concurrency conflicts: pessimistic locking and optimistic locking. If you're familiar with optimistic locking and handling concurrency conflicts, jump to the "Resolving Concurrency Conflicts with LINQ to SQL and the Entity Framework" topic.

Pessimistic Locking to Prevent Concurrency Issues

If the application's data store, usually a relational database, uses pessimistic locking, the following actions occur in response to an entity update request:

  1. User 1 requests a copy of the entity from the data store.
  2. The data store locks the record(s) associated with the entity to prevent other users from editing them. (For a SalesOrder, there is typically a record for a SalesOrderHeader and one for each SalesOrderLineItem.)
  3. User 1 edits the entity.
  4. User 1 submits the changes to the data store directly (two-tier) or through a business services layer (n-tier).
  5. The business services layer, the data store, or both determine that the changes don't violate either business policies or relational integrity.
  6. The data store completes the transaction, updates the record(s) and releases its locks on the records.
  7. User 2 repeat steps 1 through 6.

Pessimistic locking prevents concurrency conflicts at the expense of data accessibility. If User 1 goes to lunch or home after completing step 1 but before performing step 4, other users can't edit (and applications might not even be able to read) the locked data until User 1 returns to work and completes step 4 or an application-imposed time limit expires, which cancels the editing operation.

Optimistic Locking and Handling Concurrency Issues

Optimistic locking maximizes data accessibility but introduces uncertainty into data values when multiple users edit the same records simultaneously. The degree of uncertainty increases when multiple users edit independent copies of the data and submit the changes to the data store. This problem arises in any application that makes use of object/relational mapping techniques or ADO.NET DataSets. Data uncertainty is further exacerbated as the copy grows older; that is, the original data becomes stale.

If the application's data store uses optimistic locking, the following actions can occur in response to User 1 and User 2 obtaining a copy of the same entity concurrently:

  1. User 1 requests and obtains a copy of the entity. The data store doesn't lock the associated records.
  2. User 2 requests and obtains a copy of the entity before User 1 submits her edits.
  3. User 1 submits her edits which conform to business rules and don't violate relational integrity.
  4. A short time later, User 2 submits his edits which also conform to business rules and don't violate relational integrity.
  5. If the client, business layer (if present), and data store support concurrency management, a concurrency exception occurs. The exception is needed because User 2 might have made no changes or different changes if he was aware of the changes made by User 1.
  6. User 2 (or a supervisor/manager) resolves the concurrency conflict or User 2 cancels the edit.

The conflict must be resolved in step 6 because User 2 has captured original member (property) value(s) or a timestamp value in step 2. The data store detects concurrency conflict(s) by comparing original values with corresponding member values in the corresponding table cells or a timestamp cell. If any pair of values differ, the data store rolls back the pending transaction and reports that no records were affected by the submission. In this case, the data store, business layer, or client code throws an exception. Repeated attempts by User 2 to update the data with his values will fail until User 2 either abandons the editing attempt or synchronizes his original or timestamp value(s) with those of the data store.

Note: My "Optimize Update-Concurrency Tests" article from the October 2003 issue of Visual Studio Magazine compares the performance of concurrency management by testing member values of typed DataSet tables or SQL Server 2000 timestamp values. The tests results showed timestamp-based test with relatively large objects, such as a SalesOrderHeader, outperform member value tests by a factor of about 2.1.

Resolving Concurrency Conflicts with LINQ to SQL and the Entity Framework

Concurrency conflicts can be resolved by blindly allowing the last update to prevail (called the "last in wins" approach) or by presenting the user with resolution options and (hopefully) sufficient information on which to make an informed judgment.

Orcas March 2007 CTP's LINQ to SQL implementation offers developers the capability to easily create a conflict resolution message box, such as this:

The conflict occurred because the original ContactName value (Maria Anders) didn't match User 1's change to the database value (Maria Anders-Brown). User 2's option descriptions are taken directly from the Orcas March 2007 CTP's online help file. The database value is retrieved with the exception raised by the database (SQL Server 200x only for LINQ to SQL). Current database values are needed to make an informed conflict resolution decision. (Timestamp values aren't meaningful; they only indicate that underlying data has changed while an edit was in progress.)

Here's the VB 9.0 code to process changes tracked by the DataContext object (dcNwind) and resolve concurrency conflicts:
Private Sub ProcessChanges()
  'Process changes with concurrency conflict resolution
  sbLog.Length = 0
  Try
    dcNwind.SubmitChanges(ConflictMode.ContinueOnConflict)
  Catch ccExc As ChangeConflictException
    'Enumerate the change conflicts for updates and deletions
    For Each ccObject As ObjectChangeConflict In dcNwind.ChangeConflicts
      For Each ccMember As MemberChangeConflict In ccObject.MemberConflicts
        Dim strMsg As String = dcNwind.ChangeConflicts.Count.ToString + _
          " concurrency conflict(s) have been detected in Object '" + _
          ccObject.Object.ToString + "', "
        With ccMember
          strMsg += "Member '" + .Member.Name + "'" + vbCrLf + _
            "Original Value = " + .OriginalValue.ToString + vbCrLf + _
            "Current Value = " + .CurrentValue.ToString + vbCrLf + _
            "Database Value = " + .DatabaseValue.ToString + vbCrLf + vbCrLf + _
            "Click Yes to force the Refresh method ... values." + vbCrLf + _
            "Click No to force the Refresh method ... modified." + vbCrLf + _
            "Click Cancel to force the Refresh ... database."
          'Resolve each member conflict
          Dim intResolve As Integer = MsgBox(strMsg, MsgBoxStyle.Exclamation, _
            Or MsgBoxStyle.YesNoCancel, _
            "Concurency Conflict Resolution for " + ccObject.Object.ToString)
          If intResolve = MsgBoxResult.Yes Then
            .Resolve(RefreshMode.KeepChanges)
          ElseIf intResolve = MsgBoxResult.No Then
            .Resolve(RefreshMode.KeepCurrentValues)
          Else
            .Resolve(RefreshMode.OverwriteCurrentValues)
          End If
        End With
      Next
    Next
    'No exception should occur at this point
    dcNwind.SubmitChanges(ConflictMode.FailOnFirstConflict)
  Catch exc As Exception
    MsgBox(exc.Message, MsgBoxStyle.Exclamation, _
      "Non-Concurrency Exception Submitting Changes")
  Finally
    txtLog.Text = sbLog.ToString
  End Try
End Sub

Conflict resolution by user intervention is much more practical in a conventional, 2-tier (client/server) environment than with n-tier or service-oriented architecture (SOA) architecture.

The Entity Framework's ObjectStateManager detects concurrency conflicts but doesn't offer a built-in handler to resolve them at the member level. The ObjectStateManager provides all current and original values (but not database values) for comparison:

Following is the code that produces the preceding message box:

Private Sub btnSaveChanges_Click(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles btnSaveChanges.Click
  'Save changes and report optimistic concurrency errors that occur
  Dim timSave As New Stopwatch
  timSave.Start()
  Try
    Dim intSavedEntities As Integer = edmNwind.SaveChanges
  Catch ccExc As OptimisticConcurrencyException
    Dim osEntries As IEnumerable(Of ObjectStateEntry) = ccExc.StateEntries
    Dim strMsg As String = Nothing
    Dim intCtr As Integer = 1
    For Each osEntry As ObjectStateEntry In osEntries
      strMsg += "Concurrency exception " + intCtr.ToString + " in entity set "
      strMsg += osEntry.EntitySet.Name + "." + vbCrLf + vbCrLf + "Current values: "
      For intCol As Integer = 0 To osEntry.CurrentValues.FieldCount - 1
        strMsg += osEntry.CurrentValues.Item(intCol).ToString + "; "
      Next
      strMsg += vbCrLf
      strMsg += "Original values: "
      For intCol As Integer = 0 To osEntry.OriginalValues.FieldCount - 1
        strMsg += osEntry.OriginalValues.Item(intCol).ToString + "; "
      Next
      strMsg += vbCrLf + vbCrLf
      intCtr += 1
    Next
    strMsg += ccExc.Message
    MsgBox(strMsg, MsgBoxStyle.Exclamation, "Optimistic Currency Exception")
  Catch sysExc As Exception
    MsgBox(sysExc.Message, MsgBoxStyle.Exclamation, _
      "Error Saving Changes to Database")
  Finally
    txtTime.Text = Format(timSave.ElapsedMilliseconds / 1000, "0.000")
  End Try
End Sub

The Entity Framework developer must add procedural code to resolve the conflicts; retrieving database values requires a round-trip to the server with a DataReader to retrieve the current row values.

The Current Need for Do-It-Yourself Change Tracking

The key to detecting concurrency conflicts is to maintain original member values or the timestamp value. The DataContext object handles change tracking for LINQ to SQL; the ObjectStateManager does the same for the Entity Framework's Object Services layer. Neither object tracking approach is well suited to n-tier, SOA, or ASP.NET Web applications that update data because the object copy being edited in the client tier is detached from the change tracker in the middle tier or service layer. Currently, the developer must write change tracking code to track changes to the detached object and pass the change information along with the edited object to the middle tier. I use the term detached (from the change-tracker) so as not to confuse the problem with disconnected (from the database.)

Note: See "Disconnected Operation" and the Entity Framework for an example of confusing disconnected with detached.

Anders Borum started on 4/9/2007 a Entity Change Tracking - questions LINQ to SQL design thread in the LINQ Project General forum that proposes tracking changes to entities in the entities themselves instead of the DataContext:

Why does LINQ to SQL not implement a change tracking mechanism on an entity level?

Without the entity tracking mechanism, it would seem like the framework is primarily targeted at developers working in a truly connected state (i.e. always use a DataContext to query for entities and remain connected (to a DataContext) throughout the lifetime of the entities).

I consider it a really big obstacle to fully accept LINQ to SQL as a mature ORM if I have to implement the change tracking mechanism every time I design an architecture with more than 2 tiers (e.g. a standard ASP.NET application that creates a DataContext for each request, makes changes and disposes resources).

Note: Anders started the Compiled Queries in LINQ to SQL - performance considerations and entity caching thread that I mentioned in my earlier LINQ and Entity Framework Resource Updates post. This thread is definitely worth reading in its entirety.

Anders' issue is the same as that Frans Bouma raises in his Why change-tracking has to be part of an entity object post:

I used ASP.NET as an example to show the stupidity of this EDM/Linq to Sql design flaw, but any setup where an entity object gets disconnected from the session/context which fetched it, will result in this [requirement for] 'DIY changetracking' code. And for what? Why isn't this solved properly? Beats me... you'd think that after all these years, Microsoft would come up with a framework for data access / entity management which would really help the developer with taking care of the plumbing crap and let the developer focus on what really matters: the application code.

Although Frans' post is directed primary to the Entity Framework, his complaint about loss of original values in n-tier applications and the requirement for developers to roll their own value tracking code echo's Anders' complaint.

Do-it-yourself change tracking becomes more practical when you compare timestamp values, as Frans observes in this comment. Timestamp and primary key values don't change (unless your table has a composite primary key), so you can use these values as original values in most cases. If you choose the timestamp approach, you must make a roundtrip to the database to compare current values with database values to determine which member values have pending changes. However, database values don't necessarily correspond to original values at the time editing began.

The LINQ to SQL Team's Proposed Change Tracking Solution

I found it strange that Anders' 4/9/2007 post hadn't elicited a response from members of the LINQ to SQL team by 4/12/2007. It turned out that the thread had moved to an earlier and somewhat related update existing disconnected objects (dlinq) thread that started on 2/1/2007. Microsoft's Matt Warren delivered this initial response on 4/11/2007:

The reconnecting API's you have available to you at this time do not support the scenario you guys are discussing. There is more work coming that makes 3-tier scenario's easier. The re-attach and playback API's are included because they are the bare minimum required to make any reconnecting work at all, however cumbersome it may seem at this time. There is an upcoming feature that automatically databinds LINQ to SQL objects directly to webforms that use these API's to communicate changes back into the DataContext. There is also a technology in the works that automatically serializes LINQ to SQL objects to other tiers, change tracks and data binds the objects for you on that tier and serializes the objects back, solving the cyclic reference serialization problem and the indescribable schema problem for objects with change sets; which seems to match what many of you are describing. [Emphasis added.]

Matt's subsequent message clarifies the technology in the works:

The mechanism that does the change tracking on the client is similar to a mini-connectionless DataContext. It is a type that packages up all the objects, lists of objects and graphs that you want to send to the client. It serializes itself and everything you've given it automatically. (It implements IXmlSerializable.) On the client, the same class also manages change-tracking for all the objects that were serialized with it. When serialized again (on the trip back) it serializes both the objects and their change information that it logged. Back on the middle tier, you just make one call to re-attach the whole package to a new DataContext instance and then call SubmitChanges.

The remaining question is how the Entity Framework folks plan to deal with this issue.

Another concurrency-related issue is the necessity to store and test the values of all related objects in the graph when undetected modifications to dependent objects might have serious medical or economic consequences. I'll reserve that issue for a future round of the Entity Change Tracking in LINQ to SQL and the Entity Framework quartet.

Updated 4/12/2007: Fixed errors due to uploading a draft instead of the final version of the initial post and added responses from Matt Warren in the LINQ Project General forum.

1 comments:

Anonymous said...

Indeed a very interesting discussion. Knowing the March 2007 CTP is an early tech preview, I wasn't surprised that the LINQ to SQL was incomplete.

However, what surprised me was the lack of information on change tracking infrastructure published by Microsoft (which is why I started the thread).

Any small information would have helped, but nothing was published. Hopefully blog entries like yours (and community forum entries) provide other developers with answers to these important questions.

I bet a lot of developers are asking the very same questions these days.

With regards
Anders Borum / SphereWorks