"Disconnected Operation" and the Entity Framework
Fabrice Marguerie alerted me this morning to a controversy about the handling of update concurrency conflicts in data-intensive, n-tier applications. Fabrice started his Change tracking, the ADO.NET Entity Framework and DataSets post with:
Andres Aguiar started an interesting discussion about disconnected operation and change tracking in the ADO.NET Entity Framework. [Emphasis added.]
Andres' ADO.NET Orcas Entity Framework and disconnected operation post says:
David [sic] Simmons explained how the 'disconnected mode' works today in Entity Framework (and as far as I know, that's the way it will work in the Orcas release).
Basically, there is no disconnected mode. You can create a context and attach objects that you kept somewhere, telling the context if it was added/deleted/modified.
This basically means that if you plan to build a multi-tier application with ADO.NET Orcas in the middle tier, you will need to hack your own change tracking mechanism in the client, send the whole changeset, and apply it in the middle tier. From this point of view, it's a huge step backwards, as that's something we already have with DataSets today. [Emphasis added.]
I had read Danny Simmons' Change Tracking Mechanisms post and added a link to it Thursday. I didn't recall any mention of disconnected operation or disconnected mode in his post.
I believe that Danny only addressed the Entity Framework's optimistic concurrency implementation for data updates in an n-tier environment, not disconnected operation.
Note: Today is Danny's 10th Anniversary as a Microsoft employee.
Defining Disconnected Operations and Occasionally Connected Systems
I view disconnected operations as an environment in which:
The client application has sporadic or unreliable access to a network to process typical CRUD data operations. Microsoft's preferred term is Occasionally Connected Systems (OCS). Typical users of OCS are sales people, construction managers, social-services caseworkers, foresters, fish and game officials, and physicians who work in the field, especially in non-urban locations.
Note: The Social Security Administration is testing a new healthcare approach that offers seriously ill patients the services of a primary care physician who makes house calls.
Architectural characteristics of the OCS scenario, in my experience, are:
- A smart-client (WinForms) UI
- Task-based data-entry and update forms
- Grids for data entry used only where a single view of multiple items is required
- Locally cached lookup (catalog or historical) data, which can be very large and thus must merge changes rather than require full-table refreshes
- Automatic or semi-automatic push of cached data to the server (two-tier) or data service (n-tier) when connected to the network
- A process to enable the client to resolve data update and deletion concurrency conflicts; alternatively, to notify the client of the action taken by a business rule
- A process to enable the client to resolve insert and other conflicts with multiple rows of child tables, such as order or medication line items *
- Higher probability of concurrency conflicts than with usually connected systems because of increased update latency
- Increased probability of foreign key conflicts due to lookup data latency (stale data)
- A process to test cached inserts and updates for lookup data changes.
* Note: This process isn't addressed by any built-in concurrency handling implementation that I've found, and it also influences the method of handling concurrency conflicts for updates and deletions to child tables. Disconnected or not, you must test for post-retrieval changes by other users to all dependent objects in the graph before committing updates. In many cases, business rules can't resolve these conflicts. I described the approach for DataSets in Expert One-on-One Visual Basic 2005 Database Programming, and I plan to add a blog post and write an article about the technique as it applies to other local persistence implementations.
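As a rough illustration of the "test for post-retrieval changes to all dependent objects" idea in the preceding note, here is a minimal Python sketch. It is not the DataSet technique from the book; `TrackedRow`, `find_conflicts`, and the `server_rows` dictionary are hypothetical names invented for the example, and the server is simulated with an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class TrackedRow:
    """A cached row: the values as originally retrieved plus the client's edits."""
    key: str
    original: dict
    current: dict


def find_conflicts(graph, server_rows):
    """Return keys of rows that another user changed or deleted after retrieval.

    graph       -- every TrackedRow in the object graph (parent and children)
    server_rows -- hypothetical dict mapping key -> values now on the server
    """
    conflicts = []
    for row in graph:
        server_values = server_rows.get(row.key)
        # A missing row means another user deleted it; differing values mean
        # another user updated it since this client retrieved its copy.
        if server_values is None or server_values != row.original:
            conflicts.append(row.key)
    return conflicts


# An order with two line items; the client edited the order and line 2.
order = TrackedRow("order:1", {"status": "open"}, {"status": "shipped"})
line1 = TrackedRow("line:1", {"qty": 2}, {"qty": 2})
line2 = TrackedRow("line:2", {"qty": 5}, {"qty": 6})

# Simulated server state: another user changed line 1 and deleted line 2.
server = {"order:1": {"status": "open"}, "line:1": {"qty": 3}}

conflicts = find_conflicts([order, line1, line2], server)
```

Note that line 1 is reported even though this client never edited it, which is the point of checking the whole graph rather than only the rows being updated.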
The preceding definition assumes that, when connected, the client can rely on the data server or access services to invoke methods reliably and quickly (i.e., synchronously). Usually connected systems (UCS) that must deal with unreliable (i.e., asynchronous) data-access services are even more complex because user intervention to resolve complicated concurrency conflicts no longer is real-time. Local data caching capability ordinarily is required even for UCS to support changes to multiple objects.
Local Data Caching for OCS with DataSets
Andres observes that "[W]e already have [change tracking management] with DataSets today." Not only do we already have an optimistic concurrency implementation with DataSets, but also a local data cache to handle disconnected operations. DataSets handle change management by preserving row state as Added, Deleted, Detached, Modified, or Unchanged. Invoking the DataSet.GetChanges(Data.DataRowState) method returns a copy of the DataSet that contains only the rows having the specified DataRowState. The same approach applies to individual DataTables. Orcas's DataSet implementation now includes the ability to generate the DataSet code in a separate project in preparation for migration to n-tier SOA with the WCF Service template.
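The row-state idea can be sketched in a few lines of Python. This is a simplified model, not ADO.NET code: `RowState`, `Row`, `Table`, and `get_changes` are hypothetical names that loosely mirror DataRowState and GetChanges, and the Detached state is omitted for brevity.

```python
from enum import Enum


class RowState(Enum):
    ADDED = "Added"
    DELETED = "Deleted"
    MODIFIED = "Modified"
    UNCHANGED = "Unchanged"


class Row:
    def __init__(self, values, state=RowState.UNCHANGED):
        self.original = dict(values)  # values as retrieved, kept for concurrency tests
        self.values = dict(values)    # values as edited by the client
        self.state = state

    def set(self, column, value):
        self.values[column] = value
        # An Added row stays Added; only an Unchanged row becomes Modified.
        if self.state is RowState.UNCHANGED:
            self.state = RowState.MODIFIED

    def delete(self):
        # Simplification: the row is only marked, so the delete can be sent later.
        self.state = RowState.DELETED


class Table:
    def __init__(self, rows=()):
        self.rows = list(rows)

    def add(self, values):
        row = Row(values, RowState.ADDED)
        self.rows.append(row)
        return row

    def get_changes(self, *states):
        """Return rows in the given states, or every changed row if none are given."""
        wanted = states or (RowState.ADDED, RowState.DELETED, RowState.MODIFIED)
        return [row for row in self.rows if row.state in wanted]


table = Table([Row({"id": 1, "qty": 2}), Row({"id": 2, "qty": 5})])
table.rows[0].set("qty", 3)
table.rows[1].delete()
table.add({"id": 3, "qty": 1})

modified = table.get_changes(RowState.MODIFIED)
all_changes = table.get_changes()
```

Keeping the original values alongside the edited ones is what lets the middle tier build an optimistic concurrency test without a second round trip to the client.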
Note: See the "N-Tier Support in Typed Dataset" topic in The Visual Basic Team's New Data Tools Features in Visual Studio Orcas post and Steve Lasker's demo of the WCF Service template with split client and server synchronization components in this 12:52 Going N Tier with WCF, Synchronizing data using Sync Services for ADO.NET and SQL Server Compact Edition screencast.
DataSets handle the OCS scenario by persisting an Updategram to the local file system as an XML document (DataSet.WriteXml). When the client boots, it loads the Updategram into the DataSet (DataSet.ReadXml) and tests for network connectivity. If the network is alive, the client attempts to refresh lookup data and process all saved CRUD operations against the data server or service. If not, the client continues with additions to the DataSet, which the user saves manually and the app saves periodically or when closing the main form. I described the process with ADO.NET 1.0 DataSets and SQL Server 2000 in this early "Optimize Update-Concurrency Tests" article for Visual Studio magazine.
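The startup sequence described above might look roughly like the following Python sketch. It assumes a JSON file in place of the XML Updategram, and `save_pending`, `load_pending`, `sync_on_startup`, `is_connected`, and `send` are hypothetical names; the lookup-data refresh step is left out.

```python
import json
import os
import tempfile


def save_pending(path, changes):
    """Persist unsent changes to the local file system."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(changes, f)


def load_pending(path):
    """Reload changes saved by an earlier session, if any."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def sync_on_startup(path, is_connected, send):
    """Reload saved changes and push them if the network is up.

    is_connected -- callable returning True when the server is reachable
    send         -- callable that returns True when one change was accepted
    Returns the changes that are still pending afterwards.
    """
    pending = load_pending(path)
    if not is_connected():
        return pending  # stay offline; the user keeps adding to the cache
    remaining = [change for change in pending if not send(change)]
    save_pending(path, remaining)
    return remaining


cache_path = os.path.join(tempfile.mkdtemp(), "pending.json")
save_pending(cache_path, [{"op": "insert", "id": 3}, {"op": "update", "id": 1}])

# First boot: no network, so both changes stay in the local cache.
offline = sync_on_startup(cache_path, lambda: False, lambda change: True)

# Second boot: network available, every change is accepted by the fake server.
sent = []
online = sync_on_startup(cache_path, lambda: True, lambda c: sent.append(c) or True)
```

Changes the server rejects remain in the file, which is where a conflict-resolution step for the user would hook in.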
This is a scenario that's vastly different from editing an order or medical record in a browser-based form, clicking the Update button, and dealing with the occasional concurrency conflicts caused by other users' edits in the few seconds or minutes between data retrieval and sending updates.
Andres' second post, RE: Disconnected Problems and Solutions, responds to Udi Dahan's Entity Framework: Disconnected Problems & Solutions post. Udi says "I don’t use DataSets that much today anyway." Click here and here for some of Udi's opinions on DataSets.
Most .NET developers aren't partial to DataSets, typed or untyped -- LINQ for DataSet notwithstanding. For example, I've found that the new DataSet.UpdateAll(DataSet) shortcut method has quite poor performance compared to conventional DataSet.GetChanges(Data.DataRowState) code in the Orcas March 2007 CTP.
Update 4/1/2007: Matt Warren sets me straight in his comment on usage of the term disconnected in the LINQ to Entities context. Matt says:
Disconnected objects in LINQ to Entities are not meant to solve the disconnected application problem either. They are merely referred to as 'disconnected' as a means of distinguishing them from actively tracked objects.
Dinesh Kulkarni applies a different definition of disconnected in the LINQ to SQL context in his September 2005 Connected, Disconnected and DLinq post:
Since DLinq is a part of the next version of ADO.NET, it is natural to ask - is it connected or disconnected? After all, we have talked about connected vs disconnected components in ADO.NET quite a bit. DataReader is connected (you are using the underlying connection while consuming the data) while DataSet is disconnected. You need to use DataAdapter to bridge the two worlds. All nice and explicit.
Quite often you need to combine the two modes (as developers do with the DataAdapter + DataSet combo). Wouldn't it be nice if the data access library knew how to provide the benefits of disconnected mode while connecting as and when needed? You would not have the old ADO problems of scalability that ADO.NET solved and yet you would not have to wire all the components explicitly and do all the plumbing yourself. Well DLinq does exactly that.
In this case, disconnected means the database connection is closed while changes are made to the cached dataset but reopened in real time when updating the data store.
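To make that sense of "disconnected" concrete, here is a small Python sketch using the standard sqlite3 module. It illustrates only the connection pattern and makes no claim about how DLinq works internally; the `items` table and the `load_rows` and `save_rows` helpers are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

db_path = os.path.join(tempfile.mkdtemp(), "store.db")

# One-time setup of a tiny data store for the example.
with closing(sqlite3.connect(db_path)) as conn:
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER)")
        conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 2), (2, 5)])


def load_rows(path):
    """Open a connection, copy the rows into memory, and close the connection."""
    with closing(sqlite3.connect(path)) as conn:
        cursor = conn.execute("SELECT id, qty FROM items ORDER BY id")
        return [{"id": r[0], "qty": r[1]} for r in cursor]


def save_rows(path, rows):
    """Reopen a connection only for the duration of the update."""
    with closing(sqlite3.connect(path)) as conn:
        with conn:
            conn.executemany(
                "UPDATE items SET qty = ? WHERE id = ?",
                [(r["qty"], r["id"]) for r in rows],
            )


rows = load_rows(db_path)   # no connection is open after this call returns
rows[0]["qty"] = 10         # edits happen against the in-memory copy
save_rows(db_path, rows)    # a connection is opened again just to write
reloaded = load_rows(db_path)
```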
Update 4/2/2007: I recalled discussions in mid-2006, when the first Entity Framework white papers re-appeared, about . For example, "The ADO.NET Entity Framework: Making the Conceptual Level Real," a revised version of a presentation to the 25th International Conference on Conceptual Modeling, Tucson, AZ, USA, November 6-9, 2006, by José A. Blakeley, David Campbell, S. Muralidhar, and Anil Nori of Microsoft, contains this paragraph:
Occasionally Connected Components. The Entity Framework enhances the well established disconnected programming model of the ADO.NET DataSet. In addition to enhancing the programming experiences around the typed and un-typed DataSets, the Entity Framework embraces the EDM to provide rich disconnected experiences around cached collections of entities and entitysets.
I've found no evidence of support for client-side persistence of the ObjectStateManager's entity state and updated and original member values in the Orcas March 2007 version of the EF.
Revised Conclusion
In my opinion, neither Andres nor Udi deals with totally disconnected operations or disconnected problems, such as OCS. The issue they're addressing is handling potential short-term concurrency conflicts in fully connected systems that have untracked objects.
The Shape of Things to Come
Most folks appear to be writing serial expositions on the same or related LINQ or EF topics, so consider this post to be the first member of the "Concurrency Quartet," with apologies to Lawrence Durrell.
Part II, Change Tracking in the Entity Framework and LINQ to SQL, offers my views about the change tracking and concurrency conflict issues Andres and Udi discuss. It also takes a look at the differences between EF's and LINQ to SQL's approaches to change tracking and concurrency management in the Orcas March 2007 CTP.
Part III, Local Data Caching for OCS with SQL Server Compact Edition, relates my experiences with substituting SSCE for DataSet XML Updategram files as a local data cache.
Part IV will cover LINQ to SQL issues with SSCE in the Orcas March 2007 CTP and other trivia.