Sunday, March 30, 2008

Minimize Stored Procedure Calls and Improve Performance with EF Extensions

Colin Meek's "Multiple Result Sets" topic of his ADO.Entity Framework: Stored Procedure Customization post of March 26, 2008 provides sample code for obtaining a Category entity and its Category.Products EntitySet from an object store based on the Northwind sample database.

Substituting the Entity Framework (EF) Extension's EntitySet objects for the ObjectResult type and taking advantage of Materializers and stored procedures (sprocs) that return multiple result sets greatly reduce the number of sproc calls and execution time for moderate-size object graphs.

Updated 5/31/2008: Table comparing execution times for local and remote SQL Server 2005 instances added.

Background

My Testing Stored Procedure Replacements for Entity SQL Statements post of March 28, 2008 makes 161 stored procedure (sproc) calls to populate 80 Product entities with simple type values plus Category and Supplier EntityReferences. Category and Supplier entities don't have their Products EntitySets populated, which would require another 37 calls for a total of 198 calls. Here's the sequence:

  1. 1 call to GetAllProducts populates the Products EntitySet (80 entities)
  2. 80 calls to GetCategoryForProductById to populate the Product.Category EntityReferences
  3. 80 calls to GetSupplierForProductById to populate the Product.Supplier EntityReferences
  4. 8 calls to GetProductsByCategoryId to populate the Category.Products EntitySet
  5. 29 calls to GetProductsBySupplierId to populate the Supplier.Products EntitySet

A major difference between LINQ to SQL and Entity Framework's entity mapping is that LINQ to SQL entities expose foreign key values, such as Product.CategoryID and Product.SupplierID, but EF hides them because they are a relational artifact. If these values were visible, it would be possible to reduce the number of calls to the total number of objects (80 + 8 + 37 = 117) and use client-side code to cache the objects and then construct and attach the EntitySets and EntityReferences for Associations.

Attempting to Reduce Sproc Calls with ObjectResult Types Cached to List<T>s

An potential approach to reducing the number of calls would be to start with Categories and Suppliers, rather than Products, because Categories or Suppliers can furnish the Product entities. Only 39 sproc calls would be required. Here's the drill:

  1. 1 call to GetAllCategories to cache Category entities
  2. 8 calls to GetProductsForCategoryId to populate Category.Products EntitySets, which contain all Product entities
  3. 1 call to GetAllSuppliers to cache Category entities
  4. 29 calls to GetProductsForSupplierId to populate Supplier.Products EntitySets, which also contain all Product entities
  5. Cache a Products list from the Category.Products EntitySets
  6. Sort the Products list in ProductID order
  7. Attempt to Attach the appropriate Product.CategoryReference EntityReference from a Category entity whose Category.Products EntitySet contains the Product
  8. Attempt to Attach the appropriate Product.SupplierReference EntityReference from a Supplier entity whose Suppliers.Products EntitySet contains the Product

Here's the code for the preceding steps:

// Get Category entities
catList = ctxNwind.GetAllCategories().ToList();
foreach (Category cat in catList)
{
    foreach (Product prod in ctxNwind.GetProductsByCategoryID(cat.CategoryID))
        cat.Products.Add(prod);
    
    prodList.AddRange(cat.Products);
}

// Get Supplier entities
vendList = ctxNwind.GetAllSuppliers().ToList();
foreach (Supplier vend in vendList)
{
    foreach (Product prod in ctxNwind.GetProductsBySupplierId(vend.SupplierID))
        vend.Products.Add(prod);
}

// Sort prodList by ProductID 
prodList.Sort(delegate(Product p1, Product p2)
    { return p1.ProductID.CompareTo(p2.ProductID); });

// Get Catetory and Supplier EntityReferences
foreach (Product prod in prodList)
{
    foreach (Category cat in catList)
    {
        if (cat.Products.Contains(prod))
        {
            // Type conflict between Category cat and required CategoryReference
            // Nocando: prod.CategoryReference.Attach(cat);
            prod.Category = cat;
            break;
        }
    }
    foreach (Supplier vend in vendList)
    {
        if (vend.Products.Contains(prod))
        {
            // Type conflict between Supplier vend and required SupplierReference
            // Nocando: prod.SupplierReference.Attach(vend);
            prod.Supplier = vend;
            break;
        }
    }
}

Here's the updated test harness displaying 0 Relationships as the result of missing CategoryReference and SupplierReference EntityReferences.

Note: Observe the speedy execution of the Get Product button handler when no EntityReferences are attached as Relationships.

Unfortunately, you can't attach a Category or Product type as a CategoryReference or ProductReference, so you're stuck with executing

prod.CategoryReference
    .Attach(ctxNwind.GetCategoryForProductById(prod.ProductID).First());
prod.SupplierReference
    .Attach(ctxNwind.GetSupplierForProductById(prod.ProductID).First());

in the foreach (Product prod in prodList) loop as described in the "Retrieving Products and Attaching Category and Supplier EntityReferences" topic of the previous post. This adds 160 additional sproc calls for a total of 199.

Here's the test harness after attaching the EntityReferences:

Note: Execution time is almost 4x that for no relationships, so it's obvious that attaching EntityReferences for relationships under these condistions is a heavyweight process. However, the original code with relationships runs in 0.083 seconds, so this entire approach is suboptimal with conventional ObjectResult types, at best.

Solving the Problem with EF Extensions' Materializers, EntitySets and Multiple Result Sets

The following code performs the same operations as the preceding example but calls GetCategoryAndRelatedProducts() and GetProductAndRelatedProducts() methods, which in turn call sprocs that return multiple resultsets. The methods use Materializers with the default "shaper" delegate to return Product, Category, and Supplier types and substitute ObjectContext.EntitySet types for ObjectContext.ObjectResult types.

// Keep connection open for all operations
DbCommand command = ctxNwind.CreateStoreCommand("usp_GetCategoryAndProductsByID",
    CommandType.StoredProcedure, new SqlParameter("CategoryID", 0));
using (command.Connection.CreateConnectionScope())
{
    // Get Categories and Products EntitySets, cache Categories
    // Production version would return a list of CategoryIDs with an sproc
    for (int i = 1; i <= 8; i++)
    {
        catList.Add(GetCategoryAndRelatedProducts(i, command));

        //Add Products in Category to Product list
        prodList.AddRange(catList.Last().Products);
    }

    // Get Suppliers and Products EntitySets, cache Suppliers
    command = ctxNwind.CreateStoreCommand("usp_GetSupplierAndProductsByID",
        CommandType.StoredProcedure, new SqlParameter("SupplierID", 0));
    for (int i = 1; i <= 29; i++)
    {
        vendList.Add(GetSupplierAndRelatedProducts(i, command));
    }
    // Total of 37 sprocs (would be 39 in production)
}

// Sort prodList by ProductID 
prodList.Sort(delegate(Product p1, Product p2) 
    { return p1.ProductID.CompareTo(p2.ProductID); });

// Iterate Products in ProductID order
foreach (Product prod in prodList)
{
    // Get Supplier for each Product
    foreach (Supplier vend in vendList)
    {
        if (vend.Products.Contains(prod))
        {
            prod.Supplier = vend;
            break;
        }
    }
}

The magic that handles attaching the Products.Category and Product.Supplier properties as EntityReferences appears to be in the EntitySet type and the Bind() method.

private static Materializer<Product> s_prodMaterializer = 
    new Materializer<Product>();
private static Materializer<Category> s_catMaterializer = 
    new Materializer<Category>();
private static Materializer<Supplier>; s_vendMaterializer = 
    new Materializer<Supplier>();

// Modified version of Colin Meek's original code for this method
public Category GetCategoryAndRelatedProducts(int categoryID, DbCommand command)
{
    Category category;
    command.Parameters[0].Value = categoryID;
    using (DbDataReader reader = command.ExecuteReader())
    {
        // First result set includes the Category
        EntitySet<Product> ProductSet = 
            new EntitySet<Product>(ctxNwind, "Products");
        EntitySet<Category> CategorySet = 
            new EntitySet<Category>;(ctxNwind, "Categories");

        category = s_catMaterializer
            .Materialize(reader)
            .Bind(CategorySet)
            .SingleOrDefault();
        // Second result set includes the related Products
        if (null != category && reader.NextResult())
        {
            category.Products.Attach(s_prodMaterializer
                .Materialize(reader)
                .Bind(ProductSet));
        }
    }
    return category;
}

// Clone of preceding method for suppliers
public Supplier GetSupplierAndRelatedProducts(int supplierID, DbCommand command)
{
    Supplier supplier;
    command.Parameters[0].Value = supplierID;
    using (DbDataReader reader = command.ExecuteReader())
    {
        // First result set includes the Supplier
        EntitySet<Supplier> SupplierSet = 
            new EntitySet<Supplier>(ctxNwind, "Suppliers");
        EntitySet<Product> ProductSet = 
            new EntitySet<Product>(ctxNwind, "Products");

        supplier = s_vendMaterializer
            .Materialize(reader)
            .Bind(SupplierSet)
            .SingleOrDefault();
        // Second result set includes the related Products
        if (null != supplier && reader.NextResult())
        {
            supplier.Products.Attach(s_prodMaterializer
                .Materialize(reader)
                .Bind(ProductSet));
        }
    }
    return supplier;
}

Note: The preceding methods have been modified to utilize a single Command and ConnectionScope object for all sproc calls in the sequence.

Colin says the following about the EntitySet class:

To facilitate tracking, EFExtensions includes an EntitySet class which manages activities related to Entity Data Model (EDM) entity sets. The class includes a FindOrAttach method which either attaches the given entity to the state manager for tracking purposes or returns an existing entity with the same key. This simulates the identity resolution behavior that the EF provides when running queries.

And this about the Bind() method:

For convenience, a Bind extension method that attaches a sequence of entities is also included.

The upshot is that the code executes only 37 sprocs (39 in production) and runs in 0.033 seconds, which is 2.5 times faster than the 0.083 seconds of the original version with the database on the test machine.

The following table compares the execution times for a local and networked instance of SQL Server 2005 Express for 80 products for four test conditions:

Test Conditions

Calls

Local

Network

Original version (with Entity.Products EntitySets added)

161

0.083 s.

0.255 s.

Categories and Suppliers first (no relationships created)

37

0.043 s.

0.060 s.

Categories and Suppliers first (160 relationships created)

197

0.169 s.

0.290 s.

Final Categories and Products first (with EF Extensions)

37

0.033 s.

0.059 s.

Here's the proof that the code calls only 37 sprocs (1 is for opening the connection):

And there's no problem with adding, editing, and removing entities.

Note: Julie Lerman observes in her EFExtensions and stored procs with shaped results post of March 30, 2008:

What Colin's post shows, however is sprocs that return shaped results. A standard function import can't do this...it needs to map to a single entity type.

I believe that multiple result set capabilities, freedom from the necessity to declare FunctionImports, fewer stored procedure calls for object graphs of similar complexity, and what appears to be a significant performance boost will be the most interesting features of the EF Extensions.