Wednesday, June 27, 2007

Rico Mariani's DLinq Performance Tips (Part 2) and Compiled Queries

Rico Mariani continues his analysis of LINQ to SQL performance issues with a new DLinq (Linq to SQL) Performance (Part 2) post dated June 25, 2007 by recommending compiled LINQ query expressions.

LINQ to SQL query expressions take advantage of lazy evaluation. Your code doesn't hit the SQL Server instance until the the IEnumerable<T> sequence's iterator (foreach or For Each structure) requests the first row. However, before the iterator starts it's pass, the compiler builds an expression tree, which in turn generates the T-SQL statement, and sends it to the database. As Rico observes:

In many cases all that will be different from one invocation to another is a single integer filtering parameter.

The expression tree incorporates logic to send a parameterized query when the query expression contains one or more parameters in the Where clause. For example:

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[ShippedDate]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[ShipCountry] = @p0
ORDER BY [t0].[OrderID] DESC
-- @p0: Input NVarChar (Size = 3; Prec = 0; Scale = 0) NOT NULL [USA]

SQL Server caches the query plan and uses the cached version for different parameter values. Thus the only option remaining to the developer for improving performance is to compile the query, which eliminates building the expression tree and generating the T-SQL on each execution. 

Compiling Parameterized LINQ Query Expressions

Rico provides an example of a query delegate type that actually will compile. The only item missing is the delegate function:

private IEnumerable<Order> GetOrderById(int orderId)
    return q(Northwinds, orderId);

The LINQ to SQL: .NET Language-Integrated Query for Relational Data white paper by Dinesh Kulkarni, Luca Bolognese, Matt Warren, Anders Hejlsberg, and Kit George (March 2007) is one of the few (about 7) hits you'll get on a search for compiledquery.compile. The white paper has a very brief section that covers the the CompiledQuery class and its Compile method for compiling parameterized query expressions with a single example that will compile but won't work more than once during a session.

Note: The paucity of posts about compiled LINQ queries is surprising.

A bug—presumably in a System.Data.Linq class—causes a null reference exception if you create new DataContext objects for successive executions of the compiled query during a session. The sample code for the function that invokes the query delegate type does exactly that (and has a variable name error, to boot):

public IEnumerable<Customer> GetCustomersByCity(string city)
{
   Northwind db = new Northwind();
   return Queries.CustomersByCity(myDb, city);
}

Delete the Northwind db = new Northwind(); instruction, change the erroneous  myDB argument to your DataContext variable and the sample code should work as expected. However, this is another example of untested sample code. The authors might not have found the multiple invocations bug, but certainly wouldn't have been able to compile the function with the wrong argument name.

Note: Anders Borum confirmed the bug and pointed out the source of the problem in an answer to my Problem with Reuse of Compiled LINQ to SQL Query in Beta 1 post in the LINQ Project General forum. Matt Warren confirmed that the bug was detected and fixed in the Beta 1 timeframe but didn't make it into the Beta 1 bits. It's fixed for the forthcoming Beta 2 drop.

Where's the Meat?

Rico didn't offer any performance numbers in his Part 2 post, possibly because the missing function prevented him from comparing the execution time of his compiled query with baseline SqlDataReader data. VB 9.0 doesn't support lambda expressions in VS 2008 Beta 1, so I ported my original VB test harness to C# 3.0, added code to generate a combo box of Northwind country names with the number of orders per country, and wrote the code to add a parameterized query option for LINQ and SqlDataReader queries, and a compiled query option for LINQ. Here's a screen capture of the test harness's form:

The combo box contains the name and count of orders from 21 distinct ShipCountry values from the Northwind Orders table. The count varies from 6 (Norway) to 122 (USA and Germany.) The  DataContext's Orders object has only the fields required for the query expression (the three fields of the original test harness and Rico's first test object) plus the ShipCountry field for the Where clause constraint.

Following is the performance penalty data for executing 500 parameterized and compiled queries with differing numbers of rows:


Rows

Base, s.
LINQ (Param), s. Param / Base LINQ (Compiled), s. Compiled / Base Param / Compiled
0 0.293 1.941 6.63 0.589 2.01 3.30
6 0.317 2.051 6.47 0.664 2.09 3.08
22 0.340 2.299 6.76 0.803 2.36 2.86
77 0.429 3.164 7.38 1.273 2.96 2.49
122 0.507 3.894 7.68 1.684 3.32 2.31

Base is the parameterized SqlDataReader execution time in seconds. Param/Base is the ratio of parameterized LINQ query execution time to the Base time. Compiled/Base is the ratio of compiled, parameterized LINQ query execution time to the Base time. Param/Compiled is the performance ratio of compiled-parameterized to parameterized LINQ queries.

Data: Rows contain 36 bytes of date: 2 integers (8 bytes), 1 5-character CustomerID (10 bytes), one DateTime field (8 bytes), and an nvarchar ShipCountry field that averages 5 characters (10 bytes) for a total of 36 bytes. The rows of the Part 1 tests contained an average of 26 bytes (no ShipCountry field).

Hardware: 2005-era, single core 2.26-GHz Pentium 4 Dell 400SC server with 1 GB RAM and 80-GB 7,200 RPM UltraATA-100 drive (IBM IC35L090AVV207-0, 2MB buffer, 8.8 ms. seek time).

Conclusions: With Beta 1 bits, LINQ to SQL parameterized queries are about 7 times slower than the baseline SqlDataReader; LINQ to SQL compiled queries are about 2.5 times slower than the baseline SqlDataReader. The LINQ to SQL query performance penalty increases for both types of LINQ to SQL queries as the number of rows returned increases. The performance benefit of compiled over parameterized LINQ queries decreases as the number of rows increases.

Note: The the 330% performance factor of compiled queries in this scenario doesn't match the 500% performance factor reported by Anders Borum, probably because of the masking effect of database access.

What About Compiled Query Caching?

Frans Bouma, the developer of the LLBLGen Pro object/relational mapping (O/RM) tool for .NET, raises in a comment the issue of compiled query caching between DataContext refreshes. My assumption is that the compiled query is cached independently of the DataContext, just as the database connection is independent of the DataContext and is closed once the object(s) of interest have been hydrated.

Matt Warren says in a reply to the Problem with Reuse of Compiled LINQ to SQL Query in Beta 1 post:

We kind of knew we needed something like compiled queries before we even started seriously looking at perf.  We already had experience with ObjectSpaces to tell us that an ORM would have translation overhead.  Usually that overhead is not a big deal for a client-app, but we figured we could do better on server apps where the same query gets executed over and over again.

Due to the Beta 1 bug, I can't test inter-DataContext caching currently. I'll see what I can find out and update the post later.

Jomo Fisher, a member of the LINQ to SQL team, subsequently confirmed my assumption about DataContext-independent compiled queries in a June 28. 2007 comment:

Compiled queries are not bound to a particular [DataContext]. An intended usage is to compile a query once and use it across many DataContexts.

Where's Beta 2?

I'm seeing increasing numbers of "beta 2 fixes that," "beta 2 is faster," "wait for beta 2" and "it's better in beta 2" comments from the LINQ to ADO.NET folks. Rico's added his comment to the first post in his series:

Some things made it into Beta 1 but the bulk of what I'm going to post in the next few days didn't happen until after. You'll first see it in Beta 2. [Emphasis added.]

and Matt Warren seconds Rico's motion in this comment to his LINQ to SQL: Learning to Crawl post about Rico's item: 

Major improvements coming. :-)

 Let's hope bug fixes are coming, too.

Mini-Connectionless DataContext Is Gone from VS 2008

The "mini-connectionless DataContext" that Matt Warren described in his April 12, 2007 post to the Update existing disconnected objects (dlinq) thread in the LINQ Project General forum won't be in VS 2008's LINQ to SQL implementation. Matt says in June 18 and 27, 2007 posts to the LINQ: Next CTP thread:

No, this feature will not be in the first release of LINQ to SQL.

It is only missing because of lack of time.  When I mentioned it, I was talking about our forward thinking and a sample I'm trying to put together.

It will be interesting to see how Matt proposes to run LINQ to SQL v1.0 in the middle tier as a WCF service. I expected to see this feature in Beta 2, so I've updated my Changes Coming to LINQ for SQL post of May 15, 2007 with the bad news.

Update 6/28/2007: Fixed major formatting problems caused by <pre> elements, minor typos, and added sections on caching and the missing mini-connectionless DataContext.

Update 6/29/2007: Matt Warren said Rico suspects I have a faster disk drive, so I corrected and updated the drive specs on the earlier post and added a hardware section here. Clarified the row size (in bytes) and DataContext specs, also.

7 comments:

Anonymous said...

I haven't looked at Orcas beta 1 yet, so forgive me if the question I'm going to ask is stupid.

A compiled query, isn't that query only valid within a live, created DataContext instance? If so, using compiled queries for this test is cheating.

the thing is this: When you compile this query, you can also keep a connection open and call Prepare on the command, and get very high performance.

However in normal scenario's, it's not likely you keep a datacontext around forever, unless you have a single user desktop app. (and even then).

So in scenario's where you have to create a new datacontext each time you want to access the db, you lose the compiled query. Am I correct in this?

If the compiled query is cached in some static cache available to all datacontext instances... no problem. If not, then saying that compiled queries are the solution isn't going to cut it.

Anonymous said...

Thanks for the post and for pointing out the issue with the sample code in the msdn article. I've opened a bug to get it fixed.

Just a note: Compiled queries are not bound to a particular datacontext. An intended usage is to compile a query once and use it across many DataContexts.

Jomo Fisher
LINQ to SQL Team

Anonymous said...

Thanks Jomo, that is good news :)

Anonymous said...

Hi!

Have to admit that was one heck of a cliffhanger by Matt and Rico. Hopefully we'll get the real numbers within a few days.

I'd like to comment on the reported 500% performance increase. As reported by Rico (and indicative of your own findings), the performance gain using compiled queries decrease as the number of rows in the result sets increase.

The performance test I established only benchmarks queries against single entities (i.e. SingleOrDefault() results) vs. the corresponding SqlDataReader implementation (referred to as the raw cost in the blog series by Rico and Matt).

According to my findings, there were indeed ~500% performance increase from switching from regular inline to compiled queries - again, benchmarkes against single entities. When beta 2 arrives, I'll update the suite of tests to include sets also.

Because the API I'm working on is heavy on requests for single entities, I guess I just focused on that part (knowing that you guys would be me to the other tests :-)).

Frans Bouma:
Yea, compiled queries are intended as long lived members (I keep my queries in statics hidded within the internals of APIs).

I'm a strong believer in optimization and have already started evangelizing about the ups and downs of LINQ to SQL.

Could be interesting to see how similar OR/Ms compare in performance .. are you up to the task? :-)

Anonymous said...

That was btw. Anders Borum that wrote the preceding comment. Have to get some account established ..

Roger Jennings (--rj) said...

Hi, Anders,

Not surprisingly, the performance boost also decreases as the number of bytes per row increases. The previous tests were with 36-byte rows. I'm about to post some results with 158-byte (average) rows.

It will be interesting to see how Beta 2 enhancements alter the results.

--rj

Anonymous said...

qcsglll
车库门
卷帘门
伸缩门
翻译公司
道闸
挡车器
伸缩门
货架
货架
货架
货架
货架
电动伸缩门
车库门
自动车库门
防火卷帘门
阳光房
除湿机
抽湿机
比歘情



电动伸缩门
伸缩门
卷帘门
车库门
道闸
挡车器
岗亭
道闸
伸缩门
自动门
车库门
岗亭
日上
日上防盗门
电动伸缩门
卷帘门
伸缩门


自动车库门
车库门
伸缩门
自动门
电动门
伸缩门
卷帘门
车库门
电动伸缩门
道闸
岗亭
自动车库门
电动伸缩门
岗亭
卷帘门
车库门
伸缩门
电动门
自动门



卷帘门
电动门
自动门
卷帘门
道闸
车库门
伸缩门
岗亭
岗亭
岗亭
岗亭
岗亭
岗亭
车库门
道闸
卷帘门
防火卷帘门
电动伸缩门
道闸


防火卷帘门
自动车库门
车库门
伸缩门
电动伸缩门
挡车器
北京SEO
专业SEO
专业SEO
专业SEO
锅炉安装
Oracle数据库容灾复制
Oracle容灾备份
Oracle备份恢复
北京铁艺





bgwldwzjzzxh!!
妇科
六合彩
热门图库
热门图库精