June 11, 2013 1 Comment
T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.
Everybody that ever got serious with SQL Server, knows about execution plans. And like Rob said in his opening post, if you don’t, start looking into that! But if you start working with execution plans, it gets confusing really fast. All those weird “building blocks” that tell you what SQL Server did with your query?! It almost looks like magic…!
But when you dive into it, they become easier to read and they suddenly are somewhat understandable. But hey, even after all those years looking at execution plans, some things still amaze me.
Last week I was asked to look at a query, that was used in an SSIS package. This package was one of the slowest from the nightly ETL processes. So I started of by looking at the execution plan:
Start pinpointing the first issue
Then I started digging, and I noticed this:
LEFT JOIN DB.dbo.T1 m ON m.Col1 = ISNULL(LOWER(t.Col1) COLLATE Latin1_General_CS_AS, '(N/A)') LEFT JOIN DB.dbo.T2 s ON s.Col2 = ISNULL(LOWER(t.Col2) COLLATE Latin1_General_CS_AS, '(N/A)') LEFT JOIN DB.dbo.T3 k ON k.Col3 = ISNULL(LOWER(t.Col3) COLLATE Latin1_General_CS_AS, '(N/A)')
But wait a minute… The collations we use aren’t Case Sensitive, so why use the LOWER() function anyway? And the collations are practically the same for the 2 databases used in the query (“SQL_Latin1_General_CP1_CI_AS” and “Latin1_General_CI_AS”). But the column collations are both the same (“SQL_Latin1_General_CP1_CI_AS”). SQL Server doesn’t throw an error if I join both sets, and the results aren’t different if I use COLLATE or not. So we learned another thing: the column collation is used in the ON clause of the JOIN. So both functions aren’t necessary because the collations are equal, and thus their wasting our precious CPU cycles…
Now here comes the moment you’ve all been waiting for. How does the execution plan differ? This is what the new execution plan looks like:
So what are we missing? You guessed right! We’re missing this bad boy:
So what does the “Compute Scalar” actually do? According to MSDN:
The Compute Scalar operator evaluates an expression to produce a computed scalar value
So that means it calculates a new value for every input column. This calculation can be as simple as a conversion of the value, or a concatenation of multiple values. Most of the time these “Compute Scalar” aren’t a major cost in your execution plan, but it can become an issue that uses a lot of CPU.
But what does this “Compute Scalar” do in this query? If I remove the ISNULL() from the query, it’s still there. It disappears if I remove the COLLATE. So in this case, the “Compute Scalar” is “calculating” a new value for every record. This “calculation” actually is a conversion from 1 collation to the other.
But what’s the difference?
If we compare both versions of the query, the modified version has a lot less impact on the server:
And this is only without the LOWER() and COLLATE functions. So with a little bit more effort, you can rewrite a part of the query, maybe add an index, etc. So as you can see, small changes can have a huge impact!
A “Compute Scalar” isn’t that hard to understand, but you’ll find it very often in your execution plans. So even though it isn’t the worst performance blocker in the world, it’s an interesting one to get familiar with.
If you don’t use SQL Sentry Plan Explorer yet, download your copy now!