T-SQL Tuesday #43 – Hello, Operator?

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Hello, Operator?”. If you want to read the opening post, please click the image below to go to the party-starter: Rob Farley (Blog | @rob_farley).



 
Execution Plan
Everybody that ever got serious with SQL Server, knows about execution plans. And like Rob said in his opening post, if you don’t, start looking into that! But if you start working with execution plans, it gets confusing really fast. All those weird “building blocks” that tell you what SQL Server did with your query?! It almost looks like magic…!

But when you dive into it, they become easier to read and they suddenly are somewhat understandable. But hey, even after all those years looking at execution plans, some things still amaze me.

Last week I was asked to look at a query, that was used in an SSIS package. This package was one of the slowest from the nightly ETL processes. So I started of by looking at the execution plan:

 
Start pinpointing the first issue
Then I started digging, and I noticed this:

LEFT JOIN DB.dbo.T1 m
	ON m.Col1 = ISNULL(LOWER(t.Col1) COLLATE Latin1_General_CS_AS, '(N/A)')
LEFT JOIN DB.dbo.T2 s
	ON s.Col2 = ISNULL(LOWER(t.Col2) COLLATE Latin1_General_CS_AS, '(N/A)')
LEFT JOIN DB.dbo.T3 k
	ON k.Col3 = ISNULL(LOWER(t.Col3) COLLATE Latin1_General_CS_AS, '(N/A)')

 
But wait a minute… The collations we use aren’t Case Sensitive, so why use the LOWER() function anyway? And the collations are practically the same for the 2 databases used in the query (“SQL_Latin1_General_CP1_CI_AS” and “Latin1_General_CI_AS”). But the column collations are both the same (“SQL_Latin1_General_CP1_CI_AS”). SQL Server doesn’t throw an error if I join both sets, and the results aren’t different if I use COLLATE or not. So we learned another thing: the column collation is used in the ON clause of the JOIN. So both functions aren’t necessary because the collations are equal, and thus their wasting our precious CPU cycles…

Now here comes the moment you’ve all been waiting for. How does the execution plan differ? This is what the new execution plan looks like:

So what are we missing? You guessed right! We’re missing this bad boy:

So what does the “Compute Scalar” actually do? According to MSDN:

 

The Compute Scalar operator evaluates an expression to produce a computed scalar value

So that means it calculates a new value for every input column. This calculation can be as simple as a conversion of the value, or a concatenation of multiple values. Most of the time these “Compute Scalar” aren’t a major cost in your execution plan, but it can become an issue that uses a lot of CPU.

But what does this “Compute Scalar” do in this query? If I remove the ISNULL() from the query, it’s still there. It disappears if I remove the COLLATE. So in this case, the “Compute Scalar” is “calculating” a new value for every record. This “calculation” actually is a conversion from 1 collation to the other.

 
But what’s the difference?
If we compare both versions of the query, the modified version has a lot less impact on the server:

And this is only without the LOWER() and COLLATE functions. So with a little bit more effort, you can rewrite a part of the query, maybe add an index, etc. So as you can see, small changes can have a huge impact!

 
Conclusion
A “Compute Scalar” isn’t that hard to understand, but you’ll find it very often in your execution plans. So even though it isn’t the worst performance blocker in the world, it’s an interesting one to get familiar with.

If you don’t use SQL Sentry Plan Explorer yet, download your copy now!

T-SQL Tuesday #42 – The Long and Winding Road

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “The Long and Winding Road”. If you want to read the opening post, please click the image below to go to the party-starter: Wendy Pastrick (Blog | @wendy_dance).



After I skipped the last few T-SQL Tuesday posts, I saw the topic Wendy suggested for this month. That’s an interesting one, so I’m kind of back again…

One particular sentences that stuck with me, was “Let’s make these stories about the tech and how that has led you down a certain path”. A few months ago, I blogged about leaving my former employer. There I got a change to discover a bunch of new technologies and discover what I really wanted. But the path started earlier.

During my first internship, I started working as a desktop support engineer and help desk employee. This meant I needed to install, replace and fix PC’s, phones, network, etc. After the first internship (6 months), I stayed with the company as a part-time employee. Then I decided to stay there for my second internship of 6 months, which I filled working as a system administrator.

When I was still busy with exams and finishing my education, the company asked me to stay with them. They had an interesting role in mind for me: SQL Server consultant. At that time, I’ve never wrote a single query (except a few trials in my database classes at school), but I accepted the offer. The learning curve was pretty steep, but I managed with the help of my wonderful colleagues. My first experiences with SQL Server were very interesting, and it really triggered me to learn more about it, even though it was SQL Server 2000 back then.

After 2 years I got the opportunity to start as a .NET developer within the same company. Because several colleagues left, there was a high demand for software engineers. So with no knowledge of ASP and VB, I started to discover this new world. After a few months, I moved from ASP to ASP.NET, from VB to C#, and eventually worked with WCF, Silverlight, NHibernate, etc.

After 2.5 years of being a software engineer, I came back to the database world. I can’t explain why, but I missed something in my daily job that I had before. When searching for a new job, I found a company that seemed to fit my ideas of a great company and a huge challenge: multinational, young team, great colleagues, lots of learning opportunities, etc. There I started out as a BI- and SQL Server developer, and I found what I was missing in my previous job: data!

One thing I’ve learned after this wide variety of jobs, is that I love databases and data. In my current job of DBA I can combine the 2 things I love the most: data and technology. As a DBA I’m not only responsible for keeping the databases alive, but also for data quality, database performance, etc.

So this is the ultimate job for me, at least at the moment. But if I ever change jobs, I’m definitely not leaving the SQL Server world! Not only because I love working with databases and data, but also because of the wonderful community. At first I didn’t know what to think about the “SQL Family”, but now I know it practically IS a family. The people have the same issues as you, the same interests, and they love to help you out. So I’m never going to leave that behind again!

Moving system database: rocket science or piece of cake?

Being a DBA often makes you the “Default Blame Acceptor”, according to Buck Woody (Website | @buckwoody). This means that everything is your fault by default. Server broke down? Your fault! Database corrupt? Your fault! Query of a user doesn’t compile because of a syntax error? Yeah, you guessed right… Your fault!

But on the other hand, you have a lot of opportunities to find out the best practices of doing things. An example of that is moving a system database. About two weeks ago we decided to order 4 SSD’s for our SQL Server. We plan to store tempdb and the SSAS data on these disks, hoping that it will reduce resource costs on our environment.

So with no experience of moving system databases, I started thinking about how to do this. You probably need to stop the SQL Server, move the MDF and LDF files, change the start-up options of SQL Server, start the service, hope that SQL Server finds the new location, etc. But after a quick peek I found a much simpler solution: just modify the current file location!

 
Check the current location and file sizes
Before moving your database (in this case I’m moving my tempdb), run the query below, and store the result just in case all goes south:

SELECT name, physical_name, state_desc, (size * 8 / 1024.00) AS InitialSize
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

 
The reason you also want to store the initial sizes, is that if you restart the SQL Service (one of the next steps), SQL Server will set the files to the default file sizes. And you don’t want to run on those default settings of course!

 
Set the new file location
You can set the new file location for your tempdb, by running the query below. In this example I’m moving my datafiles to the D:\ volume of my machine:

USE master
GO

ALTER DATABASE tempdb 
	MODIFY FILE (NAME = tempdev, FILENAME = 'D:\MSSQL\DATA\tempdb.mdf')
GO
ALTER DATABASE tempdb 
	MODIFY FILE (NAME = templog, FILENAME = 'D:\LOG\templog.ldf')
GO

 
After executing this statement, you’ll see a message like this appear in the Messages window:

 

The file “tempdev” has been modified in the system catalog. The new path will be used the next time the database is started.
The file “templog” has been modified in the system catalog. The new path will be used the next time the database is started.

 
So the file location is altered, but the running values are not changed until your machine is rebooted, or the SQL Service is restarted.

Now just restart the SQL Service (or the machine if you like to), and run the first query again. This way you can check if your tempdb is stored in the right folder, and if the initial sizes are correct:

SELECT name, physical_name, state_desc, (size * 8 / 1024.00) AS InitialSize
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

 
The service should stop and start without issues. After the restart you’ll see that SQL Server created a new MDF and LDF file at the new file location. After a successful restart, you can delete the MDF and LDF files from the old location.

 
Now, was that so hard?
So as you can see, not all changes in SQL Server are rocket science. One thing I’ve learned, is that from now on, I’m not going to assume the worst, and hope for the best!

 
UPDATE
As Pieter Vanhove (Blog | @Pieter_Vanhove) mentions in his tweets about msdb/model and master, in some cases you need to do a little bit more work. Because the tempdb is the database with the least probable cause of breaking SQL Server (it’s recreated if the SQL service starts), you can move it by changing the settings.

On the restart, the “Running values” (current settings) are overwritten by the “Configured values” (new settings) you set with the query you ran, and tempdb is recreated.

But the other system databases require a little bit more effort. If you want to move master, model or msdb, check out this link.

And thank you Pieter, for pointing out this stupid mishap to me!

Deadlock Detector: Drop it like it’s hot!

Last time I talked about the right tool for the right job. After that post, I noticed that during the nightly backups and job-runs, the SQL Server log filled up with deadlocks. Hoping to find the cause of this issue, I installed SQL Solutions Deadlock Detector. But I must warn you: it’s easier to install it, then to get rid of it!

 
Downloading it
The drama starts with downloading the tool. If you go to the download location of the tool, there’s no mention of a trial-version. They do like to tell you that it’s free… A lot of times… But guess what… It isn’t!

 
Installing it
After downloading the tool, I started the installation. Again, there’s no mention of any kind that I’ve just downloaded a trial version, and that you need to pay for the tool after 7 days.

 
Use it
After the installation I started the tool. The tool shows you for the first time that you’re actually running a triall:

After I clicked on “Try”, I was prompted to enter a SQL instance to run the tool on. After giving it my SQL Server instance location and name, it showed a pop-up really quick. Apparently it installed a bunch of crap on my SQL instance! Again, there is NO mention that it needs to install objects in your msdb! Also, SQL Server Agent needs to run on your instance, because Deadlock Detector can’t run without that.

So what does Deadlock Detector install? I’ve created a screenshot of all the objects that are installed:

 
Uninstalling it
Okay, the tool wasn’t what I was looking for, so I wanted to uninstall it. But how do I get rid of all those objects that were created in my msdb database?

First I uninstalled the tool, hoping that it would also drop all the objects it created. But at the end of the uninstall, it sent me to this webpage. So I’m suppost to download a special uninstaller to get rid of your installer?

Okay, so I downloaded the special installer-uninstaller, and ran it. Close, but no cigar… The tool ran, returned no errors, but the objects still existed. Running it again, ended with the same results.

To check it yourself, run this query:

SELECT *
FROM msdb.LakeSideLockLogger._LakeSide_DbTools_LockLog

If it’s still running you’ll see record appear that tell you that there were deadlocks on your system. Even though you closed the tool and, like me, expect it to shutdown, the services still run.

To check if the objects still exist on your server, run this query:

SELECT *
FROM msdb.sys.objects
WHERE 1 = 1
AND is_ms_shipped = 0
AND Name LIKE '_LakeSide%'

So what do you do then? Right, I sent the company a support call. I’ve waited more than a week by now, but still no reply. Not even a reply that they’ve received my mail. So a really, really bad service of SQL Solutions if you ask me…

So after a lot of searching, I saw a really small reference to another uninstall method: from the tool itself. So I’ve re-installed the tool, and tried that:

And that worked for me!

 
Conclusion
Apparently more people had issues with uninstalling the tool, looking at results like this.

The thing I learned from this, is to dig a little bit deeper if I look at a new tool. The website of the tool’s manufacturer might not be the best source to find out how a tools functions…

 
UPDATE
In the mean while, I found some more objects that aren’t uninstalled by Deadlock Detector: Operators and Alerts. Even though you’ve “uninstalled” the tool, objects, queues, etc, there are still some leftovers: 1 operator, and 3 alerts.

You can drop them by running the script below:

--Drop Operator
EXEC msdb.dbo.sp_delete_operator @name=N'Deadlock Detector – Default operator'
GO

--Drop Alerts
EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector – Process is killed'
GO

EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector – Lock is detected'
GO

EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector – Deadlock is detected'
GO

T-SQL Tuesday #39 – Can you shell what the PoSH is Cooking?

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Can you shell what the PoSH is Cooking?”. If you want to read the opening post, please click the image below to go to the party-starter: Wayne Sheffield (Blog | @DBAWayne).



A few months ago I attended a session of Jeff Wouters (Blog | @JeffWouters) about PowerShell. He talked about how powerful PowerShell really is, and showed us some examples. This was my first encounter with the “commandline on steroids”. Back then I didn’t think I’d use that any time soon, but this is changing fast! The wide variety of possibilities that PowerShell offers made me curious, but I never found the time and use for it. Until last week…

A great number of things have changed the last few weeks, and that gave me the opportunity to play around with PowerShell. Starting with reading a tutorial and creating my first few trial scripts, it soon hit me that it isn’t that easy. But on the other side, if you succeed at writing a script, it’s very powerful!

PowerShell ISE
One of the very first things I tried after staring the PowerShell ISE (the “studio” in which you can write your scripts) was trying to find all objects in my database. The first thing you need to do is:

#Set the Execution-Policy to Unrestricted
Set-ExecutionPolicy Unrestricted

#Run the Import-Module below once, so you can use the SQLSERVER functionality
Import-Module SQLPS

This will allow you to access your SQL Server objects. After that, you can start with the easy stuff like:

dir SQLSERVER:\sql\localhost\SQL2012\databases\Sandbox\Tables

dir SQLSERVER:\sql\localhost\SQL2012\databases\Sandbox\Views

dir SQLSERVER:\sql\localhost\SQL2012\databases\Sandbox\StoredProcedures

dir SQLSERVER:sql\localhost\SQL2012\databases\Sandbox\Users

This is very basic, and real easy to find out how this works. From Powershell, the results will look like this:

And from the “PowerShell ISE”, it will look like this:

Another thing I tried is to automatically create a database on my local server:

[void] [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.ConnectionInfo") 
[void] [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") 

$serverInstance = "(local)\SQL2012"
$conn = New-Object -TypeName Microsoft.SqlServer.Management.Common.ServerConnection -ArgumentList $serverInstance 
$server = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Server -ArgumentList $conn

$newdb = New-Object -TypeName Microsoft.SqlServer.Management.Smo.Database($server, "PowerShellToTheRescue")
$newdb.Create()

This script will create a new database called “PowerShellToTheRescue” on your local SQL Server. I assume it’s also possible to create tables in the database, but I haven’t found the time to test that.

PowerShell trough SQL Server Management Studio (SSMS)
Another option is to start PowerShell via a menu in SSMS. This means PowerShell is starting with the current database as default context. So then you don’t have to use the “Import-Module”, and you can start running your query right away. You can find this option by right-clicking your database, and choose “Start PowerShell”:

So, what are you saying?!
This months T-SQL Tuesday post isn’t that impressive, I know! But what I’m trying to say is that if you try PowerShell, you might find that it’s quite powerful and fun.

And thinking about it, I guess you can expect some more PowerShell posts from me in the next few weeks! ;)

Starting with Spatial Data – Enriching your dataset

The biggest issue when you start working with spatial data, is asking yourself the question: “What do I want to visualize?”. Do you want to visualize the areas your clients come from? Or do you want to calculate in which areas your new potential clients live? The next hurdle to take, is thinking about how to enrich the data you have now.

The easy way out
Almost every company has a accessible database with client information. The easy way out is to download a free spatial data set (like described in an earlier post), and visualize your data by joining your data set on the downloaded data set, based on country-, province-, or city name. This way, you don’t alter your data, but you’re still able to visualize your data.

Enriching your data
The way to go (in my opinion), is to permanently enrich your data. In that case, you’re more flexible in visualizing your data. Another advantage is that you can always group your data in the same way as mentioned above. If you want to group it by country, province or city, you can still do it. But remember: you can always group your data, but mining your data is difficult without accurate coordinates. And there is a free option for enriching your data, if you don’t want to spend a lot of money and time on it.

Last year, I spend a lot of time working with spatial data. One of the biggest brainteasers was how to enrich our data set, without actually disrupting the normal running processes. You don’t want to change your underlying data set, and then find out you need to rewrite all your software.

No seriously, it’s free…
One of the options I thought of, was converting a postal code, or zip code for my American friends (Seriously USA?! Do you have a different word for EVERYTHING?! :) ). The conversion can’t be done by number crunching, but needs to be done by looking it up. One of the options is to look up every postal code by hand. But if you have a few million addresses in your database, that’s not really an option.

One of the options I came across was the free Google Geocoding API. This allows you to translate a postal code into a Latitude and Longitude. And these coordinates can be plotted on a map.

And if you have a Latitude and Longitude, you can use Reverse Geocoding to translate your coordinates into a postal code.

A quick example of this, is a tool I’ve written. It translates a postal code into Lat/Long:

Or translates a Lat/Long into postal code, using the Reverse GeoCoding:

I know this isn’t a pretty tool yet, but I’m still working on it. Until now, it was only a trial to see if the Google API returns the data I need. This tool is also still based on the 2.0 API, which is deprecated since March 8, 2010. It will still work until March 8, 2013, but I need to rebuild it soon.

The next step of course, is to expand the tool to allow the user to translate a whole list of postal codes, all at once. You don’t want to copy-paste the millions of postal codes by hand into the tool, and copy-paste the result back into an Excel workbook or SQL Server table.

But since I’m not a .NET developer, this could take a while. If you’re a developer, and interested in expanding and developing the postal code finder, contact me. Then we can make it a publicly available tool.

Conclusion
Looking at all of this, you’ll see that it’s possible to enrich your data for free, and without a lot of work. Okay, you still need to figure out if your postal code information is correct. Or let the API respond back with the message “G_GEO_UNKNOWN_ADDRESS”, which tells you that the address might be incorrect. Another limit is the number of API calls: 15.000 calls per day. But then again, it’s free, and you can’t have all…

Geometry vs Geography

In the last few months I often received the question: “What datatype do I need to use? Geometry or Geography?”. The answer I gave them, is the most used answer in the IT world: “It depends…”. Because the answer isn’t as easy as you might think…

 
Storage
The storage of both Geometry and Geography is the same. Even though they’re functionality is different, the system datatypes are the same:

SELECT *
FROM sys.types
WHERE system_type_id = 240

The content you will find in a Geometry or Geography column is binary. But by using a number of different methods, you can easily convert this to readable text.

 
Performance
Looking at performance, it could be an easy choice: Geometry will win this. The basic polygons aren’t that different. The extended methods on the other hand, do make a big difference! This is a whole different world, so I’ll write an article about this in the upcoming weeks.

Bob Beauchemin (Blog | @bobbeauch) also gave a session about this, and he had some great examples to illustrate this. You might want to watch his Spatial Data sessions!

 
Functionality
If you’re looking for the biggest difference between the two datatypes, you’ll see it’s the functionality. A Geometry object is just a 2D, flat polygon. This means, it doesn’t matter if you take a country on top of the earth (for example Canada, which is “curved”), or a country close to the Equator (for example Brazil, which is “flat”).

A Geography object on the other hand, is a 3D (or even 4D) polygon, that has the same curve as the shape of the earth. That means that the difference between 2 point isn’t calculated in a straight line, but it’s need to consider the curvature of the earth.

A great example of this concept, is shown by Morten Nielsen (Blog | @dotMorten) on his blog. He made a tool (Shape2Sql, downloadable from his his blog), that can be used to import shape files into SQL Server.

He explains the difference between Geometry and Geography with 2 pictures, where he draws a straight line between Europe and the USA:


Geometry


Geography

As you can see, the straight line is actually a straight line if you use Geometry (planar system). But if you use Geography (spherical system), the line isn’t straight but curved. This could become an issue if you want to know the distance between Berlin and Los Angeles, and you use Geometry data to calculate this. If you use that to calculate the amount of fuel for your plane, you might end up swimming the last part of your journey!

Popping The big question
The big question you need to ask yourself is: “want do I want to do with this?”. If you want to use it just for visualization purposed, you might want to stick with Geometry data. It’s accurate, fast, and doesn’t require difficult calculations. But if you need to calculate distances across the globe, or data that represents the earth, then Geography is the way to go.

Calculating Running Totals

When you work with data and database systems, eventually you need to calculate running totals on (for example) product sales or financial data. There are several methods to calculate these amounts. In this post I’ll try to show the pros and cons to the different solutions.

Let’s start with creating the resources for the examples. The most basic example I could think of, is one with only the necessary information: Date and Turnover.

CREATE TABLE Dough
	(Date DATE,
	 Turnover FLOAT)

And then insert some data:

INSERT INTO Dough
	(Date, Turnover)
VALUES
	('2011-01-01', 1000),
	('2011-02-01', 1250),
	('2011-03-01', 1500),
	('2011-04-01', 1750),
	('2011-05-01', 2000),
	('2011-06-01', 2250),
	('2011-07-01', 2250),
	('2011-08-01', 2000),
	('2011-09-01', 1750),
	('2011-10-01', 1500),
	('2011-11-01', 1250),
	('2011-12-01', 1000)

INSERT INTO Dough
	(Date, Turnover)
VALUES
	('2012-01-01', 100),
	('2012-02-01', 125),
	('2012-03-01', 150),
	('2012-04-01', 175),
	('2012-05-01', 200),
	('2012-06-01', 225),
	('2012-07-01', 225),
	('2012-08-01', 200),
	('2012-09-01', 175),
	('2012-10-01', 150),
	('2012-11-01', 125),
	('2012-12-01', 100)

With this resource, we can start on the examples.

Different solutions


When looking at this question, you’ll notice that there are more solutions to return the correct result. The following queries return the same result, but all the solutions are written for a specific version of SQL Server.

SQL 2000
If you’re using SQL Server 2000 (and I certainly hope you don’t have to anymore ;) ), you can use the query with the INNER JOIN. This can be used on all SQL Server versions:

SELECT
	A.Date,
	A.Turnover,
	SUM(B.Turnover)		AS RunningTotal
FROM Dough A
INNER JOIN Dough B
	ON YEAR(B.Date) = YEAR(A.Date)
	AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover
ORDER BY A.Date ASC

SQL 2005
In SQL Server 2005 they entered a new join type, called CROSS JOIN:

SELECT
	A.Date,
	A.Turnover,
	SUM(B.Turnover)		AS RunningTotal
FROM Dough A
CROSS JOIN Dough B
WHERE YEAR(B.Date) = YEAR(A.Date)
AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover
ORDER BY A.Date ASC

The example with the INNER JOIN and the CROSS JOIN generate the same execution plan.

SQL 2012
With the release of SQL Server 2012 they handed us (SQL developers) a whole new “bag of tricks”. One of these “tricks” is the window function.

The first time I saw the window function, was at a Techdays NL 2012 session. This session was hosted by Bob Beauchemin (Blog | @bobbeauch). The sessions (T-SQL improvements in SQL Server 2012) is worth watching. Even if you’re using SQL Server 2012 already!

With the window function you can compute and group data, and this is done with the rows you specify.

SELECT
	Date,
	TurnOver,
	SUM(TurnOver) OVER (PARTITION BY YEAR(Date)
						ORDER BY Date ASC
						ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)		AS RunningTotals
FROM Dough

Performance


Seeing all these different solutions for the same question, I (and you probably will too) wonder about the performance of these queries. One very quick conclusion: they all return the same records ;) .

When using SET STATISTICS IO, you can see the amount of disk activity generated by your statement. If you run this for the queries above, you will get the following results:

INNER JOIN:

Table ‘Dough’. Scan count 2, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CROSS JOIN:

Table ‘Dough’. Scan count 2, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

OVER:

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Dough’. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In the OVER query, you see a table called “Worktable”. This is an “extra object” that is generated by SQL Server because you use the OVER statement.

Conclusion


As shown above, there are several different ways to get to the same result. In this example I didn’t show you the cursor solution. This because it’s a bad practice, a bad performer, and a little bit to preserve my good name ;) . If you do want to see this, please leave me a comment, and I’ll add it to this post.

But with every solution you’ll see as much discussion about reasons to use is, as discussions on why NOT to use it. And in this case, you might be bound to a specific SQL Server version, so you can’t use a specific approach.

But if you ask me for my opinion, I’ll go for the last option. Not only because I’ve got the privilege to work with SQL Server 2012 in my daily work, but also because it’s the best performer and you’ll end up with the most readable code.

I’m guessing you have a totally different opinion, so please leave a comment with your ideas and/or approaches to this challenge! Also, comments and questions are also greatly appreciated!

T-SQL Tuesday #35 – Soylent Green

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Soylent Green”. If you want to read the opening post, please click the image below to go to the party-starter: Nick Haslam (Blog | @nhaslam).



The question of this month was to write down our most horrifying discovery from our work with SQL Server. If you work with SQL long enough, you will encounter some strange situations. One way or the other…

My first experience with SQL Server was at my first official IT job. Back then I worked for an insurance company, and there I was offered the opportunity to become a “Conversion Specialist”. This meant that I visited customers, advised them about their data and our software, and converted and import their data into our main system. When I started the job, I’d never wrote a single T-SQL statement. So the learning curve was pretty steep, but after some sleepless nights of studying, I got the hang of it. And during this job, I encountered my first (of many) horrifying experiences…

In this company, the main application took it’s data from one (!!!) SQL 2000 database. The system contained all the data the company had, and it was a “rebuild” DOS application in Delphi (at that time the company worked with Delphi and .NET). In order to store all of their data into one data model (yeah, I know!), they created a “flexible and dynamic” model… In one table…

The data was stored in one table (it wants to remain anonymous, so from now on we’ll call him “Foo”), and Foo contained more then 200 columns (as I recall correctly). Every time we inserted a record into Foo, SQL Server calmly mentioned that the limit of 8000 bytes (max. per record) was exceeded. How they fixed that? I still haven’t got a clue…

Every “object” stored in Foo contained properties, or a collection of properties (which obviously ended up in 1 record for each item in the collection). But as I mentioned, they were stored “dynamically”. So if you wanted to retrieve an object “Tree”, then you needed columns 15, 18, 20 and 52. When retrieving an object “Bird”, you needed columns 15, 18, 25 and 2550 for the same properties.

But I must honestly admit: I left the company after six years with tears in my eyes. They offered me a lot of opportunities, and the colleagues were awesome!

Another example I encountered on a production environment (at a different company), was an issue with currency calculations. The product data and the currency rates were loaded from 2 different data sources. To combine these in one record (and calculate the turnover), they used a CASE statement in the script, that ran in the Data Warehouse. But when I took over the Data Warehouse, they forgot to mention one thing…

If a product was sold for 100 Mexican pesos (with current exchange rates this is about € 6.00 or $ 7.80), and no exchange rate from pesos to dollar was present, the script ended up in the ELSE clause. This clause multiplied the amount with 1, “not to mess up the data”. And without anyone noticing, 100 Mexican pesos turned into $ 100! It actually took a while for people to notice this (including me!).

And I’m probably not the only one with these experiences, so I’m glad Nick asked us to share them with the rest of the #SQLFamily! And if you want to read more of the horrifying experiences, let me know. I might write another blog post about this, because this feels like therapy! And it’s another change for me to make Rob Volk (a great guy, and great example to me!) proud, by using the “Evil” tag again! ;)

Sample Databases – Ye Olde Way!

Last week I was working on a SQL Server presentation, to explain the basic of databases and how SQL Server works to a few colleagues. At the end of my presentation, I wanted to show some demo queries. Normally I would create my own tables with sample data, but I want to give them the opportunity to repeat the demos again on their own.

Nowadays Microsoft offers you the AdventureWorks database as extra download for all new versions of SQL Server. But for some examples I just want a smaller database. In “Ye Olde Days” I worked with Pubs and Northwind. Those were small databases, that were still understandable for starters. My first encounter with SQL Server was on the pubs database, and it still sticks to me as “fun and easy”.

But if you try to find them, you need to download an MSI file that extracts the files to your local system. It contains the .MDF and .LDF file of both the Pubs and Northwind databases, and a ReadMe file. But if you try to attach these databases to a SQL 2012 instance, you’ll get an error. SQL 2000 databases can’t be automatically converted to be SQL 2012 compatible.

I’m glad that they decided to add the create script to the .MSI installer. There’s only 1 thing that doesn’t work if you run the scripts. Both scripts contain a call to sp_dboption. This is a way to change database options in SQL 2000-2008. This is removed in SQL 2012, and MSDN advises you to remove this functionality as soon as possible if you still use it in old systems. So after deleting these from the script, it works perfect. One thing I added for my own use, is after the databases are created, I set them to Read-Only. You can delete this from the script, or undo this after the generation of the database(s).

I’ve also included the ERD (Entity-Relationship Diagram) for both databases. This makes it a little bit easier to start using these databases. I found the diagrams by searching in Google for the name. In this case DataMasker hosted the files I wanted.

The reason to share these scripts is because I’m probably not the only one that still wants to use these databases occasionally. So you can download them by clicking the links below. If you want a backup or the .MDF and .LDF files of the databases, please contact me and we’ll work something out.

Pubs:
PubsCreation Script (.sql)
Pubs ERD (.pdf)

Northwind:
Northwind Creation Script (.sql)
Northwind ERD (.pdf)

Follow

Get every new post delivered to your Inbox.