Write readable and high-performance queries with Window Functions

In SQL Server 2008, we gained a new and powerful feature in our bag of T-SQL tricks: Window Functions. The heart of a Window Function is the OVER clause, which allows you to define partitions, or "groups", in your query before applying another function. In practice, this means you can create groups in your dataset that can be shaped further by applying functions to those groups.

In the past, I’ve written a number of posts about Window Functions in SQL Server:

Row_Number: Unique ID in select statement
Calculating Running Totals
Removing duplicates in a dataset

But there’s more!

 
Finding missing numbers in a sequence
How many times have you looked at a table and noticed that one of the values in a sequence is missing? Or a date range in a table where someone deleted a few records? All of a sudden, your year doesn't have 365 days (or 366 in a leap year, or 365.242199 days if you take the astronomical average), but 315 days. There go your management reports that are calculated on a per-day average. So how do you find the missing dates, without having to write a cursor or create a full-blown date table like your BI colleagues do? You guessed right: with a query and a Window Function!

Let’s start with declaring a table, and insert a sequence with missing values:

DECLARE @Sequence TABLE
  (Number INT)


INSERT INTO @Sequence
  (Number)
VALUES
  (1),
  (10),
  (7),
  (4),
  (2),
  (8),
  (5)

 
So how can we get the “nearest numbers” from that table with just a single select statement?

SELECT
  LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value,
  Number,
  LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value
FROM @Sequence AS S
ORDER BY Number ASC

 
LAG and LEAD are standard T-SQL functions from SQL Server 2012 onwards. These functions give you the opportunity to access the previous or next row, without the need for a so-called "self-join". So what you see is the number, the value preceding it (LAG), and the value following it (LEAD). In this case, Number 2 is preceded by Number 1, and followed by Number 4.
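
If you want to go one step further and list the missing numbers themselves, you can build on that LEAD value. This is a small sketch of my own (the GapStart and GapEnd names are just made up for this example), and it needs to run in the same batch as the table variable declared above:

;WITH SequenceWithNext AS
(
  SELECT
    Number,
    LEAD(Number, 1) OVER (ORDER BY Number) AS NextNumber
  FROM @Sequence
)
SELECT
  Number + 1     AS GapStart,  -- first missing value after this row
  NextNumber - 1 AS GapEnd     -- last missing value before the next row
FROM SequenceWithNext
WHERE NextNumber - Number > 1  -- only rows that are followed by a gap

For the sample data, this returns the missing numbers 3, 6 and 9 as three single-value ranges.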

 
The lemonade stand
Now let's look at another example. When you grew up, you probably wanted to save money for a new mobile phone, right? In my case it was either a Discman, a Walkman, or a stereo set. But let's stick with the modern equivalent of the Walkman for now: the MP3 player. To earn money for the MP3 player, our fictitious friend Joe decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his earnings and puts them into a table:

DECLARE @Profit TABLE
  (DayNumber INT,
   Sales DECIMAL(10,2))


INSERT INTO @Profit
  (DayNumber, Sales)
VALUES
  (1,  6.90),
  (2,  4.17),
  (3,  2.69),
  (4,  7.26),
  (5,  2.93),
  (6,  8.98),
  (7,  7.25),
  (8,  5.88),
  (9,  1.51),
  (10, 7.97),
  (11, 3.44),
  (12, 3.76),
  (13, 9.96),
  (14, 0.92),
  (15, 8.28),
  (16, 6.05),
  (17, 9.40),
  (18, 4.03),
  (19, 9.14),
  (20, 7.25),
  (21, 0.06),
  (22, 9.12),
  (23, 7.39),
  (24, 6.57),
  (25, 4.54),
  (26, 0.09),
  (27, 4.42),
  (28, 9.53),
  (29, 5.09),
  (30, 0.89)

 
So as you can see, he earns quite a lot of money this way! But because he's eager to buy his new MP3 player, he wants to see his daily totals and the amount he still needs for his new toy. And because Joe is a really smart guy, he doesn't want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:

DECLARE @Goal DECIMAL(10,2) = 150.00


SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
He declares a "goal" variable that is set to the amount he needs for his new gadget. Then, for every row, we subtract the running total of his sales, from the first day (UNBOUNDED PRECEDING) up to and including the current day (CURRENT ROW), from that goal. After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he's made so far:

SELECT
  DayNumber,
  Sales,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AverageSale
FROM @Profit

 
So where does it stop?
So now that we've seen the SUM and AVG options, what do we have left? How far can we take this? Thinking about it, how about a daily check to see if we hit our lowest or highest sales amount so far? We can do this with MIN and MAX in the same kind of query:

SELECT
  DayNumber,
  Sales,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale
FROM @Profit


SELECT
  DayNumber,
  Sales,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit

 
Now Joe can enter his daily sales in the evening, and check whether he had a very good or a very bad day.
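
And if you'd rather see everything at once, you can combine these running aggregates into a single statement. This is a sketch of my own; it assumes the @Profit table variable and the @Goal variable from the snippets above are still in scope (so run it in the same batch):

SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AverageSale,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit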

 
Not only for the average Joe…
So how can the business profit from all this? In some cases, Window Functions make it easier to produce a certain resultset. In other cases they give you a whole new way to output data with a single, well-performing query, where that was practically impossible before. So if you're running SQL Server 2008 or higher, start using (or at least start exploring) Window Functions right away!

 

To generate the random numbers, I've used the generator on FYIcenter.com

If you want to read more about this topic, don’t forget to check out these blog posts:

- Julie Koesmarno: ABC Classification With SQL Server Window Function
- Mickey Stuewe: A Date At The End of The Month
- Chris Yates: Windows functions who knew

Data paging using OFFSET

With every new release of SQL Server, we get new features that make our lives as developers and administrators so much easier. A few days ago, I came across an old piece of code (SQL Server 2005 if I remember correctly) that I used to page data for a CMS I built when I was still a web developer.

The company I worked for needed a new website, and wanted an HTML-editor to edit content on the website. This content was stored in SQL Server, and was retrieved by the website. With a few business rules we decided which content was visible, and which content was hidden from the website.

One of the features of the website was a news feed. But because there were so many news articles, we needed to show the top 10 articles on the first page and let the user click through to the next page for the rest. In other words, we needed to page the data: every page should show 10 news articles, and the user should be able to switch pages for the remaining articles.

 
Creating the sample data
In order to show you the problem and solution, we need to create a sample table:

CREATE TABLE dbo.Paging
  (ID INT IDENTITY(1,1),
   Title VARCHAR(50),
   Content VARCHAR(50))

 
The test data we need to insert looks the same for every record:

INSERT INTO dbo.Paging
  (Title, Content)
VALUES
  ('This is an article', 'This is the content')
GO 50

 
This script will insert 50 records into the Paging table.

 
The old way
In older versions of SQL Server you needed to build your own solution to this problem. Let's assume you clicked the news feed button on the website, and we want to switch to page 2 of the results. The solution I built back then looked something like this:

DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10


SELECT TOP(@RowsToShow)
  ID ,
  Title ,
  Content
FROM dbo.Paging
WHERE ID NOT IN
  (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging ORDER BY ID ASC) -- the ORDER BY guarantees which rows get skipped
ORDER BY ID ASC

 
In the query above, you can see we skip the first 10 rows, and retrieve the next 10 rows after that. That means the website needs to keep track of which records were retrieved already, and which records you want to see next. The easiest way to do this is to exclude the IDs you've already shown, and retrieve the next set of rows.

This means you get execution plans like this:

 
The new way
From SQL Server 2012 onwards, we can use a new feature called OFFSET. It allows us to "window" our dataset and retrieve a small subset of data, without using a TOP and a subquery like in the example above. The new query looks like this:

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

 
In the query above, you can see an offset of 10 rows, and a fetch of 10 rows. This means it skips the first 10 records, and retrieves the 10 records after that. But how do you get this to work with dynamic resultsets and page numbers? This is one way to do it:

DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY

 
The offset is calculated by taking the @PageNumber parameter minus one, so we land on the page we want to see; if we didn't subtract one, the query would always skip the first 10 records, even for page 1. We then multiply that number by the @RowsPerPage parameter to calculate how many rows to skip. For page 2 with 10 rows per page, that is (2 - 1) * 10 = 10 rows.

After that, we use the @RowsPerPage in the FETCH NEXT clause to retrieve the number of rows we want (in this case the next 10). This results in a completely different execution plan:

 
As you can see, this has a lot less impact on SQL Server. This becomes really visible if you compare both in SSMS:

 
I/O Costs
Comparing query costs is nice for a quick comparison of 2 different approaches to the same solution, but in the end it all comes down to I/O. So which query is the fastest, and how are we going to test that?

First of all, we need to measure the I/O calls. We do that by using a DBCC command called DROPCLEANBUFFERS. This allows us “to test queries with a cold buffer cache without shutting down and restarting the server”. SO DON’T USE THIS IN PRODUCTION!!!

So the complete test script looks like this:

SET NOCOUNT ON
SET STATISTICS IO ON


DBCC DROPCLEANBUFFERS

--==================================================
DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10


SELECT TOP(@RowsToShow)
  ID ,
  Title ,
  Content
FROM dbo.Paging
WHERE ID NOT IN
  (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging ORDER BY ID ASC) -- the ORDER BY guarantees which rows get skipped
ORDER BY ID ASC
--==================================================

DBCC DROPCLEANBUFFERS

--==================================================
DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
--==================================================

So we clean the SQL Server buffers, run the first query, clean the buffers again, and run the second query. Now the effect of the new statement is really obvious, if you look at I/O costs:

 
So the old version of the query (with the sub-select) scans the table twice, and reads 51 pages from the cache. The new approach (with the OFFSET) scans the table only once, and reads only 1 page from the cache.

 
Conclusion
The fewer I/O calls SQL Server needs to retrieve the result from disk or cache, the faster your query will run. In this case, we've tuned the query down from 51 page reads to 1, and that's on a table with just 50 records, so the difference is already significant. So here's a lot of performance improvement from only one new piece of functionality. And there is a lot more out there.

SQL Server Agent Job Schedules – A hidden evil…

One of the many beautiful features of SQL Server is the possibility to schedule a query or job within a few minutes by using SQL Server Agent Jobs. This is especially useful for people who want to run a query at a quiet moment of the day, but don't want to stay up late or wake up at 4:00 AM to do it.

 
Types of schedules
If you dig into SQL Server Agent Jobs, you can create a job-specific schedule, or a shared schedule. Both are essentially the same type of schedule, except that a shared schedule can be part of many jobs.

 
What type do you need?
If you create a new SQL Server Agent Job, you need to figure out which type of schedule you want to use. If you want to start a number of jobs all at the same time, go for a shared schedule. But if you want to be able to easily move a job in time (change its start date or start time), then a single (job-specific) schedule is what you want.

 
How it works
Once you’ve created a job, you can click on “Schedules”, and then either choose “New” (single schedule), or “Pick” (shared schedule):

 
New schedule
If you decide to create a new schedule, you’ll get a window like this:

In this window you can set the runtime of the schedule, start- and end date, frequency, etc.

 
Pick a schedule
If you go for the “Pick” option, you’ll see a window like this:

In this window, you can select a previously made schedule, so you could run it in specifically created time windows for example.

 
Edit schedule
But what's the catch? If you change a schedule, this might affect other jobs in the same schedule! This is where my issue comes in. A few weeks back a schedule was changed (we wanted to run a job 2 hours earlier every day), and suddenly several other jobs changed schedule as well. The schedule was changed without checking, and that resulted in a really strange situation with parallel running jobs that almost locked out all other processes.

So if you want to change the schedule of a job, click the "Jobs in schedule" button first. Double-check that your schedule hasn't been turned into a shared schedule by you, one of your colleagues, or a customer:

 
Manage schedules on server
But that's not all. Last week I read an article about shared job schedules that was tweeted by Pieter Vanhove (Blog | @Pieter_Vanhove). In that article I discovered that you can also right-click on "Jobs", and click on "Manage Schedules":

This results in a window where you can see the shared schedule overview, the number of jobs per schedule, etc.:
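
If you'd rather check this with a query than through the GUI, the msdb job tables give you the same overview. A small sketch of my own:

-- List schedules that are attached to more than one job (i.e. shared schedules)
SELECT
  s.schedule_id,
  s.name           AS ScheduleName,
  COUNT(js.job_id) AS NumberOfJobs
FROM msdb.dbo.sysschedules AS s
INNER JOIN msdb.dbo.sysjobschedules AS js ON js.schedule_id = s.schedule_id
GROUP BY s.schedule_id, s.name
HAVING COUNT(js.job_id) > 1
ORDER BY NumberOfJobs DESC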

 
Conclusion
As I've said in previous posts: always check, double check, and triple check before you make any changes, or you'll run into more issues than you solve with your change!

Error deleting checkpoint file

One of the many tasks of being a DBA is to monitor the SQL Server. This doesn’t mean you only keep an eye on daily performance, but you need to monitor the whole process. This means you need to monitor jobs, ad-hoc queries, maintenance plans, etc. But what if you come across an error, and nothing fails…?

 
Finding the error
A few weeks ago I worked on a software release on my SQL Server environment, and the BI developers needed an extra copy of a database for their tests. We didn’t want to test the new functionality on the production database, but the test needed production data. So I decided to restore a backup of the database under a different name.

After a restore of the database, the developers started working on the database. After a few hours, I encountered an error myself (not related to the backup restore), and decided to check the SQL Server error log. But what I saw there was not the error I was looking for:

 

Error: 3634, Severity: 16, State: 2.
The operating system returned the error ’3(failed to retrieve text for this error. Reason: 15105)’ while attempting ‘DeleteFile’ on ‘E:\Backup\RestoreCheckpointDB70.CKP’.

The actual error message can be found by opening a command prompt and running "net helpmsg 3" (where 3 is the error number from the message above). The actual error is:

 

The system cannot find the path specified.

This error occurred at the same time my restore finished. Strange… And looking at the location in the error, there was no folder called “Backup”. But then I remembered something…

The week before, something had gone wrong with the backup process. We currently use the default maintenance plans in SQL Server, and something went wrong with the job. Because of a lack of disk space on the machine, I cleaned up the drive where the backups were written. But strangely enough the drive contained 3 backup folders, of which 2 were completely empty. So without properly checking, I deleted 2 of the 3 folders. The only folder left was the one that was referenced in the maintenance job.

 
Finding the source of the issue
But now, how do we solve this issue? Apparently my cleanup didn't help SQL Server, it actually harmed it…

During my search, I found several helpful websites, that led me to this registry key:

 

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.[Instance Name]\MSSQLServer

In this registry key, you’ll find the default backup location that SQL Server uses to write (for example) your checkpoint file to. And guess what… You’re right! This was one of the empty folders I deleted.
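
If you'd rather not open regedit, you can also read this value from T-SQL. This is a small sketch that uses the undocumented xp_instance_regread procedure, which translates the generic key below into the instance-specific key shown above:

DECLARE @BackupDirectory NVARCHAR(512)

EXEC master.dbo.xp_instance_regread
  N'HKEY_LOCAL_MACHINE',
  N'Software\Microsoft\MSSQLServer\MSSQLServer',
  N'BackupDirectory',
  @BackupDirectory OUTPUT

-- The folder SQL Server expects to exist for backups and checkpoint files
SELECT @BackupDirectory AS DefaultBackupDirectory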

 
Resolving the issue
Once I knew what caused the issue, I created a new folder with the same name at that location, and added the security permissions needed. But what if you don't want to do that, or restoring the directory isn't possible?

There are several other ways to solve an issue like this. For example (for you gutsy ones!) you could edit the registry. No, seriously, please don’t! :)

One way to change this is to right-click your instance name in SSMS, open the server properties, and choose "Database Settings":

Another way to change the default locations is to right-click your instance name in SSMS, and choose "Facets":

Conclusion
One of the things I've learned in the past, and that I was reminded of here, is:

When you’re dealing with SQL Server, always check, double check, and triple check the changes you make!

So the same goes for the settings I've shown you above! Always check what you change, because some changes only take effect after the next service restart or machine reboot. So if you change something and you don't see the effect immediately, that doesn't mean it was a safe change!

Huge operator costs in execution plan

If you work with SQL Server, you’ll need to look at execution plans sooner or later. Now and in the past, I’ve had the privilege of introducing many of my (former) colleagues to these “works of magic”. But what happens if you can’t trust the plans you’re looking at…?

Say what…?
Last week I was asked to look at a slow running query. The first thing I did was look at the execution plan. It hit me pretty fast that this might be a “less optimized” query:

As you can see, it was a query with a lot of CTEs and sub-selects, built by multiple developers and analysts. No one took the time to review or rewrite parts of the query; they all built their additions on top of the old version. This isn't uncommon in most companies, because time is precious and costs money, and people often find it difficult to ask for, or make time for, quality control.

But looking a little bit closer, I started noticing that the operator costs in the execution plan were a little bit too high, if you ask me:

This couldn't be correct! So I asked the help of a lifesaver called SQL Sentry Plan Explorer. If you don't have it yet, start using it now! And no, they don't pay me to say this (but if they want to, I have nothing against that…). The main reason I use Plan Explorer is that it shows you a little bit more information, and the layout is better than the default execution plans in SSMS. But what does Plan Explorer show us if we load the same plan?

It seems that Plan Explorer shows the right numbers. But how is this possible? After some online searching, I came to the conclusion that I'm not the only one having this issue:

Huge operator cost in estimated execution plan
Query plan iterator cost percentage way off
SSMS execution plan sometimes exceeds 100
Katmai also 2005 graphical plan operator costs exceed 100

But unfortunately, all of these issues are marked for a "future release", and only 1 of them is from last year; the other Connect items are much older. So maybe they will fix it in the next release, which was just announced.

But keep in mind, even though the numbers look weird, it doesn’t affect performance.

Moving system database: rocket science or piece of cake?

Being a DBA often makes you the “Default Blame Acceptor”, according to Buck Woody (Website | @buckwoody). This means that everything is your fault by default. Server broke down? Your fault! Database corrupt? Your fault! Query of a user doesn’t compile because of a syntax error? Yeah, you guessed right… Your fault!

But on the other hand, you get a lot of opportunities to find out the best practice for doing things. An example of that is moving a system database. About two weeks ago we decided to order 4 SSDs for our SQL Server. We plan to store tempdb and the SSAS data on these disks, hoping that it will reduce the resource load on our environment.

So with no experience of moving system databases, I started thinking about how to do this. You probably need to stop SQL Server, move the MDF and LDF files, change the start-up options of SQL Server, start the service, hope that SQL Server finds the new location, etc. But after a quick peek I found a much simpler solution: just modify the current file location!

 
Check the current location and file sizes
Before moving your database (in this case I’m moving my tempdb), run the query below, and store the result just in case all goes south:

SELECT name, physical_name, state_desc, (size * 8 / 1024.00) AS InitialSize
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

 
The reason you also want to store the initial sizes is that when you restart the SQL Server service (one of the next steps), tempdb is recreated, and you want to make sure the files don't end up at their default sizes. You don't want to run on those default settings, of course!
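
If you do find that the sizes were reset after the restart later on, you can simply put them back with MODIFY FILE. A sketch with made-up sizes; use the InitialSize values you saved with the query above:

USE master
GO

ALTER DATABASE tempdb
	MODIFY FILE (NAME = tempdev, SIZE = 1024MB) -- example size, use your own saved value
GO
ALTER DATABASE tempdb
	MODIFY FILE (NAME = templog, SIZE = 256MB)  -- example size, use your own saved value
GO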

 
Set the new file location
You can set the new file location for your tempdb, by running the query below. In this example I’m moving my datafiles to the D:\ volume of my machine:

USE master
GO

ALTER DATABASE tempdb 
	MODIFY FILE (NAME = tempdev, FILENAME = 'D:\MSSQL\DATA\tempdb.mdf')
GO
ALTER DATABASE tempdb 
	MODIFY FILE (NAME = templog, FILENAME = 'D:\LOG\templog.ldf')
GO

 
After executing this statement, you’ll see a message like this appear in the Messages window:

 

The file “tempdev” has been modified in the system catalog. The new path will be used the next time the database is started.
The file “templog” has been modified in the system catalog. The new path will be used the next time the database is started.

 
So the file location is altered, but the running values are not changed until your machine is rebooted, or the SQL Service is restarted.

Now just restart the SQL Server service (or the machine if you like), and run the first query again. This way you can check if your tempdb is stored in the right folder, and if the initial sizes are correct:

SELECT name, physical_name, state_desc, (size * 8 / 1024.00) AS InitialSize
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

 
The service should stop and start without issues. After the restart you’ll see that SQL Server created a new MDF and LDF file at the new file location. After a successful restart, you can delete the MDF and LDF files from the old location.

 
Now, was that so hard?
So as you can see, not all changes in SQL Server are rocket science. One thing I’ve learned, is that from now on, I’m not going to assume the worst, and hope for the best!

 
UPDATE
As Pieter Vanhove (Blog | @Pieter_Vanhove) mentions in his tweets about msdb, model and master, in some cases you need to do a little bit more work. Because tempdb is the database that's least likely to break SQL Server (it's recreated every time the SQL Server service starts), you can move it just by changing the settings.

On the restart, the “Running values” (current settings) are overwritten by the “Configured values” (new settings) you set with the query you ran, and tempdb is recreated.

But the other system databases require a little bit more effort. If you want to move master, model or msdb, check out this link.

And thank you Pieter, for pointing out this stupid mishap to me!

Deadlock Detector: Drop it like it’s hot!

Last time I talked about the right tools for the right job. After that post, I noticed that during the nightly backups and job runs, the SQL Server log filled up with deadlocks. Hoping to find the cause of this issue, I installed SQL Solutions' Deadlock Detector. But I must warn you: it's easier to install it than to get rid of it!

 
Downloading it
The drama starts with downloading the tool. On the download page, there's no mention of a trial version. They do like to tell you that it's free… A lot of times… But guess what… It isn't!

 
Installing it
After downloading the tool, I started the installation. Again, there's no mention of any kind that you've just downloaded a trial version, and that you need to pay for the tool after 7 days.

 
Use it
After the installation I started the tool. The tool then shows you for the first time that you're actually running a trial:

After I clicked on "Try", I was prompted to enter a SQL Server instance to run the tool on. After giving it my instance location and name, it showed a pop-up really quickly. Apparently it installed a bunch of crap on my SQL instance! Again, there is NO mention that it needs to install objects in your msdb! Also, SQL Server Agent needs to run on your instance, because Deadlock Detector can't run without it.

So what does Deadlock Detector install? I’ve created a screenshot of all the objects that are installed:

 
Uninstalling it
Okay, the tool wasn’t what I was looking for, so I wanted to uninstall it. But how do I get rid of all those objects that were created in my msdb database?

First I uninstalled the tool, hoping that it would also drop all the objects it created. But at the end of the uninstall, it sent me to this webpage. So I'm supposed to download a special uninstaller to get rid of their installer?

Okay, so I downloaded the special installer-uninstaller and ran it. Close, but no cigar… The tool ran and returned no errors, but the objects still existed. Running it again ended with the same results.

To check it yourself, run this query:

SELECT *
FROM msdb.LakeSideLockLogger._LakeSide_DbTools_LockLog

If it's still running, you'll see records appear that tell you there were deadlocks on your system. Even though you closed the tool and, like me, expect it to shut down, the services still run.

To check if the objects still exist on your server, run this query:

SELECT *
FROM msdb.sys.objects
WHERE 1 = 1
AND is_ms_shipped = 0
AND Name LIKE '_LakeSide%'

So what do you do then? Right, I sent the company a support request. I've waited more than a week by now, but still no reply. Not even a confirmation that they've received my mail. So really, really bad service from SQL Solutions, if you ask me…

So after a lot of searching, I saw a really small reference to another uninstall method: from the tool itself. So I re-installed the tool, and tried that:

And that worked for me!

 
Conclusion
Apparently more people have had issues with uninstalling the tool, looking at results like this.

The thing I learned from this is to dig a little bit deeper when I look into a new tool. The website of the tool's manufacturer might not be the best source to find out how a tool functions…

 
UPDATE
In the meantime, I found some more objects that aren't uninstalled by Deadlock Detector: Operators and Alerts. Even though you've "uninstalled" the tool, its objects, queues, etc., there are still some leftovers: 1 operator and 3 alerts.

You can drop them by running the script below:

--Drop Operator
EXEC msdb.dbo.sp_delete_operator @name=N'Deadlock Detector - Default operator'
GO

--Drop Alerts
EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector - Process is killed'
GO

EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector - Lock is detected'
GO

EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector - Deadlock is detected'
GO

My first month as DBA – The right tools for the job

Last month I started my first real DBA job. Until then I only had “accidental DBA” experience, and I’m glad I got the opportunity to prove myself as a real full time DBA.

As a SQL Server developer, you and I both know that using the right tools can be a lifesaver. But my first weeks as a DBA gave me the feeling this is more important than ever before. Having the right tools can save you a lot of time, and can actually help you make time for the important stuff. In this blog I'll try to show you which tools I use nowadays, and why you should use them as well.

 
SQL Server Management Studio (SSMS) for SQL Server 2012
If you're working with SQL Server, you'll need a version of SSMS (3rd party tools excluded). My first experience was with SQL Server 2000, and back then the "Enterprise Manager" and "Query Analyzer" were a drama to work with. If you look at the latest version of SSMS, shipped with SQL Server 2012, you'll see that SSMS has come a long way!

Because I'm administering SQL Server 2008 R2, I can't use SSMS 2012 for everything, but it's still my main tool, just because of the performance enhancements and the Visual Studio look and feel.

You can download the studio as a separate installation from this location.

 
SSMSBoost
One of my favorite SSMS add-ins is SSMSBoost. This allows you to quickly create and use snippets in your SSMS, script data with a few clicks, and quickly find objects in your databases.

SSMSBoost won the Bronze Community award for “Best Database Development Product” 2012, so I’m not the only one who appreciates this add-in! You can download the tool from their website. After the installation, you can request a free community license on the website.

 
sp_Blitz
If you take over a server as a DBA, there's only one way to determine the health of that server: sp_Blitz! This script, built by Brent Ozar's company "Brent Ozar Unlimited" (Website | @BrentOzarULTD), gives you a full overview of the "health status" of your SQL Server.

It also gives you a list of items you might want to fix, in order to ensure a stable and maintainable environment. The items are sorted by importance, so you know which ones to fix first. An excellent start for every new environment!
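
Running it is as simple as executing the stored procedure once it's deployed. A minimal example (assuming you installed it in the database you're connected to):

-- Run the full health check with its default settings
EXEC dbo.sp_Blitz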

You can download the sp_Blitz code from this location.

 
sp_WhoIsActive
If you start out as a DBA, it's hard to find a good point to start from. What do you want to fix first? Your users keep complaining that their queries are running slow, your manager wants more and more performance from the same hardware without any real hardware changes, etc. A good starting point is finding out which slow-running queries and stored procedures your users are executing.

sp_WhoIsActive, written by Adam Machanic (Blog | @AdamMachanic ), gives you the ability to quickly gather this information, without any hassle. Once you’ve deployed the stored procedure to your machine, you can start using it to pinpoint issues on your SQL Server.
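
A minimal usage example (the optional @get_plans parameter adds the query plans to the output):

-- Show what is running right now, including the execution plans
EXEC dbo.sp_WhoIsActive @get_plans = 1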

You can download the sp_WhoIsActive code from this location.

 
SQLjobvis
The last hurdle I needed to take was finding out which SQL Server Agent Jobs were running on our environment, and at which times. Because I didn't want to document this manually, I tried to find a tool that did this for me. Then I came across SQLjobvis.

SQLjobvis, written by SQLsoft (Website), is a free tool that visualizes the jobs on your SQL Server. It shows you all jobs and the results of their executions. You can select the data you want to see by date, and with color codes it shows the results within the date range you set.

You can download SQLjobvis from this location.

 
SQL Sentry Plan Explorer
And last, but not least: SQL Sentry Plan Explorer. I’m glad Pieter Vanhove (Blog | @Pieter_Vanhove) reminded me I forgot an important tool!

SQL Sentry Plan Explorer, written by SQL Sentry Inc. (Website), is a lightweight standalone app that helps you analyse execution plans. By making them more graphical than the default execution plan viewer in SSMS, it makes it easier to spot the bottleneck.

You can download the tool from this location. And don’t forget to install the SSMS add-in, so you can directly view your execution plan in the SQL Sentry Plan Explorer from SSMS, when you right-click your execution plan.

 
What tools do you use?
There are many more DBAs out there, and every DBA has their own tool belt. So I'd like to know: which tools do YOU use to get the job done? Let me know by leaving a comment, or contact me via Twitter or mail, and I'll add them to the list of must-haves!

Geometry vs Geography

In the last few months I've often received the question: "Which datatype do I need to use? Geometry or Geography?". The answer I gave them is the most used answer in the IT world: "It depends…". Because the answer isn't as easy as you might think…

 
Storage
The storage of both Geometry and Geography is the same. Even though their functionality is different, the underlying system datatype is the same:

SELECT *
FROM sys.types
WHERE system_type_id = 240

The content you will find in a Geometry or Geography column is binary, but by using a number of different methods, you can easily convert it to readable text.
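
For example, the ToString() and STAsText() methods both return the Well-Known Text behind the binary value. A small sketch with an arbitrary point:

DECLARE @g geography = geography::STGeomFromText('POINT(4.89 52.37)', 4326)

SELECT
  @g            AS BinaryValue,    -- the raw binary you see when you query the column
  @g.STAsText() AS WellKnownText,  -- POINT (4.89 52.37)
  @g.ToString() AS AlsoWellKnownText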

 
Performance
Looking at performance, it could be an easy choice: Geometry will win this. For the basic polygons the difference isn't that big, but the extended methods do make a big difference! That is a whole different world, so I'll write an article about it in the upcoming weeks.

Bob Beauchemin (Blog | @bobbeauch) also gave a session about this, and he had some great examples to illustrate this. You might want to watch his Spatial Data sessions!

 
Functionality
If you’re looking for the biggest difference between the two datatypes, you’ll see it’s the functionality. A Geometry object is just a 2D, flat polygon. This means, it doesn’t matter if you take a country on top of the earth (for example Canada, which is “curved”), or a country close to the Equator (for example Brazil, which is “flat”).

A Geography object, on the other hand, is a 3D (or even 4D) polygon that has the same curve as the shape of the earth. That means the distance between 2 points isn't calculated in a straight line, but needs to take the curvature of the earth into account.

A great example of this concept is shown by Morten Nielsen (Blog | @dotMorten) on his blog. He made a tool (Shape2Sql, downloadable from his blog) that can be used to import shape files into SQL Server.

He explains the difference between Geometry and Geography with 2 pictures, where he draws a straight line between Europe and the USA:


Geometry


Geography

As you can see, the straight line is actually a straight line if you use Geometry (a planar system). But if you use Geography (a spherical system), the line isn't straight but curved. This becomes an issue if you want to know the distance between Berlin and Los Angeles, and you use Geometry data to calculate it. If you use that result to calculate the amount of fuel for your plane, you might end up swimming the last part of your journey!
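
To make the Berlin to Los Angeles example a bit more concrete, here's a small sketch of my own (the coordinates are rounded, and note that Well-Known Text expects longitude before latitude):

DECLARE @BerlinGeom geometry  = geometry::STGeomFromText('POINT(13.40 52.52)', 0),
        @LAGeom     geometry  = geometry::STGeomFromText('POINT(-118.24 34.05)', 0),
        @BerlinGeog geography = geography::STGeomFromText('POINT(13.40 52.52)', 4326),
        @LAGeog     geography = geography::STGeomFromText('POINT(-118.24 34.05)', 4326)

SELECT
  @BerlinGeom.STDistance(@LAGeom)          AS GeometryDistance,   -- planar result in "degrees", useless as a real-world distance
  @BerlinGeog.STDistance(@LAGeog) / 1000.0 AS GeographyDistanceKm -- spherical result in meters, converted to kilometers

The Geography version comes out at roughly 9,300 kilometers, while the Geometry version only tells you how far the two coordinate pairs are apart on a flat grid, which says nothing about how much fuel to bring.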

Popping the big question
The big question you need to ask yourself is: "What do I want to do with this?". If you want to use it just for visualization purposes, you might want to stick with Geometry data. It's accurate, fast, and doesn't require difficult calculations. But if you need to calculate distances across the globe, or work with data that represents the earth, then Geography is the way to go.

Are nested Inserts possible?

Two weeks ago I got an interesting question. Two of my colleagues thought of a funny way to tease me by asking me this:

If we have an insert into table X, can we nest another insert into the query to fill table Y with the identity generated by insert X?

After thinking about it for a few minutes, I told them I thought this was possible. Eventually I found a few spare minutes, and came up with this solution. In all honesty, this isn't completely my own solution: a few forums I came across mentioned this option, and I took those ideas and created my own version.

So first, let’s create the 2 tables we need for this example:

DECLARE @T1 TABLE
	(ID INT IDENTITY(1,1),
	 String VARCHAR(100))

DECLARE @T2 TABLE
	(ID_REF INT,
	 String VARCHAR(100),
	 Action VARCHAR(100))

So I’ve created T1 (table X in the question), and T2 (table Y). Let’s assume that T1 contains your product data, and for every product you insert, you want to insert a log-record into T2.

This can't be done with actual nested inserts. If you want to do this, the easiest way is to create a Stored Procedure (SP) that uses SCOPE_IDENTITY to retrieve the generated ID and insert that into your second table. But because an SP isn't always the best way to do this, I wanted to see if I could transform it into a single query.
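
For reference, this is roughly what that SCOPE_IDENTITY approach would look like, sketched inline here instead of wrapped in a stored procedure (run it in the same batch as the table variable declarations above):

-- Insert into the first table...
INSERT INTO @T1 (String)
VALUES ('This is a string')

-- ...and use the identity that was just generated for the log record
INSERT INTO @T2 (ID_REF, String, Action)
VALUES (SCOPE_IDENTITY(),
        'This is a string',
        CONCAT('<<<<< Inserted in T1 @ ', CONVERT(DATE, GETDATE()), ' >>>>>'))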

Since SQL Server 2008 we have the opportunity to use the MERGE statement. What the MERGE does is synchronize two tables by inserting, updating and deleting rows in the destination table, based on a set of rules and/or joins.

So, what would this statement look like? After some trial and error, I created this statement:

MERGE INTO @T1 T1
USING (SELECT -1 AS ID, 'This is a string' AS String) Q
	ON Q.ID = T1.ID
WHEN NOT MATCHED BY TARGET
	THEN INSERT (String) VALUES ('This is a string...')
OUTPUT
	inserted.ID, inserted.String, CONCAT('<<<<< Inserted in T1 @ ', CONVERT(DATE, GETDATE()) ,' >>>>>')
INTO @T2;

As you can see the destination table is T1. This is the first table the record is inserted into, and the table that generates the identity. Because we only want to insert the record, and not update or delete anything, I only created a “WHEN NOT MATCHED BY TARGET” clause.

Because the ID columns don’t match, the record gets inserted into T1. After that, I use the OUTPUT clause of the merge statement to insert the same record (but with the generated identity) into T2. As a reference, I also insert the action-description that contains a date.

So as you can see, you can achieve nested inserts in SQL Server, just not with a literally nested INSERT statement. But remember, this was written only to prove it's possible in a single statement, not for production usage. So if you decide to use this in production or on your system, consider your other options first!


Code samples:
Are nested Inserts possible.sql
