T-SQL Tuesday #49 – Wait for it…

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Wait for it…”. If you want to read the opening post, please click the image below to go to the party-starter: Robert Davis (Blog | @SQLSoldier).



 
Explaining developers how SQL Server works is something we all do. Maybe not on a daily basis, but you’re asked questions like “why is my index not working”, or “what’s the best way to add multiple columns to a table”. And most of the time, these questions lead to whole other bunch of questions you need to answer. And the one question we all are asked more than once: “why is my query running slow?”. So where do you start explaining?

 
Wait Types
There are lots and lots of wait types that can be found in SQL Server. In SQL Server 2005 there are 230 different wait types, 475 in SQL Server 2008 and 491 in SQL Server 2008 R2. In SQL Server 2012 they added another 197 new ones to the list. The wait types can be found by running this query:

SELECT wait_type
FROM sys.dm_os_wait_stats
ORDER BY wait_type ASC

 
These wait types can tell you what SQL Server is doing to execute your statement, and what the possible delays are. I’m not going to sum up all the wait types, but here’s a short list of common wait types you’ll see on your SQL server:

 
SOS_SCHEDULER_YIELD
Yielding processor time

LCK_M_*
Waiting for a lock

OLEDB
Wait on the OLEDB provider (Linked servers, Full-Text Search)

WRITELOG
Writing transaction log to disk

RESOURCE_SEMAPHORE
Waiting for a query memory grant

CXPACKET
Query parallelism

PAGEIOLATCH_*
Latch on a memory address while data is retrieved from disk

LAZYWRITER_SLEEP
System process waiting to start

 
All these different wait types could indicate a problem with your statement or the server. Some are more informative, while others show you a real issue. But what I really would like to show you, is how you can find these wait types.

 
DIY or just ask for help…
One of the ways to find the wait types on your SQL server, is to dive into the seemingly endless list of DMV’s. You could use the “sys.dm_exec_requests” and “sys.dm_os_waiting_tasks” DMV’s to find what you want, or you could take the easy way out: sp_WhoIsActive by Adam Machanic (Blog | @AdamMachanic ).

Adam (also the party starter of T-SQL Tuesday) wrote a no less than brilliant script to find problems on your server. But how does it work?

 
Installing sp_WhoIsActive
The “installation” of sp_WhoIsActive couldn’t be easier. You just need to download the script, and run it. This creates a stored procedure in the database of your choice. Usually I just create it in the master database. But if you have a DBA database with useful scripts, it’s okay to create it there.

 
Running it for the first time
The stored procedure can be executed without any parameters. That way, you use the default options. Just run the statement shown below:

EXEC master.dbo.sp_WhoIsActive

 
If you need it, or just like to see more information, you can also configure the procedure with a lot of parameters. If you want to see all the options you can configure, just set the documentation parameter to 1 (true):

EXEC master.dbo.sp_WhoIsActive
  @help = 1

 
The options
If you start using sp_WhoIsActive more and more, you’ll get your own set of favorite options. It all depends on the purpose you’re using the procedure for. Most of the time, I use it to determine why queries run slow, or why the performance of the SQL server is so low.

The information sp_WhoIsActive retrieves gives you a good indication of what SQL Server is doing, or what queries are bugging each other. I’ll list my favourite options below:

First, I set @show_own_spid on, so I can see my own query in the resultset.

The second option I love is @get_plans. This shows you the execution plans of the running queries:

 
Another great parameter to set is @get_outer_command. That way, you won’t just see the query running at the moment, but also the outer-command of the query (in the example below, the INSERT INTO the temp table is executed from within the stored procedure you see in the right column):

 
To see which transaction logs are used when running your query, set @get_transaction_info to 1:

 
Information regarding locks can be found, by setting @get_locks to 1:

 
If you click the XML, you’ll see which locks are granted, pending or denied:

 
The last option I’d like to set, is @get_additional_info. This will show you more information regarding the connection settings, session variables, etc:

 
Clicking the XML shows you the properties I mentioned above:

 
So this is what the query looks like, the way I personally like to use it:

EXEC master.dbo.sp_WhoIsActive
  @show_own_spid = 1,
  @get_plans = 1,
  @get_outer_command = 1,
  @get_transaction_info = 1,
  @get_locks = 1,
  @get_additional_info = 1

 
Conclusion
Wait types are your keys to open the door of the next level of SQL Server. Not all wait types are that easy to read and understand, but there are plenty of resources to be found online. For example, just take a look at the rest of the posts today. Most of the posts for T-SQL Tuesday can be found on Twitter, when you search for #TSQL2sDay.


I want to say thanks to the employees at Coeo for the easy explanation of some of the wait types!

Incremental updates with Change Data Capture

When designing a database or ETL process, for example loading your production data into a reporting environment, you always start your design with performance in mind. In the beginning of the project, your scripts and ETL run blazing fast. But after a few months in production, the entire project grinds to a halt. But how do you fix that problem, without a complete redesign of your applications and database? One of the many solutions is an easy one: incrementally load your data into the destination tables.

 
Change Data Capture
Incremental data loading could be a hard nut to crack. It’s not always an option, but it might be a good point to start from. One of the ways to start loading your data incrementally, is by using the keys in your database as a reference. If your table has a column called “Modified Date”, and that is updated every time the record is updated, you could use that. Every night, when the process runs, you just add the records that were modified after the last successful process run. But what if you don’t have that possibility? Change Data Capture (CDC) is an easy way out.

 
CDC is a way to record inserts, updates and deletes on a specific table, without the need of writing the triggers yourself. CDC reads the transaction log, and captures all changes made to the specific table. These changes are stored in the associated change table, that is created by CDC.

Below I’m going to show you how to setup your first table with CDC.If you would like to know more about CDC, this TechNet article is a place to start.

 
Create an example
To show you the basics of CDC, let start with creating a table called TestCDC in the database called Sandbox:

USE Sandbox
GO

CREATE TABLE dbo.TestCDC
  (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
	 Descr varchar(50) NULL)
GO

 
Once you’ve created the table, turn on CDC at the database level, by execution the system stored procedure created to do that:

EXEC sys.sp_cdc_enable_db

 
There is also a system stored procedure to enable CDC on the table level. You need to enable CDC on tables manually, and separately for every table you need:

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @role_name = NULL

 
If the SQL Server Agent is running on your machine or server, you’ll see this confirmation (I’ll explain later why SQL Server Agent is needed):

 
If the Agent isn’t running, you’ll see this warning:

 
If you ran the enable table statement, you will see that SQL Server created the system objects needed to track changes in the table:

 
Because CDC uses 2 SQL Server Agent jobs to capture and cleanup the change tables, you need to run the Agent to start the data capture. If the jobs aren’t running, SQL Server won’t capture any changes made:

 
Start data changes
In order to see what happens when you change data, let’s insert some records:

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description')

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description too...')

 
And let’s update one of those 2 inserted records:

UPDATE dbo.TestCDC
SET Descr = 'UPD - ' + Descr
WHERE ID = 2

 
Now, let’s check the content of both the original table, and the change table:

/* Original table */
SELECT * FROM dbo.TestCDC

/* Change table */
SELECT * FROM cdc.dbo_TestCDC_CT

 
If you run both queries, you’ll see the resultset below:

 
The records in the CDC change table allow you to update the data in your reporting environment. You could query them yourself, by retrieving all the changes since your last update. You can also use the procedures that return those changes for you, for example the cdc.fn_cdc_get_net_changes_. You can read more about the system function here.

 
Cleaning up after an update
Now that you’ve updated your reporting environment, it’s a wise thing to cleanup your CDC data. You could also drop the records yourself with a DELETE statement. Another option is using the system procedure for that: “sys.sp_cdc_cleanup_change_table”. You can clean your data using the following SQL statement:

DECLARE @Last_LSN VARBINARY(10) =
  (SELECT MAX(cdc.dbo_TestCDC_CT.[__$start_lsn]) FROM cdc.dbo_TestCDC_CT)

EXEC sys.sp_cdc_cleanup_change_table
  @capture_instance = 'dbo_TestCDC',
  @low_water_mark = @Last_LSN,
  @threshold = 5000

 
The query will retrieve the last LSN (Log Sequence Number), and remove everything that happened before that.

 
Cleanup of CDC
If you want to completely remove CDC (because we’re done testing), you can disable it on the table level by running the query below:

EXEC sys.sp_cdc_disable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @capture_instance = 'dbo_TestCDC'

 
The statement will cleanup all the objects that were created to enable CDC on that specific table. But the statement will only stop the CDC on the specific table. The fastest way to disable CDC on all tables in the database, is disabling CDC on the database level, by running the query below:

EXEC sys.sp_cdc_disable_db

 
Conclusion
Loading data always takes time, and there are many factors that are important: your database size, your frequency of changes, your ETL process, etc. The time it costs you to move data can be changed by rewriting your process to incremental loads. CDC is one of the many ways to achieve this. It works out of the box, and doesn’t require you to build any process yourself. But maybe your environment needs a custom process to operate the way you want it to. Not every feature in SQL Server is a so called silver bullet, but sometimes it comes darn close to one…

T-SQL Tuesday #43 – Hello, Operator?

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Hello, Operator?”. If you want to read the opening post, please click the image below to go to the party-starter: Rob Farley (Blog | @rob_farley).



 
Execution Plan
Everybody that ever got serious with SQL Server, knows about execution plans. And like Rob said in his opening post, if you don’t, start looking into that! But if you start working with execution plans, it gets confusing really fast. All those weird “building blocks” that tell you what SQL Server did with your query?! It almost looks like magic…!

But when you dive into it, they become easier to read and they suddenly are somewhat understandable. But hey, even after all those years looking at execution plans, some things still amaze me.

Last week I was asked to look at a query, that was used in an SSIS package. This package was one of the slowest from the nightly ETL processes. So I started of by looking at the execution plan:

 
Start pinpointing the first issue
Then I started digging, and I noticed this:

LEFT JOIN DB.dbo.T1 m
	ON m.Col1 = ISNULL(LOWER(t.Col1) COLLATE Latin1_General_CS_AS, '(N/A)')
LEFT JOIN DB.dbo.T2 s
	ON s.Col2 = ISNULL(LOWER(t.Col2) COLLATE Latin1_General_CS_AS, '(N/A)')
LEFT JOIN DB.dbo.T3 k
	ON k.Col3 = ISNULL(LOWER(t.Col3) COLLATE Latin1_General_CS_AS, '(N/A)')

 
But wait a minute… The collations we use aren’t Case Sensitive, so why use the LOWER() function anyway? And the collations are practically the same for the 2 databases used in the query (“SQL_Latin1_General_CP1_CI_AS” and “Latin1_General_CI_AS”). But the column collations are both the same (“SQL_Latin1_General_CP1_CI_AS”). SQL Server doesn’t throw an error if I join both sets, and the results aren’t different if I use COLLATE or not. So we learned another thing: the column collation is used in the ON clause of the JOIN. So both functions aren’t necessary because the collations are equal, and thus their wasting our precious CPU cycles…

Now here comes the moment you’ve all been waiting for. How does the execution plan differ? This is what the new execution plan looks like:

So what are we missing? You guessed right! We’re missing this bad boy:

So what does the “Compute Scalar” actually do? According to MSDN:

 

The Compute Scalar operator evaluates an expression to produce a computed scalar value

So that means it calculates a new value for every input column. This calculation can be as simple as a conversion of the value, or a concatenation of multiple values. Most of the time these “Compute Scalar” aren’t a major cost in your execution plan, but it can become an issue that uses a lot of CPU.

But what does this “Compute Scalar” do in this query? If I remove the ISNULL() from the query, it’s still there. It disappears if I remove the COLLATE. So in this case, the “Compute Scalar” is “calculating” a new value for every record. This “calculation” actually is a conversion from 1 collation to the other.

 
But what’s the difference?
If we compare both versions of the query, the modified version has a lot less impact on the server:

And this is only without the LOWER() and COLLATE functions. So with a little bit more effort, you can rewrite a part of the query, maybe add an index, etc. So as you can see, small changes can have a huge impact!

 
Conclusion
A “Compute Scalar” isn’t that hard to understand, but you’ll find it very often in your execution plans. So even though it isn’t the worst performance blocker in the world, it’s an interesting one to get familiar with.

If you don’t use SQL Sentry Plan Explorer yet, download your copy now!

Moving system database: rocket science or piece of cake?

Being a DBA often makes you the “Default Blame Acceptor”, according to Buck Woody (Website | @buckwoody). This means that everything is your fault by default. Server broke down? Your fault! Database corrupt? Your fault! Query of a user doesn’t compile because of a syntax error? Yeah, you guessed right… Your fault!

But on the other hand, you have a lot of opportunities to find out the best practices of doing things. An example of that is moving a system database. About two weeks ago we decided to order 4 SSD’s for our SQL Server. We plan to store tempdb and the SSAS data on these disks, hoping that it will reduce resource costs on our environment.

So with no experience of moving system databases, I started thinking about how to do this. You probably need to stop the SQL Server, move the MDF and LDF files, change the start-up options of SQL Server, start the service, hope that SQL Server finds the new location, etc. But after a quick peek I found a much simpler solution: just modify the current file location!

 
Check the current location and file sizes
Before moving your database (in this case I’m moving my tempdb), run the query below, and store the result just in case all goes south:

SELECT name, physical_name, state_desc, (size * 8 / 1024.00) AS InitialSize
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

 
The reason you also want to store the initial sizes, is that if you restart the SQL Service (one of the next steps), SQL Server will set the files to the default file sizes. And you don’t want to run on those default settings of course!

 
Set the new file location
You can set the new file location for your tempdb, by running the query below. In this example I’m moving my datafiles to the D:\ volume of my machine:

USE master
GO

ALTER DATABASE tempdb 
	MODIFY FILE (NAME = tempdev, FILENAME = 'D:\MSSQL\DATA\tempdb.mdf')
GO
ALTER DATABASE tempdb 
	MODIFY FILE (NAME = templog, FILENAME = 'D:\LOG\templog.ldf')
GO

 
After executing this statement, you’ll see a message like this appear in the Messages window:

 

The file “tempdev” has been modified in the system catalog. The new path will be used the next time the database is started.
The file “templog” has been modified in the system catalog. The new path will be used the next time the database is started.

 
So the file location is altered, but the running values are not changed until your machine is rebooted, or the SQL Service is restarted.

Now just restart the SQL Service (or the machine if you like to), and run the first query again. This way you can check if your tempdb is stored in the right folder, and if the initial sizes are correct:

SELECT name, physical_name, state_desc, (size * 8 / 1024.00) AS InitialSize
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');

 
The service should stop and start without issues. After the restart you’ll see that SQL Server created a new MDF and LDF file at the new file location. After a successful restart, you can delete the MDF and LDF files from the old location.

 
Now, was that so hard?
So as you can see, not all changes in SQL Server are rocket science. One thing I’ve learned, is that from now on, I’m not going to assume the worst, and hope for the best!

 
UPDATE
As Pieter Vanhove (Blog | @Pieter_Vanhove) mentions in his tweets about msdb/model and master, in some cases you need to do a little bit more work. Because the tempdb is the database with the least probable cause of breaking SQL Server (it’s recreated if the SQL service starts), you can move it by changing the settings.

On the restart, the “Running values” (current settings) are overwritten by the “Configured values” (new settings) you set with the query you ran, and tempdb is recreated.

But the other system databases require a little bit more effort. If you want to move master, model or msdb, check out this link.

And thank you Pieter, for pointing out this stupid mishap to me!

Deadlock Detector: Drop it like it’s hot!

Last time I talked about the right tool for the right job. After that post, I noticed that during the nightly backups and job-runs, the SQL Server log filled up with deadlocks. Hoping to find the cause of this issue, I installed SQL Solutions Deadlock Detector. But I must warn you: it’s easier to install it, then to get rid of it!

 
Downloading it
The drama starts with downloading the tool. If you go to the download location of the tool, there’s no mention of a trial-version. They do like to tell you that it’s free… A lot of times… But guess what… It isn’t!

 
Installing it
After downloading the tool, I started the installation. Again, there’s no mention of any kind that I’ve just downloaded a trial version, and that you need to pay for the tool after 7 days.

 
Use it
After the installation I started the tool. The tool shows you for the first time that you’re actually running a triall:

After I clicked on “Try”, I was prompted to enter a SQL instance to run the tool on. After giving it my SQL Server instance location and name, it showed a pop-up really quick. Apparently it installed a bunch of crap on my SQL instance! Again, there is NO mention that it needs to install objects in your msdb! Also, SQL Server Agent needs to run on your instance, because Deadlock Detector can’t run without that.

So what does Deadlock Detector install? I’ve created a screenshot of all the objects that are installed:

 
Uninstalling it
Okay, the tool wasn’t what I was looking for, so I wanted to uninstall it. But how do I get rid of all those objects that were created in my msdb database?

First I uninstalled the tool, hoping that it would also drop all the objects it created. But at the end of the uninstall, it sent me to this webpage. So I’m suppost to download a special uninstaller to get rid of your installer?

Okay, so I downloaded the special installer-uninstaller, and ran it. Close, but no cigar… The tool ran, returned no errors, but the objects still existed. Running it again, ended with the same results.

To check it yourself, run this query:

SELECT *
FROM msdb.LakeSideLockLogger._LakeSide_DbTools_LockLog

If it’s still running you’ll see record appear that tell you that there were deadlocks on your system. Even though you closed the tool and, like me, expect it to shutdown, the services still run.

To check if the objects still exist on your server, run this query:

SELECT *
FROM msdb.sys.objects
WHERE 1 = 1
AND is_ms_shipped = 0
AND Name LIKE '_LakeSide%'

So what do you do then? Right, I sent the company a support call. I’ve waited more than a week by now, but still no reply. Not even a reply that they’ve received my mail. So a really, really bad service of SQL Solutions if you ask me…

So after a lot of searching, I saw a really small reference to another uninstall method: from the tool itself. So I’ve re-installed the tool, and tried that:

And that worked for me!

 
Conclusion
Apparently more people had issues with uninstalling the tool, looking at results like this.

The thing I learned from this, is to dig a little bit deeper if I look at a new tool. The website of the tool’s manufacturer might not be the best source to find out how a tools functions…

 
UPDATE
In the mean while, I found some more objects that aren’t uninstalled by Deadlock Detector: Operators and Alerts. Even though you’ve “uninstalled” the tool, objects, queues, etc, there are still some leftovers: 1 operator, and 3 alerts.

You can drop them by running the script below:

--Drop Operator
EXEC msdb.dbo.sp_delete_operator @name=N'Deadlock Detector – Default operator'
GO

--Drop Alerts
EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector – Process is killed'
GO

EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector – Lock is detected'
GO

EXEC msdb.dbo.sp_delete_alert @name=N'Deadlock Detector – Deadlock is detected'
GO

Enriching your dataset – What are your technical options?

In my previous post I tried to explain how to enrich your dataset, so you can start using spatial data. You’ve seen that there are free options, or you can buy a bunch of files with the information you need.

This time I’ll try to explain the technical option you have. The options all contain the Google Geocoding API, because this is free, and it’s fairly accurate. But this will also work for the files you buy from (for example) GfK GeoMarketing, which I did business with once.

Write your own application
One of the easiest an quickest options for me was to write an application to fetch my data from the Google API. I’m not a die-hard programmer, but I know my way around C# as far as I need to. So with a little help from my colleagues, I managed to write a Windows Forms application that calls the Geocoding API. The results of the API call are either just shown on screen (in a GridView), or exported to Excel (using LinqToExcel and ExcelExporter).

Another option I intend to build in, is the export from and to a SQL Server database. But because this is a project I work on in my own time, this could take a couple of weeks.

CLR
One of the other options I’ve found, is a CLR that calls the API. The CLR you write, is basically a .NET application that you load into SQL Server. It’s more complicated than that, but to keep this story moving on, I’ll leave it at that.

One of the many downsides of using a CLR, is your local DBA. Every DBA I’ve encountered in my life told me: “Don’t use a CLR, EVER!!! Because that’s the biggest security hole you can open up on a SQL Server”. To be completely honest, I’m not that familiar with CLR’s, but I guess it isn’t as easy as that. There might be some pros and cons regarding that…

SSIS Package
In my quest to find all possible options, I found this great article by Donabel Santos (Blog | @sqlbelle). In her article she describes how you can create an SSIS package that fetches a Lat/Long for every record in the dataset you use as input.

Talking about this with Koen Verbeeck (Blog | @Ko_Ver) on Twitter made me realize that for normal businesses, this might be the best option. In every company I know, there’s always a server running SQL Server Integration Services that you can use to execute such a package.

So, what’s your choice?
Looking at the options above, I’m guessing the option you choose depends on the company you work for. I guess that smaller companies would choose the SSIS package over building a tool. But if your company has a development department with a bunch of software developers, writing your own tool might be a better option. In that case writing your own tool gives you more flexibility, because not only your DBA can debug an error in an SSIS package, but every developer can debug the tool they’ve written.

If you’ve got some .NET/C# knowledge, and you have the time to dive into CLR’s, that might be your best option. CLR’s are loaded into SQL Server, and are (as far as I can tell) blazing fast. So if you’re looking for performance, and don’t want external tools or packages to fetch your data, go for it!

Get column headers without retrieving data

A few days ago Pinal Dave (Blog | @pinaldave) retweeted an old blog post about retrieving columns without data. Reading this blog post, I started thinking of other ways to achieve this.

One of the downsides of using SET FMTONLY ON is that it’s a session-setting. This means that every query you run in that session, doesn’t return results. This could end up in very confusing situations. You could end up debugging a query which doesn’t return any result, and eventually you discover that this is a result of SET FMTONLY ON. So this may not be the preferred way to get the column headers from a specific query. Also, this option doesn’t return estimated or actual execution plans!

An example of such a query could be:

SET FMTONLY ON

SELECT *
FROM sys.tables

SET FMTONLY OFF

Another way of getting this information, without changing your connection options, is a simple WHERE clause.

SELECT *
FROM sys.tables
WHERE 1 = 0

This query will return the same results as the query with SET FMTONLY ON. And looking at performance, there’s no real difference between the two. I ran statistics on both solutions, and the I/O generated is pretty neglectable.

Conclusion
My personal favorite is the WHERE clause. This is because it doesn’t involve session modifiers, and (if you want to) still returns an execution plan. The SET FMTONLY ON doesn’t return an estimated or actual execution plan.

On the other hand, SET FMTONLY is on the list of deprecated features in future SQL Server versions. So for all new functionality, I’d advise you not to use it.

But maybe the other options suits you more. As always: it depends. Depending on your own preferences and the specific situation. But if you have any other options, please let me know. Post a comment, send me an e-mail, tweet me, whatever you want :).

Follow

Get every new post delivered to your Inbox.

Join 50 other followers