My first month as DBA – The right tools for the job

Last month I started my first real DBA job. Until then I only had “accidental DBA” experience, and I’m glad I got the opportunity to prove myself as a real full time DBA.

As a SQL Server developer you and I both know that using the right tools can be a lifesaver. But my first weeks as DBA gave me the feeling this is more important than ever before. Having the right tools can save you a lot of time, and can actually help you make time for the important stuff. In this blog I’ll try to show you which tools I use nowadays, and the reason why you should use them as well.

 
SQL Server Management Studio (SSMS) for SQL Server 2012
If you’re working with SQL Server, you’ll need a version of SSMS (3rd party tools excluded). My first experience was with SQL Server 2000, and back then the “Enterpise Manager” and “Query Analyzer” were a drama to work with. If you look at the last version of the SSMS that is shipped with SQL Server 2012, then you’ll see that SSMS has come a long way!

Because I’m administering SQL Server 2008R2, I can’t use SSMS 2012 for everything, but it’s still my main tool. Just because of the performance enhancements, and the Visual Studio look and feel.

You can download the studio as a separate installation from this location.

 
SSMSBoost
One of my favorite SSMS add-ins is SSMSBoost. This allows you to quickly create and use snippets in your SSMS, script data with a few clicks, and quickly find objects in your databases.

SSMSBoost won the Bronze Community award for “Best Database Development Product” 2012, so I’m not the only one who appreciates this add-in! You can download the tool from their website. After the installation, you can request a free community license on the website.

 
sp_Blitz
If you take over a server as DBA, there’s only one way to determine the health of that server: sp_Blitz! This script, build by Brent Ozar’s company “Brent Ozar Unlimited” (Website | @BrentOzarULTD ), gives you a full overview of the “health status” of your SQL Server.

This also gives you a list of items you might want to fix, in order to ensure a stable and maintainable environment. The items are sorted based on importance, so you know which items you need to fix first. An excellent start for every new environment!

You can download the sp_Blitz code from this location.

 
sp_WhoIsActive
If you start out as a DBA it’s hard to find a good point to start from. What do you want to fix first? Your users keep complaining that they’re queries are running slow, your manager wants more and more performance from the same hardware without any real hardware changes, etc. A good point to start from is finding our which slow running queries and stored procedures your users are executing.

sp_WhoIsActive, written by Adam Machanic (Blog | @AdamMachanic ), gives you the ability to quickly gather this information, without any hassle. Once you’ve deployed the stored procedure to your machine, you can start using it to pinpoint issues on your SQL Server.

You can download the sp_WhoIsActive code from this location.

 
SQLjobvis
The last hurdle I needed to take, is to find out which SQL Server Agent Jobs were running on our environment, and at which time. Because I didn’t want to document this manually, I tried to find a tool that did this for me. Then I came across SQLjobvis.

SQLjobvis, written by SQLsoft (Website), is a free tool that visualizes the jobs on your SQL Server. It shows you all jobs and the result of the execution. You can select the data you want to see by date, and with color codes it shows the result within the date range you set.

You can download SQLjobvis from this location.

 
SQL Sentry Plan Explorer
And last, but not least: SQL Sentry Plan Explorer. I’m glad Pieter Vanhove (Blog | @Pieter_Vanhove) reminded me I forgot an important tool!

SQL Sentry Plan Explorer, written by SQL Sentry Inc. (Website), is a lightweight standalone app that helps you analyse execution plans. By making it more graphical than the default execution plan viewer in SSMS, it’s easier to spot the bottleneck.

You can download the tool from this location. And don’t forget to install the SSMS add-in, so you can directly view your execution plan in the SQL Sentry Plan Explorer from SSMS, when you right-click your execution plan.

 
What tools do you use?
There are many more DBA’s out there, and every DBA has it’s own toolbelt. So I’d like to know which tools do YOU use to get the job done? Let me now by leaving a comment, or contact me by Twitter or mail, and I’ll add it to the list of must-haves!

Geometry vs Geography

In the last few months I often received the question: “What datatype do I need to use? Geometry or Geography?”. The answer I gave them, is the most used answer in the IT world: “It depends…”. Because the answer isn’t as easy as you might think…

 
Storage
The storage of both Geometry and Geography is the same. Even though they’re functionality is different, the system datatypes are the same:

SELECT *
FROM sys.types
WHERE system_type_id = 240

The content you will find in a Geometry or Geography column is binary. But by using a number of different methods, you can easily convert this to readable text.

 
Performance
Looking at performance, it could be an easy choice: Geometry will win this. The basic polygons aren’t that different. The extended methods on the other hand, do make a big difference! This is a whole different world, so I’ll write an article about this in the upcoming weeks.

Bob Beauchemin (Blog | @bobbeauch) also gave a session about this, and he had some great examples to illustrate this. You might want to watch his Spatial Data sessions!

 
Functionality
If you’re looking for the biggest difference between the two datatypes, you’ll see it’s the functionality. A Geometry object is just a 2D, flat polygon. This means, it doesn’t matter if you take a country on top of the earth (for example Canada, which is “curved”), or a country close to the Equator (for example Brazil, which is “flat”).

A Geography object on the other hand, is a 3D (or even 4D) polygon, that has the same curve as the shape of the earth. That means that the difference between 2 point isn’t calculated in a straight line, but it’s need to consider the curvature of the earth.

A great example of this concept, is shown by Morten Nielsen (Blog | @dotMorten) on his blog. He made a tool (Shape2Sql, downloadable from his his blog), that can be used to import shape files into SQL Server.

He explains the difference between Geometry and Geography with 2 pictures, where he draws a straight line between Europe and the USA:


Geometry


Geography

As you can see, the straight line is actually a straight line if you use Geometry (planar system). But if you use Geography (spherical system), the line isn’t straight but curved. This could become an issue if you want to know the distance between Berlin and Los Angeles, and you use Geometry data to calculate this. If you use that to calculate the amount of fuel for your plane, you might end up swimming the last part of your journey!

Popping The big question
The big question you need to ask yourself is: “want do I want to do with this?”. If you want to use it just for visualization purposed, you might want to stick with Geometry data. It’s accurate, fast, and doesn’t require difficult calculations. But if you need to calculate distances across the globe, or data that represents the earth, then Geography is the way to go.

Advanced spatial data querying

In this post (part 2 of this series), I’ll try to explain the power of spatial data with real-life polygons.

 
Data
First of all, we need some data. There are several sources of spatial data, and I’ve described importing this data in a previous post. For this post, I’ll be using the Netherlands as an example dataset. You can also use the queries below on your own set of spatial data.

 
Layers
If you start working with spatial data, there are several things to keep in mind. One of these things is that spatial data consists of layers. Every set of objects you select (or draw yourself) is turned into a layer. Layers are stackable and can be joined together (with a UNION operator).

 
First selection
After importing the shape file, we start with a simple selection. In this case the city I love the most in the Netherlands: Rotterdam. This beautiful port-city in the province of South-Holland has a very recognizable shape, and is excellent for the upcoming examples.

Selecting the polygon of Rotterdam can be done with this query:

SELECT GEOG
FROM NLD_adm2
WHERE NAME_2 = 'Rotterdam';

 
Buffer
Every spatial object (Geography and Geometry) has extended methods you can use. These methods vary from very easy (draw a buffer around my object), to calculate the closest distance to another object.

Starting with a very easy one: adding a buffer to a spatial object. You can do this by using the STBuffer() method. The distance you can add as an argument, is based on the reference system of the shape file you imported. In this case we’re working with the EPSG system (SRID 4326), and the unit it used is meter:

SELECT *
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326

So, if you like to draw a buffer of 1 kilometer (1000 meters, or 0.62 miles/ 1091.2 yards for my American friends) around Rotterdam, we use the following query:

SELECT GEOG
FROM NLD_adm2
WHERE NAME_2 = 'Rotterdam'

UNION ALL

SELECT GEOG.STBuffer(1000)
FROM NLD_adm2
WHERE NAME_2 = 'Rotterdam';

As you can see, I selected the city of Rotterdam as a separate layer, with the “buffered layer” on top of it. I did this for visual purposes, so you can actually see what is happening to your object.

 
Get surrounding cities
One of the advantages of spatial data, is that polygons are actually connected to each other. So looking for cities around Rotterdam is pretty easy. What I did is draw a buffer of 1 meter around my city, and ask what intersects my buffer. This query actually works without the buffer, but again, I added it for visual reasons.

DECLARE @City geography;

SET @City = (SELECT GEOG FROM NLD_adm2 WHERE NAME_2 = 'Rotterdam');

SELECT GEOG
FROM NLD_adm2
WHERE GEOG.STIntersects(@City.STBuffer(1)) = 1;

 
Aggregating polygons
Sometimes you want to create your own polygon by aggregating some existing polygons. This can be accomplished by using the method UnionAggregate(). This method aggregates all polygons you select in your query. In this case, all surrounding cities and villages of Rotterdam.

DECLARE @CityBuffer geography;

SET @CityBuffer = (SELECT GEOG FROM NLD_adm2 WHERE NAME_2 = 'Rotterdam');

SELECT Geography::UnionAggregate(GEOG)
FROM NLD_adm2
WHERE GEOG.STIntersects(@CityBuffer.STBuffer(1)) = 1;

 
Conclusion
As you can see in the post above, Spatial Data is a very powerful part of SQL Server. My guess is that it will only become more and more important for businesses.

I’m still looking for a subject for part 3 of this series. I’m thinking about storage, indexing or performance ofw spatial data, but I’d like to hear it from you. What do you want to see in part 3?

Are nested Inserts possible?

Two weeks ago I got an interesting questions. Two of my colleagues thought of a funny way to tease me, by asking me this question:

If we have an insert into table X, can we nest another insert into the query to fill table Y with the identity generated by insert X?

After thinking about this a few minutes, I told them I thought this was possible. Eventually I found a few spare minutes, and came up with this solution. In all honesty, this isn’t completely my solution. A few forums I came across mentioned this option, and I took those ideas and created my own solution.

So first, let’s create the 2 tables we need for this example:

DECLARE @T1 TABLE
	(ID INT IDENTITY(1,1),
	 String VARCHAR(100))

DECLARE @T2 TABLE
	(ID_REF INT,
	 String VARCHAR(100),
	 Action VARCHAR(100))

So I’ve created T1 (table X in the question), and T2 (table Y). Let’s assume that T1 contains your product data, and for every product you insert, you want to insert a log-record into T2.

This can’t be done with actual nested inserts. If you want to do this, the easiest ways is to create a Stored Procedure (SP) that uses SCOPE_IDENTITY to retrieve the generated ID, and insert that into your second table. But because an SP isn’t always the best ways to do this, I wanted to see if I could transform it into a single query.

Since SQL Server 2008 we have the opportunity to use the MERGE statement. What the MERGE does is synchronize two tables by inserting, updating and deleting rows in the destination table, based on a set of rules and/or joins.

So, how would this statement look like? After some trial and error, I created this statement:

MERGE INTO @T1 T1
USING (SELECT -1 AS ID, 'This is a string' AS String) Q
	ON Q.ID = T1.ID
WHEN NOT MATCHED BY TARGET
	THEN INSERT (String) VALUES ('This is a string...')
OUTPUT
	inserted.ID, inserted.String, CONCAT('<<<<< Inserted in T1 @ ', CONVERT(DATE, GETDATE()) ,' >>>>>')
INTO @T2;

As you can see the destination table is T1. This is the first table the record is inserted into, and the table that generates the identity. Because we only want to insert the record, and not update or delete anything, I only created a “WHEN NOT MATCHED BY TARGET” clause.

Because the ID columns don’t match, the record gets inserted into T1. After that, I use the OUTPUT clause of the merge statement to insert the same record (but with the generated identity) into T2. As a reference, I also insert the action-description that contains a date.

So as you can see, you can use nested inserts in SQL Server, only via another statement. But remember, this is just written to prove it’s possible in a single statement, and not for production usage. So if you decide to use this in production or on your system, consider your other options first!


Code samples:
Are nested Inserts possible.sql

Querying spatial data – the basics

A few months ago I started working with spatial data in SQL Server. From that moment on, and I’ve said this before, I’m hooked!

And since my daily work includes more and more work regarding spatial data, I’m trying to write some posts about querying spatial data.

 
Yet another source of information???
If you want to start with spatial data, and you try to find information about this subject, you’ll probably encounter the same problem I had: there’s too much information. Almost all examples are based on real-life polygons and spatial data, and in my opinion way over-engineered to start with or to answer basic questions.

That’s why I’ll try to cover the basics in this post, and maybe create a series about spatial data, without trying to reinvent the wheel.

 
What is spatial data?

Spatial data represents information about the physical location and shape of geometric objects

If we analyse this quote taken from TechNet, it tells us that every object can be visualized with spatial data? As a matter of fact, yes you can!

Every object around you, ranging from a tree to a city or country region, can be represented by one of 3 basic types. The image below (from MSDN) shows these 3 basic types:

And from these 3 types (or collections of types) you can create every spatial object you want.

 
Geometry vs Geography
In SQL Server you have the option to use 2 spatial types. The main difference between these 2, is that Geometry only stores 2D objects, and Geography can store 3D (and even 4D) objects. Also, Geometry and Geography don’t support the same exact methods. For example, Geography can’t calculate a center-point of a shape.

Another big difference is that Geometry calculates straight lines, and Geography actually compensates for the curvature of the earth.

In order to keep things simple, I created the examples in this post all in Geometry data.

Drawing your first object
To start off basic, what is easier then drawing a simple square? A square consists of 4 coordinates, and is one of the most basic forms you can draw. An example of a square looks like this:

DECLARE @Square geometry;
SET @Square = geometry::STGeomFromText('LINESTRING (0 0, 0 100, 100 100, 100 0, 0 0)', 4326);
SELECT @Square

But what if you want a solid square, instead of an outline? In that case, you need to change the type you’re drawing into a polygon. Where the 4 lines in the example above just draw the outline of the object, a polygon (like the example below) will also contain everything within the lines you draw:

DECLARE @SquareFilled geometry;
SET @SquareFilled = geometry::STGeomFromText('POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))', 4326);
SELECT @SquareFilled

 
Layers
Okay, let’s take this one step further. You can also draw multiple objects in one context. These objects can be drawn next to each other, or on top of each other. Every object you draw will be drawn in a “separate layer”. Objects that don’t overlap are just 2 shapes (polygons). But if you draw 2 shapes on top of each other, it’s a whole different story. Both objects can actually aggregate into 1 big shape, or exclude each other. First, an example with 2 separate shapes:

DECLARE @Square geometry,
		@Triangle geometry;

SET @Square = geometry::STGeomFromText('POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))', 4326);
SET @Triangle = geometry::STGeomFromText('POLYGON((50 50,100 150,150 50, 50 50))', 4326);

SELECT @Square
UNION ALL
SELECT @Triangle

If you run the query above, you’ll see 2 objects appear: a square and a triangle. Both object overlap at a certain point, but they’re still 2 independent shapes.

 
Layer aggregation
Until now it’s just child’s play. Now we’re getting to the exiting stuff! How about combining the 2 previous objects into one big shape?

DECLARE @Square geometry,
		@Triangle geometry;

SET @Square = geometry::STGeomFromText('POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))', 4326);
SET @Triangle = geometry::STGeomFromText('POLYGON((50 50,100 150,150 50, 50 50))', 4326);

SELECT @Square.STUnion(@Triangle)

Now you’ll see that both objects merged into one single object. This is a result of “joining” 2 objects or layers. By using the extended method STUnion on one of your shapes, you can add another shape to it. So in the case, the triangle is added to the square.

 
Layer intersection
But what if you want to know the part of the polygon that intersects? So which part of object 1 overlaps object 2? You can do this by using the STIntersection method:

DECLARE @Square geometry,
		@Triangle geometry;

SET @Square = geometry::STGeomFromText('POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))', 4326);
SET @Triangle = geometry::STGeomFromText('POLYGON((50 50,100 150,150 50, 50 50))', 4326);

SELECT @Square.STIntersection(@Triangle)

Or maybe you want to know which part doesn’t overlap. Then you can query the difference of both objects:

DECLARE @Square geometry,
		@Triangle geometry;

SET @Square = geometry::STGeomFromText('POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))', 4326);
SET @Triangle = geometry::STGeomFromText('POLYGON((50 50,100 150,150 50, 50 50))', 4326);

SELECT @Square.STSymDifference(@Triangle)

 
Center
As you see, there are many really cool things you can do with spatial data. One other I want to show you is how to determine the center of your object. Instead of calculating it yourself, you can use a method called STCentroid:

DECLARE @Square geometry;

SET @Square = geometry::STGeomFromText('POLYGON((0 0, 0 100, 100 100, 100 0, 0 0))', 4326);

SELECT @Square
UNION ALL
SELECT @Square.STCentroid().STBuffer(10)

Just to keep it visual, I’ve added a buffer to the center point. What STBuffer does, is adding a radial to the selected object. So in this case, it created a radial around the center point.

If you didn’t draw that extra radial, it would literally just be a pixel on your screen. So by adding a buffer around the center, it’s still visible. But it’s only for visual purposes, and isn’t required to make this query work.

 
Other possibilities
Another thing I wanted to show, is a really awesome polygon made by one of my colleagues: Johannes Tedjaatmadja (@JTedjaatmadja). You have to see it for yourself, because posting it would spoil the surprise. You can download it from here. And I must say, this’ll be one to make Mladen Prajdic (Blog | @MladenPrajdic) proud! ;)

Get column headers without retrieving data

A few days ago Pinal Dave (Blog | @pinaldave) retweeted an old blog post about retrieving columns without data. Reading this blog post, I started thinking of other ways to achieve this.

One of the downsides of using SET FMTONLY ON is that it’s a session-setting. This means that every query you run in that session, doesn’t return results. This could end up in very confusing situations. You could end up debugging a query which doesn’t return any result, and eventually you discover that this is a result of SET FMTONLY ON. So this may not be the preferred way to get the column headers from a specific query. Also, this option doesn’t return estimated or actual execution plans!

An example of such a query could be:

SET FMTONLY ON

SELECT *
FROM sys.tables

SET FMTONLY OFF

Another way of getting this information, without changing your connection options, is a simple WHERE clause.

SELECT *
FROM sys.tables
WHERE 1 = 0

This query will return the same results as the query with SET FMTONLY ON. And looking at performance, there’s no real difference between the two. I ran statistics on both solutions, and the I/O generated is pretty neglectable.

Conclusion
My personal favorite is the WHERE clause. This is because it doesn’t involve session modifiers, and (if you want to) still returns an execution plan. The SET FMTONLY ON doesn’t return an estimated or actual execution plan.

On the other hand, SET FMTONLY is on the list of deprecated features in future SQL Server versions. So for all new functionality, I’d advise you not to use it.

But maybe the other options suits you more. As always: it depends. Depending on your own preferences and the specific situation. But if you have any other options, please let me know. Post a comment, send me an e-mail, tweet me, whatever you want :) .

Calculating Running Totals

When you work with data and database systems, eventually you need to calculate running totals on (for example) product sales or financial data. There are several methods to calculate these amounts. In this post I’ll try to show the pros and cons to the different solutions.

Let’s start with creating the resources for the examples. The most basic example I could think of, is one with only the necessary information: Date and Turnover.

CREATE TABLE Dough
	(Date DATE,
	 Turnover FLOAT)

And then insert some data:

INSERT INTO Dough
	(Date, Turnover)
VALUES
	('2011-01-01', 1000),
	('2011-02-01', 1250),
	('2011-03-01', 1500),
	('2011-04-01', 1750),
	('2011-05-01', 2000),
	('2011-06-01', 2250),
	('2011-07-01', 2250),
	('2011-08-01', 2000),
	('2011-09-01', 1750),
	('2011-10-01', 1500),
	('2011-11-01', 1250),
	('2011-12-01', 1000)

INSERT INTO Dough
	(Date, Turnover)
VALUES
	('2012-01-01', 100),
	('2012-02-01', 125),
	('2012-03-01', 150),
	('2012-04-01', 175),
	('2012-05-01', 200),
	('2012-06-01', 225),
	('2012-07-01', 225),
	('2012-08-01', 200),
	('2012-09-01', 175),
	('2012-10-01', 150),
	('2012-11-01', 125),
	('2012-12-01', 100)

With this resource, we can start on the examples.

Different solutions


When looking at this question, you’ll notice that there are more solutions to return the correct result. The following queries return the same result, but all the solutions are written for a specific version of SQL Server.

SQL 2000
If you’re using SQL Server 2000 (and I certainly hope you don’t have to anymore ;) ), you can use the query with the INNER JOIN. This can be used on all SQL Server versions:

SELECT
	A.Date,
	A.Turnover,
	SUM(B.Turnover)		AS RunningTotal
FROM Dough A
INNER JOIN Dough B
	ON YEAR(B.Date) = YEAR(A.Date)
	AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover
ORDER BY A.Date ASC

SQL 2005
In SQL Server 2005 they entered a new join type, called CROSS JOIN:

SELECT
	A.Date,
	A.Turnover,
	SUM(B.Turnover)		AS RunningTotal
FROM Dough A
CROSS JOIN Dough B
WHERE YEAR(B.Date) = YEAR(A.Date)
AND B.Date <= A.Date
GROUP BY A.Date, A.Turnover
ORDER BY A.Date ASC

The example with the INNER JOIN and the CROSS JOIN generate the same execution plan.

SQL 2012
With the release of SQL Server 2012 they handed us (SQL developers) a whole new “bag of tricks”. One of these “tricks” is the window function.

The first time I saw the window function, was at a Techdays NL 2012 session. This session was hosted by Bob Beauchemin (Blog | @bobbeauch). The sessions (T-SQL improvements in SQL Server 2012) is worth watching. Even if you’re using SQL Server 2012 already!

With the window function you can compute and group data, and this is done with the rows you specify.

SELECT
	Date,
	TurnOver,
	SUM(TurnOver) OVER (PARTITION BY YEAR(Date)
						ORDER BY Date ASC
						ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)		AS RunningTotals
FROM Dough

Performance


Seeing all these different solutions for the same question, I (and you probably will too) wonder about the performance of these queries. One very quick conclusion: they all return the same records ;) .

When using SET STATISTICS IO, you can see the amount of disk activity generated by your statement. If you run this for the queries above, you will get the following results:

INNER JOIN:

Table ‘Dough’. Scan count 2, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CROSS JOIN:

Table ‘Dough’. Scan count 2, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

OVER:

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Dough’. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In the OVER query, you see a table called “Worktable”. This is an “extra object” that is generated by SQL Server because you use the OVER statement.

Conclusion


As shown above, there are several different ways to get to the same result. In this example I didn’t show you the cursor solution. This because it’s a bad practice, a bad performer, and a little bit to preserve my good name ;) . If you do want to see this, please leave me a comment, and I’ll add it to this post.

But with every solution you’ll see as much discussion about reasons to use is, as discussions on why NOT to use it. And in this case, you might be bound to a specific SQL Server version, so you can’t use a specific approach.

But if you ask me for my opinion, I’ll go for the last option. Not only because I’ve got the privilege to work with SQL Server 2012 in my daily work, but also because it’s the best performer and you’ll end up with the most readable code.

I’m guessing you have a totally different opinion, so please leave a comment with your ideas and/or approaches to this challenge! Also, comments and questions are also greatly appreciated!

Sample Databases – Ye Olde Way!

Last week I was working on a SQL Server presentation, to explain the basic of databases and how SQL Server works to a few colleagues. At the end of my presentation, I wanted to show some demo queries. Normally I would create my own tables with sample data, but I want to give them the opportunity to repeat the demos again on their own.

Nowadays Microsoft offers you the AdventureWorks database as extra download for all new versions of SQL Server. But for some examples I just want a smaller database. In “Ye Olde Days” I worked with Pubs and Northwind. Those were small databases, that were still understandable for starters. My first encounter with SQL Server was on the pubs database, and it still sticks to me as “fun and easy”.

But if you try to find them, you need to download an MSI file that extracts the files to your local system. It contains the .MDF and .LDF file of both the Pubs and Northwind databases, and a ReadMe file. But if you try to attach these databases to a SQL 2012 instance, you’ll get an error. SQL 2000 databases can’t be automatically converted to be SQL 2012 compatible.

I’m glad that they decided to add the create script to the .MSI installer. There’s only 1 thing that doesn’t work if you run the scripts. Both scripts contain a call to sp_dboption. This is a way to change database options in SQL 2000-2008. This is removed in SQL 2012, and MSDN advises you to remove this functionality as soon as possible if you still use it in old systems. So after deleting these from the script, it works perfect. One thing I added for my own use, is after the databases are created, I set them to Read-Only. You can delete this from the script, or undo this after the generation of the database(s).

I’ve also included the ERD (Entity-Relationship Diagram) for both databases. This makes it a little bit easier to start using these databases. I found the diagrams by searching in Google for the name. In this case DataMasker hosted the files I wanted.

The reason to share these scripts is because I’m probably not the only one that still wants to use these databases occasionally. So you can download them by clicking the links below. If you want a backup or the .MDF and .LDF files of the databases, please contact me and we’ll work something out.

Pubs:
PubsCreation Script (.sql)
Pubs ERD (.pdf)

Northwind:
Northwind Creation Script (.sql)
Northwind ERD (.pdf)

SET TEXTSIZE – Good or Evil?

One of the first things I try do to every morning when I’m at the office (besides getting a lot of coffee to get the engine started), is reading up on blogposts that were posted the night before or when I’m at the office. My goal is to try to learn at least 1 thing every day, by reading a blog post or article.

Today, one of those articles was written by Pinal Dave (Blog | @pinaldave). He wrote a blogpost about SET TEXTSIZE. I wasn’t familiar with that functionality, so I decided to take it out for a spin.

What SET TEXTSIZE does, is limit the size of the data returned by a SELECT statement. As Pinal describes in his blog post, it could be used as a replacement for the LEFT function on each column you retrieve from the database. But I agree: use it only for test purposes. If used in production, in a query that returns (for example) 5 columns, the SET TEXTSIZE is overlooked much easier then 5 LEFT functions. This reduces the chance that you or your colleagues wonder why the returned column value isn’t shown correctly.

The other remark I need to make, is that it’s interpreted differently by the SQL engine. A few examples of this can be found in the comments of the article Pinal wrote.

But when I used SET TEXTSIZE, I started wondering what this will do to your execution plan. According to MSDN TEXTSIZE is set at execute or run time, and not at parse time. But what does this mean for your execution plan?

To try this out, I created a table, and inserted 10.000 records in that table:

CREATE TABLE RandomData
	(ID INT IDENTITY(1,1),
	 Col1 VARCHAR(MAX),
	 Col2 VARCHAR(MAX),
	 Col3 VARCHAR(MAX),
	 Col4 VARCHAR(MAX),
	 Col5 VARCHAR(MAX))


INSERT INTO RandomData
	(Col1, Col2, Col3, Col4, Col5)
SELECT
	REPLICATE('1234567890', 100),
	REPLICATE('1234567890', 100),
	REPLICATE('1234567890', 100),
	REPLICATE('1234567890', 100),
	REPLICATE('1234567890', 100)

GO 10000

Once you’ve created the table, you can run the “old fashioned” script with the LEFT functions:

SELECT
	LEFT(Col1, 10),
	LEFT(Col2, 10),
	LEFT(Col3, 10),
	LEFT(Col4, 10),
	LEFT(Col5, 10)
FROM RandomData

If you look at the exection plan, it contains a table scan, Compute Scalar (that computes the new values of each row), and the select of the data. Nothing out of the ordinary I would say.

But if you run the same query with the SET TEXTSIZE, it results in an error:

An error occurred while executing batch. Error message is: Error processing execution plan results. The error message is:
There is an error in XML document (1, 6).
Unexpected end of file while parsing Name has occurred. Line 1, position 6.

The query actually returns the whole set 10.000 records, and the result is correct. Of every column, only the first 10 characters are returned. So what’s happening with the execution plan?

If you use either one of the statements below in your session, you can see that the execution plan is generated without any issues:

SET SHOWPLAN_TEXT ON
SET SHOWPLAN_ALL ON
SET SHOWPLAN_XML ON

There is a Connect item for this issue, but the SQL Server team decided not to fix it in SQL Server 2008. And looking at my screen, they didn’t fix it in SQL Server 2012 either…

So my best guess (without knowing what the actual code does), is that the execution plan XML isn’t completely transfered to the client. This is part of the resultset, and thus also delimited because of the SET TEXTSIZE.

So my conclusion would be: don’t use SET TEXTSIZE, unless you’re absolutely aware that the results you receive are delimited and that visualising your execution plan may cause an error (but only in the SSMS!). The query results are retrieved and shown correctly, but the execution plan XML is causing problems when using a small TEXTSIZE.

But if my conclusions are incorrect, or if I’ve overlooked something, I’d love to hear your comments on it! So don’t hesitate to correct me if necessary! ;)

T-SQL Toolbelt – Search for objects in databases – V 2.1.0

A few weeks ago, I received a message from an old colleague and friend Eric (Blog | @saidin). He wanted to know if I had a query in my (and I quote) “magic bag of SQL tricks”, to search through object in SQL server. The company he works for (he is a software developer, and independant consultant) wanted to change all stored procedures, that contained functionality to calculate VAT (Value Added Tax).

I remembered that a few years ago, I needed that same functionality, and I wrote a few scripts to search for specific dependencies in views and stored procedures. Next to a query that gets information from
sys.tables and sys.columns, I used these queries to get view and stored procedure content:

SELECT *
FROM sys.syscomments
WHERE text LIKE '%<SearchTerm>%'


SELECT *
FROM INFORMATION_SCHEMA.VIEWS
WHERE VIEW_DEFINITION LIKE '%<SearchTerm>%'

The first query uses information from sys.syscomments. Which, according to MSDN:

“Contains entries for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure within the database. The text column contains the original SQL definition statements.

The seconds query uses INFORMATON_SCHEMA, that contains SQL Server metadata (see MSDN article):

An information schema view is one of several methods SQL Server provides for obtaining metadata.

The VIEWS view (a view on all views?) returns a row for each view that can be accessed by the current user, in the current database. So this means that the view only returns rows for objects that you have permissions on.

Then I decided to write a script that does this in one query, and more… When working on this script, I thought about adding more functionality to it. Why wouldn’t you want to search for primary or foreign key columns, triggers, functions, etc? But adding more information to the resultset often means that the overview is lost along the way. Because of that I created a switch system. By setting a few bits you can turn on what you need to see, and turn off what you don’t want to see. This way the result is kept clean, and you’re not bothered with unnecessary information.

One of the issues I ran into is how to search for a specific string. Because I wanted to let the user enter the searchterm once, I needed to use a variable. But if you use a variable, and you ad a wildcard (%) as the first and last character, the query returns all objects. It has the same effect as returning all objects, instead of returning objects based on a specific searchterm.

So because of this, I used dynamic SQL to search through the list of objects. In dynamic SQL it’s possible to work with wildcards in a like statement. The only thing I needed to do is change one more table from memory to physical temp table, because it’s used in the dynamic SQL. Apparently dynamic SQL can’t use a memory table (DECLARE @Object TABLE) as a datasource.

So this is what I could build in the past few weeks. The only problem is that fixing issues that I found resulted in adding more and more new functionality to the script. With that in mind, I want to create a toolbelt with useful SQL scripts for myself. But of course, I want to share it with the community, so they can use it if they like.

So the upcoming weeks, I hope to build as much functionality in this script as I can. There are still a few wishes for the future, and a few features that I want to build in, just because they’re cool! For every new version, I will write a blog with releasenotes, so you’re aware of the changes in the script.

For further questions and wishes, please contact me via twitter or this blog. I’d love to hear your ideas and wishes, so that I can implement it in the script!

You can download the script by clicking on the image below.

Downloads

Version 2.1.0:


Follow

Get every new post delivered to your Inbox.