Become a T-SQL Hero with SQL Prompt

Since 1999, Red Gate Software has produced ingeniously simple and effective tools for over 500,000 technology professionals worldwide. From their HQ in Cambridge, UK, they create a number of great tools for MS SQL Server, .NET, and Oracle. Red Gate's philosophy is to design highly usable, reliable tools that solve the problems of DBAs and developers.

Every year Red Gate selects a number of active and influential community members (such as popular blog writers and community site owners), as well as SQL and .NET MVPs who are experts in their respective fields, to be part of the Friends of Red Gate (FORG) program. I'm proud to announce that I'm part of the 2014 FORG selection. This post is part of a series of posts in which I try to explain and show you why the tools of Red Gate are so loved by the community.



 
What SSMS misses
The tool that Microsoft provides you with when you install SQL Server is pretty nice. It's nicely designed (even though I've heard other opinions), it's stable, and it does what it should do: it allows you to administer your servers. But in my opinion, that's not the only thing it should do. Take Visual Studio as an example: that IDE contains many more options that help developers do their job. And remember, SQL Server Management Studio (SSMS) is actually a Visual Studio instance with a different layout (just check the Ssms.exe.config)…

So why doesn't SSMS have a schema compare option, like Visual Studio has? Visual Studio is no longer an environment used only by developers that work with ASP.NET and C#; it has evolved into much more over the last few years. It's now also the tool for working with Data Quality Services (DQS) and SQL Server Integration Services (SSIS). So let's talk about some other features that SSMS misses in my opinion, and let's see how SQL Prompt can fill that gap.

 
IntelliSense
SSMS ships with a default IntelliSense, but it isn't the implementation I would like to see: it lacks a few vital features. For example, the fact that SSMS IntelliSense doesn't take relations between objects into account is one of its biggest shortcomings. One of the companies that created a tool to fix that is Red Gate. If you install SQL Prompt, you get IntelliSense 2.0, or IntelliSense on steroids if you like.

Once you've installed SQL Prompt, it gives you suggestions when you write a JOIN clause. This means that it scans column names, and traces primary and foreign key relationships on the tables you are joining. A join suggestion based on keys can be recognized by the little key symbol in front of it:

 
Object discovery
Whenever you're working in a database and writing queries, there comes a point when you can't remember a column name or data type. In SSMS you need to navigate the object explorer to the object (let's say a table), and generate a create script, or click through to the table's column list. SQL Prompt allows you to hover your mouse over an object, and see some extra information:

 
If you click on the popup, you’ll get another popup window with the creation script (by default), or a summary of the object:

 
Scripting options
Whenever you need to script an object, or want to see the contents of, for example, a stored procedure, you normally need to navigate to the object in your object explorer. With SQL Prompt, you can also use the context menu to script objects. Just right-click an object you referenced in your query, and choose the "Script Object as ALTER" option:

 
This will generate an alter script for the object you selected. That makes it a lot easier to see the contents of a stored procedure or view, and to change it when needed.

 
Useful functions
The last feature I want to show you is the SQL Prompt menu, which gives you access to another set of useful tools and functions. For example, how do you format your T-SQL query? SQL Prompt can do that for you with a few mouse clicks, or with a hotkey combination. Another great feature is "Find Unused Variables and Parameters", which saves you time when you're trying to find out which declared variables in a very large query are no longer used. All of these options can be found in the SQL Prompt menu:

 
If you want, you can also export a formatting style for all your colleagues, so your entire department or company formats queries according to the same layout. You can find out more about this in the SQL Prompt menu, under Options -> Format -> Styles. You can export your formatting options as a .sqlpromptstyle file, or import one.

 
Is it worth it?
If you ask me, my answer is: yes! Even though it costs about €285 (or $390), it's definitely worth it. It saves you a lot of time, and it adds a lot of useful (and needed) features to SSMS.

If you want to try it out, just go to Red-Gate.com, or the product site for SQL Prompt. You can download a trial there that contains all features for a limited time.

 
If you want to read more about this topic, don’t forget to check out these blog posts:

- Julie Koesmarno: Clean And Tidy SQL With SQL Prompt
- Mickey Stuewe: Becoming a SQL Prompt Power User
- Chris Yates: SQL Prompt – The Power Within

Write readable and high-performance queries with Window Functions

In SQL Server 2005, we gained a new and powerful feature in our bag of T-SQL tricks: window functions (and SQL Server 2012 extended them considerably, with window frames and functions like LAG and LEAD). The heart of a window function is the OVER clause, which allows you to determine partitions or "groups" in your query before applying another function. In practice, this means you can create groups in your dataset, which can be molded further by applying functions to those groups.

In the past, I’ve written a number of posts about Window Functions in SQL Server:

- Row_Number: Unique ID in select statement
- Calculating Running Totals
- Removing duplicates in a dataset

But there’s more!

 
Finding missing numbers in sequence
How many times have you looked at a table and noticed that one of the values in a sequence is missing? Or a date range in a table where someone deleted a few records? All of a sudden, your year doesn't have 365 days (or 366 in a leap year, or 365.242199 days if you take the astronomical average), but 315 days. There go your management reports that are calculated on a per-day average. So how do you find the missing dates, without having to write a cursor or create a full-blown date table like your BI colleagues do? You guessed right: a query with window functions!

Let’s start with declaring a table, and insert a sequence with missing values:

DECLARE @Sequence TABLE
  (Number INT)


INSERT INTO @Sequence
  (Number)
VALUES
  (1),
  (10),
  (7),
  (4),
  (2),
  (8),
  (5)

 
So how can we get the “nearest numbers” from that table with just a single select statement?

SELECT
  LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value,
  Number,
  LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value
FROM @Sequence AS S
ORDER BY Number ASC

 
LAG and LEAD are standard T-SQL functions from SQL Server 2012 onwards. These functions give you the opportunity to access the previous or next row, without the need for a so-called "self-join". So what you see is the number, the value preceding it (LAG), and the value following it (LEAD). In this case, Number 2 is preceded by Number 1, and followed by Number 4.
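To actually report the gaps, you can wrap the LEAD version in a subquery, and only keep the rows where the next value is more than 1 higher. A minimal sketch (run it in the same batch as the table variable declared above):

SELECT
  Number + 1 AS FirstMissing,
  NextNumber - 1 AS LastMissing
FROM
  (SELECT
     Number,
     LEAD(Number) OVER (ORDER BY Number) AS NextNumber
   FROM @Sequence) AS MissingNumbers
WHERE NextNumber - Number > 1

For the sample data above, this returns the missing values 3, 6 and 9, each as a range of a single number.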

 
The lemonade stand
Now let's look at another example. How about you? When you grew up, you wanted to save money for a new mobile phone, right? In my case it was either a Discman, a Walkman, or a stereo set. But let's stick with the modern equivalent of the Walkman for now: the MP3 player. To earn money for the MP3 player, our fictitious friend, who is called Joe, decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his earnings, and puts them into a table:

DECLARE @Profit TABLE
  (DayNumber INT,
   Sales DECIMAL(10,2))


INSERT INTO @Profit
  (DayNumber, Sales)
VALUES
  (1,  6.90),
  (2,  4.17),
  (3,  2.69),
  (4,  7.26),
  (5,  2.93),
  (6,  8.98),
  (7,  7.25),
  (8,  5.88),
  (9,  1.51),
  (10, 7.97),
  (11, 3.44),
  (12, 3.76),
  (13, 9.96),
  (14, 0.92),
  (15, 8.28),
  (16, 6.05),
  (17, 9.40),
  (18, 4.03),
  (19, 9.14),
  (20, 7.25),
  (21, 0.06),
  (22, 9.12),
  (23, 7.39),
  (24, 6.57),
  (25, 4.54),
  (26, 0.09),
  (27, 4.42),
  (28, 9.53),
  (29, 5.09),
  (30, 0.89)

 
So as you can see, he earns quite a lot of money this way! But because he's eager to buy his new MP3 player, he wants to see his daily totals, and the amount he still needs for his new toy. And because Joe is a really smart guy, he doesn't want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:

DECLARE @Goal DECIMAL(10,2) = 150.00


SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
He declares a "goal" variable that is set to the amount he needs for his new gadget. Then, for every row, we subtract the running total of sales from that goal: the sum of everything he sold between the first day (UNBOUNDED PRECEDING) and the current day (CURRENT ROW). After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he's made so far:

SELECT
  DayNumber,
  Sales,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AverageSale
FROM @Profit

 
So where does it stop?
So now that we've seen the SUM and AVG options, what do we have left? How far can we take this? Thinking about it, how about a daily check on whether we hit a lowest or highest sales amount? We can do this with MIN and MAX in the same kind of query:

SELECT
  DayNumber,
  Sales,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale
FROM @Profit


SELECT
  DayNumber,
  Sales,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit

 
Now Joe can record his daily sales in the evening, and check whether he had a very good, or a very bad day.
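And because all four aggregates use the same window, nothing stops you from combining them in a single query. A sketch, using the same @Profit table from above:

SELECT
  DayNumber,
  Sales,
  SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AverageSale,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit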

 
Not only for the average Joe…
So how can the business profit from all this? In some cases, window functions make it easier to produce a certain resultset. In other cases, they give you a whole new way to output data with a well-performing (single) query that simply wasn't possible before. So if you're running SQL Server 2012 or higher, start using (or at least start exploring) window functions right away!

 

To generate the random floats, I've used the generator at FYIcenter.com.

If you want to read more about this topic, don’t forget to check out these blog posts:

- Julie Koesmarno: ABC Classification With SQL Server Window Function
- Mickey Stuewe: A Date At The End of The Month
- Chris Yates: Windows functions who knew

It’s the little things that make a difference

I can still get enthusiastic when I discover tiny new features of SQL Server Management Studio (SSMS). It's the tool I use every day, and yet I keep discovering new and cool things that I never noticed before.

 
Filter Object Explorer
If you look at the Object Explorer, you’ll see a little button that can come in quite handy: the filter button:

 
This allows you to filter the results in Object Explorer. If you click on it, you will see the filter window pop up:

 
Just as a test, I created a filter on name contains "Test". A filter applies only to the database and object type you selected. In my case, I selected the Tables node, so only these objects are filtered:

 
There are a few drawbacks to this. One of them is that you can't remove the filter without opening the filter pop-up again. Another is that you can't adjust a filter in place: if you want to change it, you need to remove it completely and reapply a new one.

 
Scripting Magic
One of the things I use on a regular basis is the "Generate and Publish Scripts" wizard. But did you know this wizard has some hidden gems? One of the options you have is to script objects to individual files. That's an option that is hidden in plain sight:

 
But another gem is hidden behind the Advanced button:

 
This allows you to generate insert scripts for your tables, without the use of a third-party tool.

 
Splitter Bar
One of the hidden gems I wanted to show you isn't one I found out myself. This one I discovered via a blog post from Kendra Little (Blog | @Kendra_Little). She blogged about the splitter bar in SSMS, which is quite handy sometimes! Go check out her other blog posts as well, for example the one about scripting changes from the GUI.

 
In the picture above you see the same stored procedure (in this case from the AdventureWorks 2012 database), split in two by the splitter bar. This makes it easier to, for example, look at the declare statements at the top of the script, and the query you're working on at the bottom.

 
Tab Groups
The last one is one I use on a regular basis. At my past few employers, I've worked with two monitors, which makes it easy to compare files or result sets. But what if you don't have that luxury? There's an option in SSMS to create a new tab group. Just right-click a query tab, and choose the option you like:

 
Let's say you want to compare the resultsets of two queries; you can use the Horizontal Tab Group option:

 
To return to your normal view, just right-click on the tab again, and click “Move to Previous Tab Group”.

 
Save time
One of the things Mickey Stuewe (Blog | @SQLMickey) pointed out is that you can rearrange the columns in the result window of SSMS. Just drag and drop columns the way you like. This can save you a lot of time, because you don't have to rerun the query just to change the order of your columns. Note that the column order is reset the next time you run the query.

 
Never stop learning!
There are plenty more hidden gems in SSMS, waiting to be found by you. So never stop learning, and always try to take it a step further than needed. You’ll be surprised to see what awaits you…

How SQL Search saves you time

Since 1999, Red Gate Software has produced ingeniously simple and effective tools for over 500,000 technology professionals worldwide. From their HQ in Cambridge, UK, they create a number of great tools for MS SQL Server, .NET, and Oracle. Red Gate's philosophy is to design highly usable, reliable tools that solve the problems of DBAs and developers.

Every year Red Gate selects a number of active and influential community members (such as popular blog writers and community site owners), as well as SQL and .NET MVPs who are experts in their respective fields, to be part of the Friends of Red Gate (FORG) program. I'm proud to announce that I'm part of the 2014 FORG selection. This post is part of a series of posts in which I try to explain and show you why the tools of Red Gate are so loved by the community.



 
Red Gate? No thank you!
One of the biggest prejudices about Red Gate tools is that you have to sell one of your kidneys in order to afford one. I agree with you: some of the tools Red Gate sells are pretty expensive, especially when you need to buy them yourself. But what if I told you they pay for themselves in the long run? You don't believe me? Okay, let's start off with a free tool to convince you.

 
How SQL Search can save your bacon
As a DBA or SQL Server developer, one of your biggest challenges is to memorize your whole environment, and to know every line of T-SQL in it off the top of your head. I'm sorry? Oh, you don't do that? Well, that doesn't really come as a surprise. And if you thought "Hey! I'm doing that too!", stop wasting your time! No DBA anywhere in the world will EVER remember all of the T-SQL scripts, stored procedures, views, etc., that can be found in his or her environment. So where do they get their information from? What would you say if I told you that you could search through all objects in a specific database, or on a specific instance?

So how do you do that? Were you planning on writing your own script to search through the system view sys.columns for column names, and sys.procedures for stored procedure text? There is an easier way out, you know!
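Just to show you what SQL Search saves you from, a do-it-yourself version would look something like this sketch. The search term 'Department' is only an example, and I'm using sys.sql_modules here instead of sys.procedures, because it also covers views and functions:

SELECT
  OBJECT_SCHEMA_NAME(c.object_id) + '.' + OBJECT_NAME(c.object_id) AS ObjectName,
  'Column name' AS MatchedOn
FROM sys.columns AS c
WHERE c.name LIKE '%Department%'

UNION ALL

SELECT
  OBJECT_SCHEMA_NAME(m.object_id) + '.' + OBJECT_NAME(m.object_id),
  'Object definition'
FROM sys.sql_modules AS m
WHERE m.definition LIKE '%Department%'

It works, but it's limited: no clickable results, no highlighting, and you'll find yourself extending it before you know it.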

 
Installing and testing
In order to create a test case that you can repeat on your own SQL Server, I'm using the AdventureWorks2012 database. That's a free sample database that you can download from CodePlex. I've chosen to download the backup file, which I restored on my local SQL Server.

The installer for SQL Search can be found on the Red Gate site. Once you've installed SQL Search, a button is added to SQL Server Management Studio (SSMS):

 
So, what do you actually get? If you click the button in SSMS, you'll see a new window that acts as a result pane. The top of the new screen looks like this:

 
This screen contains all the options you need to find objects either on your server, or in a specific database. You see a textbox on the left, where you fill in your search term. The next few options can be used as filters: you can search on an exact match (checkbox), on a specific object type only (dropdown list), and in a specific database (dropdown list). The last dropdown list is the instance you want to search on. This list is populated with the open instance connections from your object explorer.

 
Test case
So how does this work in practice? One of your colleagues comes up to you and asks which objects in your database are related to department data. You could search for documentation in order to answer that question, or you could let SQL Search give you the answer. So, let's search the AdventureWorks2012 database for objects that are related to, or contain, department data. The result looks like this:

 
As you can see, there are 18 objects related to your search term (the object count is visible at the bottom right of the search results). Some objects are shown multiple times, because there are multiple matches on that object. Take the "vEmployeeDepartment" view, for example: the name of the view contains the search term, one of its columns is called Department, and the text of the view (the create script) contains the search string as well.

But how does this work in real-life situations? How many times do your colleagues ask you how many objects are related to a specific table or column? As a DBA you probably get this question more often than you would like. Your developers want to rebuild an application, or add a new feature to it, but they're not sure if the database change they'll make will break other applications or processes.

You can also use the tool to your own advantage. For example, when you want to know which objects can update a specific employee record, you just search for both terms, and you'll see that there are 3 stored procedures that update employee information:

 
Please hold on, while I search for the object…
…is a sentence you never have to use again when working with SQL Search. Whenever you've found the object you were looking for, you can just double-click it, or use the button between the result pane and the script pane:

This will look up the object in the object explorer. So you never have to hunt for an object after you've found it. Just let SQL Search do all the hard work.

 
Red Gate? They’re AWESOME!!!
Hopefully you've changed your mind about Red Gate tools after reading this article. This is one of the tools I personally use on a daily basis. Even if there is documentation within your company, you still need to find it. It's printed and lying around somewhere, or on an intranet or SharePoint. You know where you can find it, except when you REALLY need it!

SQL Search is also a tool you (in my opinion) really need when you're new to a company. You'll see a lot of different databases, with different purposes, and maybe even different DBAs responsible for each specific database. Using SQL Search gives you a great advantage when you need to have a chat with the DBA. You'll step into their office with a little more knowledge of the database, without reading endless documents, cryptic ERDs, and folders full of unnecessary documentation.

 
Feedback
When you start using the tool, don't forget to thank the people from Red Gate. They LOVE to hear your feedback, either in a tweet, on the Red Gate forums, or by contacting support. You could also send me a mail or tweet, or leave a comment at the bottom of this post. I would love to answer your questions (as far as I can), or pass them on to Red Gate.

If you want to read more about SQL Search, don’t forget to check out these blog posts:

- Julie Koesmarno: SQL Tools Review: SQL Search
- Mickey Stuewe: On a SQL Quest using SQL Search by Red Gate
- Chris Yates: Headache + Pain Red Gates SQL Search

SSIS – Remove empty rows from Excel import

From the first time I used SSIS, I loved it. In most cases it's easy to create a package, easy to understand, and even readable for people who don't "speak fluent SQL". But what if you want to perform an easy task, and the result isn't what you expect?

 
Formatting an Excel sheet
One of the most basic tasks you can perform in SSIS is importing an Excel sheet. Most of the time this works like a charm. But in my case, I wanted to filter out some rows from the workbook.

The business delivers an Excel sheet that needs to be imported into the database. But because they don't have the technical knowledge we have, they don't know how important the format of the file is. They sent us this file (I've created a smaller sample, so it's easier to read and understand):

 
The first thing you'll notice as a data professional is the 2 empty rows in the sheet. Besides that, we have an employee without a name. You know this is going to cause problems when you see it. These errors are easy to spot in the example, but imagine if these rows are hidden in a dataset of 50,000 or more rows. So even though they might have ended up there accidentally, your process is going to fail.

 
When you add an “Excel Source” to your package, and you look at the preview of that import, you immediately see the problem:

 
Table structure
In order to determine what columns can be left blank, and what columns can’t be NULL, I looked at the table structure:

CREATE TABLE ResultSSIS
  (ID INT IDENTITY(1, 1),
   FullName VARCHAR(50) NOT NULL,
   Department VARCHAR(50) NULL,
   EmployeeNumber INT NOT NULL)

 
So in the dataset, FullName and EmployeeNumber are mandatory, and Department is optional. With this in mind, I started to work on a way to exclude those rows.

 
Import without excluding
The first thing I tried was to import the file, and see what the results were. Because I knew the data wasn't correct, I didn't want to import the Excel sheet into a SQL Server database just yet. So as a destination, I used the "Recordset Destination" in SSIS. Importing the data into this in-memory table also allowed me to use the "data viewer" to see the imported data, without the need to truncate a table after each run. You can enable the "data viewer" by right-clicking the connector between two components (the arrow), and clicking "Enable Data Viewer":

 
If you run the SSIS package in debugging mode, you’ll see the data that is imported in a pop-up window:

 
As you can see in the screenshot above, the records with NULL values in them are included in the import. So which records do we want to exclude, based on our table structure?

 
So of the 6 records in the Excel sheet, we want to exclude 3 from our import because of NULL values. But how do we do that? The easiest way to solve it is to import the data into a temp table, delete the NULL records, and insert the remaining records into the destination table (see the sketch below). But what if that isn't possible, and you want to filter the records during the import? I've chosen to use the "Conditional Split".
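For completeness, the staging-table detour mentioned above would look something like this sketch (the staging table #StagingSSIS is just an illustrative name):

-- Remove the rows that violate the NOT NULL columns of the destination
DELETE FROM #StagingSSIS
WHERE FullName IS NULL
   OR EmployeeNumber IS NULL

-- Insert the remaining, valid rows into the destination table
INSERT INTO ResultSSIS
  (FullName, Department, EmployeeNumber)
SELECT
  FullName,
  Department,
  EmployeeNumber
FROM #StagingSSIS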

 
Conditional Split
You don't have to rebuild your whole package when you want to exclude records with the "Conditional Split". You can just add this component, at least in this case, in between your source and your destination. If you open the component, you can add an expression that is used to filter records. In my case, I wanted to exclude the rows with an empty "FullName" or "EmployeeNumber":
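In SSIS expression syntax, the condition for the rows to exclude looked something like the line below (ISNULL here is the SSIS expression function that returns true for NULL values, and || is the logical OR):

ISNULL(FullName) || ISNULL(EmployeeNumber)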

 
When you connect your "Conditional Split" to your destination, SSIS will ask you which output of the "Conditional Split" it needs to use. To output the entire set without the empty rows, choose the "Conditional Split Default Output":

 
When you run your package with the extra "Conditional Split" (and you enable the Data Viewer again), you'll see the filtered output of the "Conditional Split". The 3 NULL records are excluded as expected:

 
Conclusion
SSIS is easy to use, and yet a really powerful tool. Even after you've built your processes in SSIS, it's not always necessary to rebuild the whole package when requirements change. Sometimes you can save the day with just a minor change. That's the power of SSIS!

Backup and relax?

Keeping a good backup strategy is a must for every DBA and database developer. But just creating a backup isn't enough. Perhaps you don't have enough storage to keep a week's worth of full backups of your database. Or taking a full backup of a database takes so long that it's only possible on weekends. So what are your options?

 
RPO and RTO
Your whole backup strategy starts with determining the RPO (Recovery Point Objective) and RTO (Recovery Time Objective). Brent Ozar (Blog | @BrentO) wrote a great blog post about these terms, and explains what they mean.

Basically it means that you need to determine what maximum data loss is allowed, and from there you start creating a backup strategy. So how do you determine these? This is what RPO and RTO look like if you visualize them:

 
Storage
Another thing you want to consider is the storage available for your backups. Most of the time the backups will be stored on a NAS (Network-attached storage), and not on the local server, so storage isn’t a big issue in that case.

I've also seen companies that created the backup on the local server, and copied it to a network location after completion. In my opinion that's one more dependency you could avoid, but other than that it's a valid option.

 
Backup types
SQL Server supports multiple backup types. They all have their pros and cons, and give you the ability to create a backup strategy that fits your needs. In this blog post, I'm assuming the database we work with uses the full recovery model.

 
Full backup
A full backup is a backup of the entire database. With this backup file you'll be able to recover the entire database, without the need for extra log files. Creating a full backup can take a lot of time, depending on the size of the database. Let's visualize this with an example:

 
Looking at the example, you'll see that a full backup is created every night. But on Friday at 8 PM the database crashes, and we need to recover from backup. The last full backup was taken 20 hours ago, so those 20 hours of data changes are lost.
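For reference, this is what creating a full backup looks like (the database name and path below are illustrative):

BACKUP DATABASE SalesDB
TO DISK = 'D:\Backup\SalesDB_Full.bak'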

 
Differential backup
If you have less time to spend on backing up your database every night, one of your options is to take a differential backup. A differential backup only contains the data that has changed since the last full backup. A differential backup can't be created without taking a full backup first. If you try to create one without a full copy of the database, SQL Server will throw an error:

 
When you create a full backup, a backup chain is started: SQL Server registers the LSN (Log Sequence Number) up to which that backup was made. A differential backup then contains everything that has changed since that full backup, up to the time the differential backup is taken.

A backup chain can’t be started with a differential backup. Also, when you want to restore a differential backup, you need the full backup it’s based on. To put this into perspective, look at the visualization below:

 
At midnight on Monday we create a full backup of the database. Every other night we create a differential backup at midnight. On Friday at 8 PM the database crashes, and we need to restore a backup. All we need is the full backup from Monday, and differential backup 4. But although it takes less time to create a differential backup, you see that this hasn't helped much: you still lost 20 hours of data.
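Creating a differential backup looks almost the same as a full backup; only the WITH DIFFERENTIAL option differs (names and paths are again illustrative):

BACKUP DATABASE SalesDB
TO DISK = 'D:\Backup\SalesDB_Diff.bak'
WITH DIFFERENTIAL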

 
Transaction Log backup
The last major type of backup is the transaction log backup. This backup contains all transactions that were executed after the previous transaction log backup was taken (or, for the first log backup, after the full backup that started the chain). This gives you the opportunity to perform a so-called "point-in-time recovery".

Just like a differential backup, a transaction log backup can't be created without taking a full backup first. So a transaction log backup can't be used to start a backup chain either. Let's take the same example, and add a transaction log backup every 15 minutes (the blue lines represent the transaction log backups):

 
If the database crashes at the same time as in the previous examples, your data loss is slimmed down from 20 hours to a maximum of 15 minutes. In order to recover your database, you need the full backup created on Monday, the differential backup created on Friday, and all transaction log backups created after that differential backup. So if the database crash occurs a few seconds before the next transaction log backup, the maximum data loss is 15 minutes. Again, without a full backup first, you can't create a transaction log backup:
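A transaction log backup is created with BACKUP LOG instead of BACKUP DATABASE (names and paths are again illustrative):

BACKUP LOG SalesDB
TO DISK = 'D:\Backup\SalesDB_Log.trn'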

 
Pitfalls
Here's a pitfall: whenever you create a database, back it up, and throw away that backup file, SQL Server will still let you create a differential or transaction log backup. SQL Server doesn't check whether the full backup file (or, in case of a transaction log backup, the last differential) is still present. So remember to always check if there is a valid backup available, either on the server or on your backup location.
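A quick way to check is to look at the backup history SQL Server keeps in msdb (a sketch, assuming the history hasn't been cleaned out; and remember this doesn't prove the backup file itself still exists):

SELECT TOP (1)
  database_name,
  type,                 -- D = full, I = differential, L = transaction log
  backup_finish_date
FROM msdb.dbo.backupset
WHERE database_name = 'SalesDB'  -- illustrative database name
ORDER BY backup_finish_date DESC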

 
Backup compression
From SQL Server 2008 onwards, you can use a feature called backup compression. Whether or not you compress your backup can make a big difference in performance. A compressed backup is smaller, so it requires less I/O when created, and that can increase backup speed significantly. On the other hand, compressing a backup increases CPU usage. So it's a tradeoff you need to consider. But in some cases, this could solve the problem of a storage shortage.
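Enabling compression is just another option on the backup statement (again, names are illustrative):

BACKUP DATABASE SalesDB
TO DISK = 'D:\Backup\SalesDB_Full.bak'
WITH COMPRESSION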

 
Files and Filegroups backup
Creating a file or filegroup backup can be practical when performance or database size is an issue for you. Perhaps taking a backup of your 500GB database takes too long, and you need to consider other options.

You can back up all filegroups separately, but it's also possible to combine a number of filegroups in a single backup. This makes it easier to balance your backups over several disks if you'd like to. But perhaps it's easier to create a backup that consists of multiple files. This can be achieved by adding more destination files at the bottom of the "create a backup" GUI. SQL Server then balances the data across the files you added.

Adding more destination files to your backup can also increase performance. Jes Schultz Borland (Blog | @grrl_geek) wrote a great article about that, in which she tested several options to see what the impact on performance is.

 
Copy-only backup
One of the most important backup options (in my opinion) is the copy-only backup. This allows you to create an ad-hoc backup without breaking the backup chain.

A copy-only backup works independently of any previous backup or backup plan. It will not take the last LSN into account, nor register the LSN it read up to, so it doesn't affect the backup chain. So if you have a backup plan in place, and you or one of your colleagues needs to create an ad-hoc backup, copy-only is the way to go.
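Again, it's just one extra option on the backup statement (names are illustrative):

BACKUP DATABASE SalesDB
TO DISK = 'D:\Backup\SalesDB_CopyOnly.bak'
WITH COPY_ONLY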

 
Now can we relax?
The short answer is: NO! Your work has only just begun. Now that you have a backup strategy, you need to build it, test it, tune it, and cherish it. Creating the perfect backup strategy isn’t a silver bullet. Your databases change, your processes change, your colleagues change…

So when was the last time you tried to restore a backup from your automated process? You can't remember? Time to get it done then! You know what they say: a DBA is only as good as his last restore. So if you want to keep working as a DBA for your company, start preparing a test restore now.

 
And then…?
Once you’ve created a backup strategy, the hard work just begins. How are you implementing your backups? Are you planning on using the default SQL Server maintenance plans? Are you building something yourself with SQL Server Agent Jobs and SSIS packages? Maybe you want to buy a solution from a specific vendor you like or know? Well, what about a free solution?

If you’re looking for a cheap way out, building it yourself is a good option. But why not look at the completely free solution by the new MVP Ola Hallengren (Website | @olahallengren)?

It's a solution used by many of our community members, and it has won a lot of awards over the years. Not sure if it's safe to use? Take a look at the list of companies that use his solution, or read the blog post by Jonathan Kehayias (Blog | @SQLPoolBoy) about it.

Another great resource to start from is the TechNet page about backups. This contains a lot of information about the techniques behind the backup process, and the possible pitfalls you’ll encounter.

 
Conclusion
When creating a backup strategy, you need to take a lot of factors into account. What kind of hardware are you working with? Is the storage you need available? Is it possible to create a full backup every night, or only on weekends?

After building your (custom) solution, you need to spend time on tuning and maintaining it. Your databases aren't static, and will change every second, every minute, every day. So keep adjusting your process to perform at its best, and in the end, you will create your own free time to spend on cool things.

Incremental updates with Change Data Capture

When designing a database or an ETL process, for example loading your production data into a reporting environment, you always start your design with performance in mind. In the beginning of the project, your scripts and ETL runs are blazing fast. But after a few months in production, the entire project grinds to a halt. So how do you fix that problem, without a complete redesign of your applications and database? One of the many solutions is an easy one: load your data incrementally into the destination tables.

 
Change Data Capture
Incremental data loading can be a hard nut to crack. It's not always an option, but it might be a good starting point. One way to start loading your data incrementally is by using a reference column in your table. If your table has a column called "ModifiedDate" that is updated every time the record is updated, you can use that: every night, when the process runs, you just add the records that were modified after the last successful process run (see the sketch below). But what if you don't have that possibility? Change Data Capture (CDC) is an easy way out.
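As a sketch, such a load based on a modified-date column could look like this (the table, column, and variable names are illustrative; normally you'd read the last run time from a process log):

DECLARE @LastSuccessfulRun DATETIME = '2014-06-01 00:00:00'

SELECT *
FROM dbo.SourceTable
WHERE ModifiedDate > @LastSuccessfulRun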

 
CDC is a way to record inserts, updates, and deletes on a specific table, without the need to write the triggers yourself. CDC reads the transaction log, and captures all changes made to the specific table. These changes are stored in an associated change table that is created by CDC.

Below I'm going to show you how to set up your first table with CDC. If you would like to know more about CDC, this TechNet article is a good place to start.

 
Create an example
To show you the basics of CDC, let's start with creating a table called TestCDC in a database called Sandbox:

USE Sandbox
GO

CREATE TABLE dbo.TestCDC
  (ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
   Descr VARCHAR(50) NULL)
GO

 
Once you've created the table, turn on CDC at the database level, by executing the system stored procedure created to do that:

EXEC sys.sp_cdc_enable_db

 
There is also a system stored procedure to enable CDC at the table level. You need to enable CDC manually, and separately, for every table you want to track:

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @role_name = NULL

 
If the SQL Server Agent is running on your machine or server, you’ll see this confirmation (I’ll explain later why SQL Server Agent is needed):

 
If the Agent isn’t running, you’ll see this warning:

 
If you ran the enable table statement, you will see that SQL Server created the system objects needed to track changes in the table:

 
Because CDC uses 2 SQL Server Agent jobs to capture and clean up the change tables, you need to run the Agent to start the data capture. If the jobs aren't running, SQL Server won't capture any changes made:

 
Start data changes
In order to see what happens when you change data, let’s insert some records:

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description')

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description too...')

 
And let’s update one of those 2 inserted records:

UPDATE dbo.TestCDC
SET Descr = 'UPD - ' + Descr
WHERE ID = 2

 
Now, let’s check the content of both the original table, and the change table:

/* Original table */
SELECT * FROM dbo.TestCDC

/* Change table */
SELECT * FROM cdc.dbo_TestCDC_CT

 
If you run both queries, you’ll see the resultset below:

 
The records in the CDC change table allow you to update the data in your reporting environment. You could query them yourself, by retrieving all the changes since your last update. You can also use the functions that return those changes for you, for example cdc.fn_cdc_get_net_changes_<capture_instance>. You can read more about these system functions on TechNet.
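For our example capture instance (dbo_TestCDC), retrieving the net changes would look something like this sketch. The two helper functions return the lowest and highest LSN available for the capture instance:

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_TestCDC')
DECLARE @to_lsn BINARY(10) = sys.fn_cdc_get_max_lsn()

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_TestCDC(@from_lsn, @to_lsn, 'all')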

 
Cleaning up after an update
Now that you've updated your reporting environment, it's wise to clean up your CDC data. You could delete the records yourself with a DELETE statement, but another option is to use the system procedure "sys.sp_cdc_cleanup_change_table". You can clean up your data using the following SQL statement:

DECLARE @Last_LSN VARBINARY(10) =
  (SELECT MAX(cdc.dbo_TestCDC_CT.[__$start_lsn]) FROM cdc.dbo_TestCDC_CT)

EXEC sys.sp_cdc_cleanup_change_table
  @capture_instance = 'dbo_TestCDC',
  @low_water_mark = @Last_LSN,
  @threshold = 5000

 
The query will retrieve the last LSN (Log Sequence Number), and remove everything that happened before that.

 
Cleanup of CDC
If you want to completely remove CDC (because we're done testing), you can disable it at the table level by running the query below:

EXEC sys.sp_cdc_disable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @capture_instance = 'dbo_TestCDC'

 
This statement will clean up all the objects that were created to enable CDC on that specific table, but it only stops CDC for that specific table. The fastest way to disable CDC on all tables in the database is to disable CDC at the database level, by running the query below:

EXEC sys.sp_cdc_disable_db

 
Conclusion
Loading data always takes time, and there are many factors that are important: your database size, your frequency of changes, your ETL process, etc. The time it takes to move data can be reduced by rewriting your process to use incremental loads. CDC is one of the many ways to achieve this. It works out of the box, and doesn't require you to build any change-tracking process yourself. But maybe your environment needs a custom process to operate the way you want it to. Not every feature in SQL Server is a so-called silver bullet, but sometimes it comes darn close to one…

Removing duplicates in a dataset

As most of you know, "data is only as good as its weakest record". Or was it "The possession of facts is knowledge, the use of data is wisdom"? Anyway, data quality is one of the most important aspects of a dataset. But in practice, it's one of the first things that fails.

Data quality isn't something you create overnight; it's something that is achieved over a longer period of time. And because data is volatile, the quality of that data is volatile as well. Who in your development or administration team is responsible for data quality? If you can't think of anyone, you don't have to be ashamed. In many companies data quality is underestimated.

 
Duplicates
One of the biggest mistakes I saw at most companies I worked for was data duplication. Developers created the database, forgot to put primary and foreign keys on the tables, and didn't think of unique constraints. What do you get? Well, the least of the problems you'll see is data duplication: similar records being inserted twice or even more often, without any constraint to prevent it.

I'm not going to tell you how to prevent duplicates, because that's a whole other story. Start by looking at primary keys, unique constraints, and (in most cases) the overall design of your database.

 
Removing duplicates
The main question is: how do I get rid of duplicates without trashing too much data? First of all, let's create a small table:

CREATE TABLE #T1
  (Descr VARCHAR(10))

 
Now let’s insert some data, including some duplicates:

INSERT INTO #T1
  (Descr)
VALUES
  ('Value 1'),
  ('Value 2'),
  ('Value 2'),
  ('Value 3'),
  ('Value 4'),
  ('Value 5'),
  ('Value 6'),
  ('Value 6'),
  ('Value 6'),
  ('Value 7'),
  ('Value 8')

 
If you tried to remove the duplicates with a normal DELETE statement, you could only delete all versions of a duplicated record at once. So how can we remove just the extra copies, and keep one?

 
ROW_NUMBER to the rescue!
In order to delete one of the duplicates, we need to mark the "extra version" of the record. But how can we do this? The records are identical, right? In this case, we can use the ROW_NUMBER function to add a unique number to the records.

ROW_NUMBER returns a sequential number for every record in the resultset, based on the grouping (PARTITION BY) and order you choose.

In the table we just created, you see that "Value 2" is inserted twice, and that the "Value 6" record is even inserted three times. We want to remove these duplicates, and keep only 1 record for each value.

Now we just add the ROW_NUMBER to our query:

SELECT
  Descr,
  ROW_NUMBER() OVER(PARTITION BY Descr
                    ORDER BY Descr)       AS RowNumber
FROM #T1

 
The result looks like this:

 
As you can see, every record now has a row number that is unique within its partition (which is basically a group by) on the "Descr" column. The ORDER BY is mandatory, but doesn't affect the result in this example.

 
Removing the duplicates
Because ROW_NUMBER can only be used in the SELECT or ORDER BY clause, we can't use it directly in a DELETE statement. But we can use a CTE to delete the duplicates:

WITH DeleteCTE AS
  (SELECT
      Descr,
      ROW_NUMBER() OVER(PARTITION BY Descr ORDER BY Descr) AS RowNumber
   FROM #T1)


DELETE D
FROM #T1 T
INNER JOIN DeleteCTE D
  ON D.Descr = T.Descr
WHERE D.RowNumber <> 1

 
ROW_NUMBER gives otherwise identical records a unique number, which allows us to target the extra copies in the DELETE statement. We only want to keep one version, so we delete everything that has a "RowNumber" that isn't 1.

The result of this looks like this:

 
Conclusion
Removing duplicates is something we can do very easily, but it's only a single step in achieving data quality. There are many more steps to follow before you have a trustworthy dataset that is reliable at any time of the day.

After removing the duplicates from your dataset, don't forget to add primary and foreign keys and unique constraints to prevent this situation in the future. Because running a script like this every week isn't going to solve your problems…

Data paging using offset

With every new release of SQL Server, we get new features that make our lives as developers and administrators so much easier. A few days ago, I came across an old piece of code (SQL Server 2005 if I remember correctly) that I used to page data for a CMS I built when I was still a web developer.

The company I worked for needed a new website, and wanted an HTML editor to edit content on the website. This content was stored in SQL Server, and retrieved by the website. With a few business rules we decided which content was visible, and which content was hidden from the website.

One of the features of the website was a news feed. But because there were so many news articles, we showed only the top 10 articles on the first page, and let the user click through to the next page. So we needed to page this data: every page should show 10 articles, and the user should be able to switch pages for the rest of the news articles.

 
Creating the sample data
In order to show you the problem and solution, we need to create a sample table:

CREATE TABLE dbo.Paging
  (ID INT IDENTITY(1,1),
   Title VARCHAR(50),
   Content VARCHAR(50))

 
The test data we need to insert looks the same for every record:

INSERT INTO dbo.Paging
  (Title, Content)
VALUES
  ('This is an article', 'This is the content')
GO 50

 
This script will insert 50 records into the Paging table.

 
The old way
In older versions of SQL Server you needed to build your own solution for this problem. Let's assume you clicked the news feed button on the website, and we want to switch to page 2 of the results. The solution I built back then looked something like this:

DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10


SELECT TOP(@RowsToShow)
  ID ,
  Title ,
  Content
FROM dbo.Paging
WHERE ID NOT IN
  (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging)
ORDER BY ID ASC

 
In the query above, you can see that we skip the first 10 rows, and retrieve the 10 rows after that. That means the website needs to know which records were retrieved already, and which records it wants to see next. The easiest way to do this is by selecting the IDs you've already shown, and retrieving the next set of rows.

This means you get execution plans like this:

 
The new way
From SQL Server 2012 onwards, we can use a new feature called OFFSET. This feature allows us to "window" our dataset, and retrieve a small subset of the data without using a TOP and a subquery, like in the example above. The new query looks like this:

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

 
In the query above, you see an offset of 10 rows, and a fetch of 10 rows. This means the query skips the first 10 records, and retrieves the 10 records after that. But how can you get this to work with dynamic resultsets and pages? This is one way to do it:

DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY

 
The offset is calculated by taking the @PageNumber parameter minus one, and multiplying that by the @RowsPerPage parameter: for page 2 that's (2 - 1) * 10 = 10 rows to skip. If we didn't subtract one, page 1 would already skip the first 10 records.

After that, we use @RowsPerPage in the FETCH NEXT clause to retrieve the number of rows we want (in this case the next 10). This results in a completely different execution plan:

 
As you can see, this has a lot less impact on SQL Server. This becomes really visible if you compare both in SSMS:

 
I/O Costs
Comparing query costs is nice when you quickly compare 2 different approaches to the same solution, but in the end it all comes down to I/O. So which query is the fastest, and how are we going to test that?

First of all, we need to measure the I/O calls. We do that by using a DBCC command called DROPCLEANBUFFERS, which allows us "to test queries with a cold buffer cache without shutting down and restarting the server". SO DON'T USE THIS IN PRODUCTION!!!

So the complete test script looks like this:

SET NOCOUNT ON
SET STATISTICS IO ON


DBCC DROPCLEANBUFFERS

--==================================================
DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10


SELECT TOP(@RowsToShow)
  ID ,
  Title ,
  Content
FROM dbo.Paging
WHERE ID NOT IN
  (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging)
ORDER BY ID ASC
--==================================================

DBCC DROPCLEANBUFFERS

--==================================================
DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
--==================================================

So we clean the SQL Server buffers, run the first query, clean the buffers again, and run the second query. Now the effect of the new statement becomes really obvious when you look at the I/O costs:

 
So the old version of the query (with the sub-select) scans the table twice, and reads 51 pages from the cache. The new approach (with the OFFSET) scans the table only once, and reads only 1 page from the cache.

 
Conclusion
The fewer I/O calls SQL Server needs to retrieve the result from disk or cache, the faster your query will run. In this case, we've tuned the query from 51 pages read down to 1 page read. And we've only tested this on a table with 50 records; on a production-sized table the difference will be far more significant. So that's a lot of performance improvement from just one new piece of functionality. And there is a lot more out there.

Recursive CTEs: Opening the door to a new dimension

A few weeks ago, someone asked me what a Common Table Expression (CTE) is, and how it works. But writing only about the CTE itself would result in a short blog post. So why not make it a little more interesting, and also write about recursive CTEs, right?

Since SQL Server 2005 we have had the opportunity to use something called a Common Table Expression, or CTE for short. This T-SQL feature allows us to use the resultset of a query as a source for another statement, just like a physical table. This sounds weird, but let me show you.

 
Creating the resources needed
To explain the possibilities of a CTE, let's create a simple example with employee data. We're going to create a table that contains the employee's ID, name, and job title, plus the employee ID of his or her manager:

CREATE TABLE dbo.Employees
    (EmployeeID INT,
	 EmployeeName VARCHAR(25),
     JobTitle VARCHAR(50),
     ManagerID INT)
 
 
INSERT INTO dbo.Employees
    (EmployeeID, JobTitle, EmployeeName, ManagerID)
VALUES
    (1, 'CEO', 'Smith', NULL),
    (2, 'Department Manager', 'Winters', 1),
    (3, 'Team Manager', 'Thompson', 2),
    (4, 'Employee', 'Jones', 3);

 
If you select all records from this table, it will look like this:

 
So as you can see, every employee has a manager, except the CEO. He’s at the top of the food chain and no one has a higher position than him. And looking at the “ordinary” employee, you’ll see that he has a manager above him, but no employees beneath him. He’s at the bottom of the food chain so to speak.

But what about the CTE? Let's write an easy example that shows you how to concatenate the employees' last names and job titles, without doing that in your final select statement:

WITH EmployeeList AS
(
    SELECT
		EmployeeID,
        EmployeeName
        + ' - ' +
        JobTitle AS EmployeeNameAndJobTitle
    FROM dbo.Employees
)


SELECT
	EmployeeID,
	EmployeeNameAndJobTitle
FROM EmployeeList;

 
If we analyse the example above: the select query that concatenates the employee's last name and job title becomes a new resultset called "EmployeeList", which can be used just like a normal table in the query below the CTE. The result looks like this:

 
But remember: the CTE only exists in the scope of your query. This means that you can only run the single statement that directly follows the CTE against it. If you try to run a second select statement, you'll see that the CTE no longer exists.

Another thing you can do with a CTE is declare the column names directly on the CTE, like this:

WITH EmployeeList (EmployeeID, EmployeeNameAndJobTitle) AS
(
    SELECT
		EmployeeID,
        EmployeeName
        + ' - ' +
        JobTitle
    FROM dbo.Employees
)


SELECT
	EmployeeID,
	EmployeeNameAndJobTitle
FROM EmployeeList;

 
This might be easier to read for you and your colleagues, instead of naming the column in the select with an "AS EmployeeNameAndJobTitle". The result is exactly the same as in the example above.

This is just an easy example, but imagine that you can use this to join the aggregated temporary dataset from your CTE onto another table, to build a final resultset.

 
Recursive
But now comes the best part: recursive CTEs. To stick with this example: what if you want to combine the job title of the employee with that of the manager above him? This sort of resultset can be created with a recursive CTE. In the example below, the job title of the manager is added to the employee's job title:

WITH HierarchyCTE (EmployeeID, ManagerID, JobTitle, Hierarchy)
AS
	(SELECT
		EmployeeID,
		ManagerID,
		JobTitle,
		CONVERT(VARCHAR(35), '')
	 FROM dbo.Employees
	 WHERE ManagerID IS NULL

	 UNION ALL

	 SELECT
		E.EmployeeID,
		E.ManagerID,
		E.JobTitle,
		CONVERT(VARCHAR(35), H.JobTitle + ' \ ' + E.JobTitle)
	 FROM dbo.Employees E
	 INNER JOIN HierarchyCTE H
		ON H.EmployeeID = E.ManagerID
)


SELECT
	EmployeeID,
	ManagerID,
	JobTitle,
	Hierarchy
FROM HierarchyCTE;

 
This is what the result looks like:

 
What happens in the recursive CTE is that the first query is used as an anchor, and a loop is executed to add every child record to its anchor (parent) record. When there are no child records left for a parent, the next parent is fetched, and so on.
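One thing to keep in mind: by default, SQL Server stops a recursive CTE after 100 recursion levels, to protect you against infinite loops. If your hierarchy is deeper than that, you can raise the limit with a query hint on the final statement (500 below is just an example value):

SELECT
	EmployeeID,
	ManagerID,
	JobTitle,
	Hierarchy
FROM HierarchyCTE
OPTION (MAXRECURSION 500)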

 
So what can you do with this?
The examples above were just easy ones, but you can imagine the opportunities this functionality offers. For example, if you work for an electronics store, you could build a long string of all packages with sub-deliveries (when every phone is sent with a screen protector and case, for example). Or create a complete overview of the contents of recipes (for example, which ingredients go into a pie). The possibilities are endless…
