Write readable and high-performance queries with Window Functions

In SQL Server 2008, we gained a new and powerful feature in our bag of T-SQL tricks: Window Functions. The actual Window Function is the OVER clause, that allows you to determine partitions or “groups” in your query, before applying another function. In practice, this means you can create groups in your dataset, that can be molded further by applying functions to that groups.

In the past, I’ve written a number of posts about Window Functions in SQL Server:

Row_Number: Unique ID in select statement
Calculating Running Totals
Removing duplicates in a dataset

But there’s more!

 
Finding missing numbers in sequence
How many times did you look at a table, and you noticed that one of the values in a sequence is missing? Or a date range in a table, where someone deleted a few records. So all of a sudden, your year doesn’t have 365 days (or 366 in a leap year, or 365.242199 days which is the years average), but 315 days. There go your management reports that are calculated on a per day average. So how do you find the missing dates, without having to write a cursor or create a full blown date table like your BI colleagues do? You guessed right: a query with Window Function!

Let’s start with declaring a table, and insert a sequence with missing values:

DECLARE @Sequence TABLE
  (Number INT)


INSERT INTO @Sequence
  (Number)
VALUES
  (1),
  (10),
  (7),
  (4),
  (2),
  (8),
  (5)

 
So how can we get the “nearest numbers” from that table with just a single select statement?

SELECT
  LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value,
  Number,
  LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value
FROM @Sequence AS S
ORDER BY Number ASC

 
The LAG and LEAD are standard t-sql functions from sql server 2012 on. These functions give you the opportunity to access the previous or next row, without the need for a so called “self-join”. So what you see is the number, the record preceding that value (LAG) and the following value. So in this case, number 2 is preceded by Number 1, and followed by Number 4.

 
The lemonade stand
Now let’s look at another example. How about you? When you grew up, you wanted to save money for a new mobile phone right? In my case it was either a disc man, a Walkman, or a stereo set. But let’s stick with the modern equivalent of the Walkman for now: the MP3 player. So to earn money for the MP3 player, our fictitious friend who is called Joe, decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his
earnings, and puts it into a table:

DECLARE @Profit TABLE
  (DayNumber INT,
   Sales DECIMAL(10,2))


INSERT INTO @Profit
  (DayNumber, Sales)
VALUES
  (1,  6.90),
  (2,  4.17),
  (3,  2.69),
  (4,  7.26),
  (5,  2.93),
  (6,  8.98),
  (7,  7.25),
  (8,  5.88),
  (9,  1.51),
  (10, 7.97),
  (11, 3.44),
  (12, 3.76),
  (13, 9.96),
  (14, 0.92),
  (15, 8.28),
  (16, 6.05),
  (17, 9.40),
  (18, 4.03),
  (19, 9.14),
  (20, 7.25),
  (21, 0.06),
  (22, 9.12),
  (23, 7.39),
  (24, 6.57),
  (25, 4.54),
  (26, 0.09),
  (27, 4.42),
  (28, 9.53),
  (29, 5.09),
  (30, 0.89)

 
So as you can see, he earns quite a lot of money this way! But because he’s eager to buy his new MP3 player, he wants to see his day totals, and the amount he needs to buy his new toy. But because Joe is a really smart guy, he doesn’t want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:

DECLARE @Goal DECIMAL(10,2) = 150.00


SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
He declares a “goal” variable, that is set to the amount he needs for his new gadget. So for every row, we calculate the sum of that day, plus all the sales he made from the first day (UNBOUNDED PRECEDING) and today (CURRENT ROW). After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he’s made so far:

SELECT
  DayNumber,
  Sales,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
So where does it stop?
So now that we’ve seen the SUM and AVERAGE option, what do we have left? How far can we take this? Thinking about it, how about a daily checkup if we hit a lowest or highest Sales amount? We can do this with the MIN and MAX option on the same query:

SELECT
  DayNumber,
  Sales,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale
FROM @Profit


SELECT
  DayNumber,
  Sales,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit

 
Now Joe can save his daily sales in the evening, and check if he had a very good, or a very bad day.

 
Not only for the average Joe…
So how can the business profit from all this? In some cases, Window Functions make it easier to output a certain resultset. In some cases it even gives you a whole new way to output data with a well performing (single) query, that was impossible until now. So if you’re running SQL Server 2008 or higher, start using (or at least start exploring) Window Functions right away!

 

To generate the random floats, I’ve used the generator of FYIcenter.com

If you want to read more about this topic, don’t forget to check out these blog posts:

- Julie Koesmarno: ABC Classification With SQL Server Window Function
- Mickey Stuewe: A Date At The End of The Month
- Chris Yates: Windows functions who knew

It’s the little things that make a difference

I still can get enthusiastic when I discover tiny new features of SQL Server Management Studio (SSMS). It’s the tool that I use every day, but still I discover new and cool things to use, that I never noticed before.

 
Filter Object Explorer
If you look at the Object Explorer, you’ll see a little button that can come in quite handy: the filter button:

 
This allows you to filter the results in Object Explorer. If you click on it, you will see the filter window pop up:

 
Just as a test, I filtered my object on name contains “Test”. This will filter only the database and object type you selected. In my case, I selected the Table-node, and it will filter only these objects:

 
There are a few drawbacks on this. One of them is that you can’t remove the filter without opening the filter pop-up again. Another one is you only filter results once. If you want to adjust your filter, you need to remove it completely, and reapply your new filter.

 
Scripting Magic
One of the things I use on a regular basis is the “Generate and Publish Scripts” wizard. But did you know this wizard had some hidden gems? One of the options you have, it to script objects to individual files. That’s an option that is hidden in plain sight:

 
But another gem is hidden behind the Advanced button:

 
This allows you to generate insert scripts for you tables, without the use of a 3rd party tool.

 
Splitter Bar
One of the hidden gems I wanted to show you, isn’t one I found out myself. This one I discovered via a blog post from Kendra Little (Blog | @Kendra_Little). She blogged about the Splitter Bar in SSMS, which is quite handy sometimes! Go check out her other blog posts as well, for example about Scripting changes from the GUI.

 
In the picture above you see the same stored procedure (in this case from the AdventureWorks 2012 database), split in 2 by the splitter bar. This makes it for example easier to look at the declare statements in the top of the script, and the query your working on in the bottom of the script.

 
Tab Groups
The last one is one I use on a regular basis. At the past few employers, I’ve worked with 2 monitors. This makes it easy to compare files or result sets. But what if you don’t have that luxury? There’s an option in SSMS to create a new tab group. Just right-click a query tab, and choose the option you like:

 
Let’s say you want to compare the resultsets of 2 queries, you can use the Horizontal Tab Group option:

 
To return to your normal view, just right-click on the tab again, and click “Move to Previous Tab Group”.

 
Save time
One of the things Mickey Stuewe (Blog | @SQLMickey) pointed out, is that you can rearrange the columns in the result window of SSMS. Just drag and drop columns the way you like. It could save you a lot of time rerunning the query to change the order of your columns. The order of the columns is reset the next time you run the query.

 
Never stop learning!
There are plenty more hidden gems in SSMS, waiting to be found by you. So never stop learning, and always try to take it a step further than needed. You’ll be surprised to see what awaits you…

How SQL Search saves you time

Since 1999, Red Gate Software has produced ingeniously simple and effective tools for over 500,000 technology professionals worldwide. From their HQ in Cambridge UK, they create a number of great tools for MS SQL Server, .NET, and Oracle. The philosophy of Red Gate is to design highly usable, reliable tools that solve the problems of DBAs and developers.

Every year Red Gate selects a number of active and influential community members (such as popular blog writers and community site owners) as well as SQL and .NET MVPs who are experts in their respective fields, to be part of the Friends of Red Gate (FORG) program. I’m proud to announce that I’m part of the 2014 FORG selection. This post is a part of a series of post, in which I try to explain and show you why the tools of Red Gate are so loved by the community.



 
Red Gate? No thank you!
One of the biggest prejudices of the tools from Red Gate is: you have to sell one of your kidneys, in order to afford one of their tools. I agree with you, some of the tools Red Gate sells are pretty expensive. Especially when you need to buy them yourself. But what if I tell you they pay for themselves in the long run? You don’t believe me? Okay, let’s start of with a free tool to convince you.

 
How SQL Search can save your bacon
As a DBA and SQL Server developer, one of your biggest challenges is to memorize your whole environment, and learn every line of T-SQL there is to find from the top of your head. I’m sorry? Oh, you don’t do that? Well, that doesn’t really come as a surprise. And if you thought: “Hey! I’m doing that too!”, stop wasting your time! No DBA, wherever you will look in the world, will EVER remember all of the T-SQL script, stored procedures, views, etc, that can be found in his or her environment. But where do they get their information from? What would you say if I told you you could search though all objects in a specific database, or on a specific instance?

So how do you do that? You’re planning on making your own script to search through the system view sys.columns for column names, and sys.procedures for Stored Procedure text? There is an easy way out, you know!

 
Installing and testing
In order to create a test case that you can repeat on your own SQL Server, I’m using the AdventureWorks2012 database. That’s a free example database that you can download from CodePlex. I’ve chosen to download the backup file, that I restored on my local SQL Server.

The installation of SQL Search can be found on the Red Gate SQL site. When you installed SQL Search, a button will be added to the SQL Server Management Studio (SSMS):

 
So, what do you actually get? If you click the button in SSMS, you’ll see a new window, that’ll act as a result pane. The top of the new screen looks like this:

 
This screen contains all possible options you will need to find objects either on your server, or in a specific database. You see a textbox on the left, where you fill in your search term. The next few options can be used as a filter. You can search on an exact match (checkbox), search on a specific object only (dropdown list), and on a specific database (dropdown list). The last dropdown list is the instance you want to search on. This list is populated with the open instance connections from your object explorer.

 
Test case
So how does this work in practice? One of your colleagues comes up to you, and asks you what objects in your database are related to department data. You could search for documentation in order to answer that question, or you could let SQL Search give you the answer. So, let’s search the AdventureWorks2012 for objects that are related to, or contain department data. The result will look like this:

 
As you will see there are 18 objects that are related to your search term (the count of object is visible at the bottom of the search results on the right). Some objects are shown multiple times, because there are multiple matches on that object. For example the “vEmployeeDepartment” view. The name of the view contains our search term, one of the columns is called department, and the text of the view (create script) contains your search string.

But how does this work with real life situations? How many times do you get the question from your colleagues how many objects are related to a specific table or column? As a DBA you probably get this question more than you would like. Your developers want to rebuild an application, or add a new feature to it, but they’re not sure if the database change they’ll make will break another applications or processes.

It’s also possible for you to use the tool to your own advantage. For example, when you want to know which object can update a specific employee record. You just search for both terms, and you’ll see that there are 3 stored procedures that update employee information:

 
Please hold on, while I search for the object…
…is a sentence you never have to use again when working with SQL Search. Whenever you found the object you were looking for, you can just double click on it, or use the button between the result pane, and the script pane:

This will look up the object in the object explorer. So you never have to look for an object after you found it. Just let SQL Search do all the hard work.

 
Red Gate? They’re AWESOME!!!
Hopefully you changed your mind about Red Gate tools, after reading this article. This is one of the tools I personally use on a daily basis. Even though there is documentation within your company, you still need to find it. It’s printed and laying around somewhere, or on an intranet or SharePoint. You know where you can find it, except when you REALLY need it!

SQL Search is also a tool you (in my opinion) really need when you’re new to a company. You’ll see a lot of different databases, with different purposes, and maybe even different DBA’s responsible for that specific database. Using SQL Search gives you a great advantage when you need to have a chat with the DBA. You’ll step into their office with a little more knowledge of the database, without reading endless documents, cryptic ERD’s and folders full of unnecessary documentation.

 
Feedback
When you start using the tool, don’t forget to thank the people from Red Gate. They LOVE to hear your feedback, either in a tweet, the Red Gate Forums, or by contacting support. You could also send me a mail or tweet, or leave a comment at the bottom of this post. I would love to answer your questions (as far as I can), or pass them on to Red Gate.

If you want to read more about SQL Search, don’t forget to check out these blog posts:

- Julie Koesmarno: SQL Tools Review: SQL Search
- Mickey Stuewe: On a SQL Quest using SQL Search by Red Gate
- Chris Yates: Headache + Pain Red Gates SQL Search

Friends of Red Gate Project

A few weeks ago I received the official confirmation that I’ll be a part of the Friends of Red Gate 2014 program. This program allows the participants to directly communicate with the product teams, provide feedback to teams, and try out beta-builds of tools.

 


 
The Red Gate tools are easy to use, and give you the upper hand on administering servers and databases. So together with other Friends of Red Gate, we decided to write a series of blog posts about the tools. The first one will be posted tomorrow. In these blog posts, we will try to show you why you should use their tools, and what problems they can solve for you.

 
The group that will write posts just like me consists of 3 other people:

 
Mickey Stuewe (Blog | @SQLMickey) from Orange County, USA
Chris Yates (Blog | @YatesSQL) from Kentucky, USA
Julie Koesmarno (Blog | @MsSQLGirl) from Canberra, Australia

 
So when you’re interested in reading about the tools from Red Gate, or you want to see how people use the same tools in different jobs, you should definitely check out this series!

T-SQL Tuesday #51 – Place Your Bets

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Place Your Bets”. If you want to read the opening post, please click the image below to go to the party-starter: Jason Brimhall (Blog | @sqlrnnr).



 
When I read about this months T-SQL Tuesday topic, the first thing that came to mind was things that you know will go wrong sooner or later. When you encounter a situation like this, you immediately know this can’t last forever. You want to fix it when you see it, but there’s no money, or there’s no time at that moment. But they promise you, in a few weeks you can take all the time you need. Well, that’ll never happen. Until things go wrong, and you can clean up the mess. Sounds familiar? Yes, we’ve all seen this, or will see this sooner or later.

 
With power comes great responsibility
Just imagine this with me. One of your colleagues asks you to look at a problem he’s having with a script someone in your company wrote. You probably solved it while he was standing right next to you. He watches you solve the problem, and when it’s solved, he walks away with a thousand-yard stare in his eyes. You don’t really think about it when it happens, but it’ll come to you…

A few weeks later, it’s 10 AM and you’re still having your first coffee of the day, the same developer asks you to look at “his script”. Wait, what?! Yes, he watched you work your magic, and that funny language of “Es-Que-El” seemed easy to learn. So he bought himself a “SQL Server for dummies”, learned all he needs to know in only a weekend, and wonders why it took you so long to learn it. From now on, he can write his own scripts, so he doesn’t need you anymore. Except for this last time.

Opening the script scares you: it’s a cursor. But in your frustration and amazement you “fix” the broken script, by refactoring his select statement in the cursor. Because the cursor only collects data, you add a “TOP 10″ clause in the select statement, and run the script as test. Nice, it finishes is 25 seconds. “It will only consume 500 rows” is the last thing you heard him say. You send the guy off, so you can continue your own work.

Later in the day, it’s about 4 PM, you meet the same guy at the coffee machine. He starts a discussion about how he needs a new PC, because the script YOU wrote is slow (see where this is going…?). It’s running for about 4 hours now, while it should only collect about 500 records. I know what you think: that’s impossible. You walk with him to his desk, stop the script, and look at his code. That isn’t the query you looked at this morning. Asking your colleague about it explains it all: he “slightly refactored” the script, because he didn’t need al those weird statements to get him his results. Well, after a fiery discussion of a few minutes, you explain him the DOES need the “FETCH NEXT” in the query, because the query now ran the same statement for only the first record in the select statement you declared for your cursor.

So this funny “Es-Que-El” language, isn’t that easy to learn. A beautiful quote about that, and I’m not sure who said that, says: “T-SQL is easy to learn, but hard to master”. So putting your money on one horse, in this case buying yourself a book, isn’t a good idea.

 
Putting your money on one color
Another great example is a company that had a wonderful Business Intelligence environment. They used the whole nine yards: SQL Server, SSIS, SSAS, SSRS, etc. The downside of that you ask? It was all hosted on 1 physical machine, on a single SQL Server instance. Oh, and it was running low on disk space, and there was no room in the chassis to put in extra disks. That’s right: it was like juggling burning chainsaws with only one hand. Or an interesting challenge, if you will.

Eventually we hosted a few databases on NAS volumes. At that point, I was told the databases we moved were less important. Pro tip: never EVER trust them when they say that!!! They forgot to tell me the biggest database of the moved databases wasn’t in the backup plan (500 GB database takes a long time to backup), and the last backup was made over a year ago. Surprise, one night the network card failed for maybe only a microsecond, and SQL Server thought the LUN was offline or the disk crashed. So SQL Server said that the database was corrupt, and that the datafiles were unavailable. After a few hours, a reboot of the server fixed it, and SQL Server could see the disk volumes again. So the database was saved after all.

But you see where I’m going with this? You never know when things go wrong, and putting all your money on one color when playing roulette isn’t the best idea. If the hardware of your single server fails, you fail.

 
Next, Next, Finish?
But the biggest example I can give you of a bad placed bet, are companies that work with SQL Server, but don’t hire a DBA. Have you ever worked for a company that work with Oracle? Every single company that works with Oracle, has a dedicated Oracle DBA. But have you ever wondered why that isn’t the case when a company works with SQL Server?

Thinking about it, I guess this is because a successful SQL Server installation is only a few “Next, Next, Finish”-mouse clicks away. So if the installation is so easy, every developer or person with IT experience can administer it probably. They couldn’t be more wrong. You know that, I know that, every SQL Server professional knows that, but try to convince other people of that fact.

So the worst bet you can place, and this is how I write myself back to the subject of this month, is not hiring a professional to manage your data and data stores. You wouldn’t let your local baker fix your car, because the wrote some books about cars, right? So why do you let a developer with basic knowledge near your SQL Server? Just because real DBA’s cost money? Yes, we do cost some serious money. But in the end, at least when you hire a GOOD DBA, they will make you money. You don’t think so? What does a DBA cost per hour? And how much money do you lose when your servers are down for just an hour?

SSIS – Remove empty rows from Excel import

From the first time that I started SSIS, I started to love it. In most cases it’s easy to create a package, easy to understand, and even readable for people who don’t “speak fluent SQL”. But what if you want to perform an easy task, and the result isn’t what you expect?

 
Formatting an Excel sheet
One of the most basic tasks you can create in SSIS, is importing an Excel sheet. Most of the time this works like a charm. But in my case, I wanted to filter out some rows from the workbook.

The business delivers an Excel sheet, that needs to be imported into the database. But because they don’t have the technical knowledge we have, they don’t know how important the format of the file is. They sent us this file (I’ve created a smaller sample, so it’s easier to read and understand):

 
The first thing you’ll notice as a data-professional is the 2 empty rows in the sheet. Beside that, we have an employee without a name. You know this is going to cause problems when you see it. These errors are easy to spot in the example, but imagine if these 2 rows are hidden in a dataset with 50.000 or more rows. So even though they might ended up there accidentally, your process is going to fail.

 
When you add an “Excel Source” to your package, and you look at the preview of that import, you immediately see the problem:

 
Table structure
In order to determine what columns can be left blank, and what columns can’t be NULL, I looked at the table structure:

CREATE TABLE ResultSSIS
  (ID INT IDENTITY(1, 1),
   FullName VARCHAR(50) NOT NULL,
   Department VARCHAR(50) NULL,
   EmployeeNumber INT NOT NULL)

 
So in the dataset, FullName and EmpolyeeNumber are mandatory, and Department is optional. With this in mind, I started to work on a way to exclude those rows.

 
Import without excluding
The first thing I tried is to import the file, and see what the results are. Because I knew the data wasn’t correct, I didn’t want to import the Excel sheet into a SQL Server database just yet. So as a destination, I used the “recordset destination” control in SSIS. Importing the data into this memory table also allowed me to use the “data viewer” to see the imported data, without the need to truncate a table after each run. You can enable the “data viewer” by right-clicking the import-connector (the arrow between controls), and click “Enable Data Viewer”:

 
If you run the SSIS package in debugging mode, you’ll see the data that is imported in a pop-up window:

 
As you can see in the screenshot above, the records with NULL values in it are included in this import. So which records do we want to exclude, based on our table structure?

 
So from the 6 records in the Excel sheet, we want to exclude 3 in our import because of NULL values. But how do we do that? The easiest way to solve it, is to import it into a temp table, delete the NULL records, and insert the other records in the destination table. But what if that isn’t possible, and you want to filter the records in your import? I’ve chose to use the “Conditional Split”.

 
Conditional Split
You don’t have to rebuild your whole package, when you want to exclude records with the “Conditional Split”. You can just add this control, at least in this case, in between you source file and your destination. If you open the control, you can add an expression that is used to filter records. In my case, I wanted to exclude the rows with an empty “FullName” and “EmployeeNumber”:

 
When connecting your “Conditional Split” to your destination, SSIS will ask you what output the “Conditional Split” needs to return. To output the entire set without the empty rows, chose the “Conditional Split Default Output”:

 
When you run your package with the extra “Conditional Split” (and you enable Data Viewer again), you’ll see the filtered output of the “Conditional Split”. The 3 NULL records are excluded like expected:

 
Conclusion
SSIS is easy to use, and yet a a really powerful tool. Even if you build your processes in SSIS, it’s not always necessary to rebuild your whole package. Sometimes you can save the day with just a minor change. That’s the power of SSIS!

Backup and relax?

Keeping a good backup strategy is a must for every DBA and database developer. But just creating a backup isn’t enough. Perhaps you don’t have enough storage to store the full backups of your database for over a week. Or taking a full backup of a database takes so long, it’s only possible on weekends. So what are your options?

 
RPO and RTO
Your whole backup strategy starts by determining the RPO (Recovery Point Objective) and RTO (Recovery Time Objective). The great Brent Ozar (Blog | @BrentO) wrote a great blog post about these terms, and explains what they mean.

Basically it means that you need to determine what maximum data loss is allowed, and from there you start creating a backup strategy. So how do you determine these? This is how RPO and RTO look like if you visualize them:

 
Storage
Another thing you want to consider is the storage available for your backups. Most of the time the backups will be stored on a NAS (Network-attached storage), and not on the local server, so storage isn’t a big issue in that case.

I’ve also seen companies that created the backup on the local server, and after completion copied it to a network location. In my opinion it’s only one more dependency that you could prevent, but other than that it’s a valid option.

 
Backup types
SQL Server supports multiple backup options. They all have their pros and cons, and give you the ability to create a backup strategy that fits your needs. In this blog post, I’m assuming the database that we work with is created as a full recovery model database.

 
Full backup
A full backup takes a backup of the entire database. With this backup file you’ll be able to recover the entire database, without the need of extra log files. Creating a full backup can take more time, depending on the size of the database. Let’s visualize this with an example:

 
Looking at the example you’ll see that every night a full backup is created. But on Friday night 8 PM the database crashes, and we need to recover from backup. The last full backup was taken 20 hours ago, so those 20 hours of data changes are lost.

 
Differential backup
If you have less time to spend on backing up your database every night, one of your options is to take a differential backup. A differential backup only backs up data that is changed since the last full backup. A differential backup can’t be created without taking a full backup first. If you try to create it without a full copy of the database, SQL Server will throw an error:

 
When you create a full backup, a backup chain is started. This means that SQL Server registers which LSN (Log Sequence Number) was added to the last backup. When you take the next backup, the backup will contain all transactions from the last LSN of the previous backup until the time of your new backup.

A backup chain can’t be started with a differential backup. Also, when you want to restore a differential backup, you need the full backup it’s based on. To put this into perspective, look at the visualization below:

 
At midnight on Monday we create a full backup of the database. Every other day we create a differential backup at midnight. On Friday at 8 PM the database crashes, and we need to restore a backup. All we need is the full backup from Monday morning, and differential backup 4. Although it takes less time to create a differential backup, you see that this hasn’t helped you much. You still lost 20 hours of data.

 
Transaction Log backup
The last major type of backup is a transaction log backup. This backup contains all transactions that were executed after the last full or differential backup were created. This gives you the opportunity to perform the so called “point-in-time recovery”.

Just like the differential backup can’t be created without a full backup, a transaction log backup can’t be created without a full or differential backup first. So the transaction log backup can’t be used to start a backup chain. Let’s take the same example, and add a transaction log backup every 15 minutes (the blue lines represents the transaction log backups):

 
If the database crashed on the same time as the previous examples, your data loss is slimmed down from 20 hours to a maximum of 15 minutes. In order to recover your database, you need the Full backup created on Monday, the differential backup created on Friday, and all transaction log backups created after the differential backup at midnight. So if the database crash occurs a few seconds before the next transaction log backup, the maximum data loss is 15 minutes. Again, without a full or differential backup you can’t create a transaction log backup:

 
Pitfalls
Whenever you create a database, backup that database, and throw away that backup file, you can create a differential or transaction log backup. SQL Server doesn’t require the last full or differential backup (in case of a transaction log backup) to be present. So remember to always check if there is a valid backup available, either on the server or on your backup location.

 
Backup compression
From SQL Server 2008 onward, you can use a new feature called Backup Compression. Whether or not you’re compressing your backup can also make a big difference in performance. A compressed backup is smaller, so it requires less I/O when created, and can increase backup speed significantly. On the other hand, compressing a backup increases CPU usage. So it’s a tradeoff you need to consider. But in some cases, this could solve the problem of having a shortage on storage.

 
Files and Filegroups backup
Creating a file or filegroup backup can be practical when performance or database size is an issue for you. Perhaps taking a backup of your 500GB databases takes to long, and you need to consider other options.

You can backup all filegroups separately, but it’s also possible to combine a number of filesgroups in a single backup. This makes it easier to balance your backups over several disks when you’d like to. But perhaps it’s easier to create a backup that consists of multiple files. This can be achieved by adding more destination files at the bottom of the “create a backup”-GUI. SQL Server than balanced the data across the files you added.

Adding more destination files to your backup can also increase performance. Jes Schultz Borland (Blog | @grrl_geek) wrote a great article about that. She tested several options to see what the impact on performance is.

 
Copy-only backup
One of the most important backup options (in my opinion) is the copy-only backup. This allows you to create an ad-hoc backup without breaking the backup chain.

A copy-only backup works independently from any previous backup or backup plan. So a copy-only backup will not take the last LSN into account, or store the last LSN added to the copy-only backup you’re creating. So if you have a backup plan in place, and you or one of your colleagues needs to create an ad-hoc backup, copy-only is the way to go.

 
Now can we relax?
The short answer is: NO! Your work has only just begun. Now that you have a backup strategy, you need to build it, test it, tune it, and cherish it. Creating the perfect backup strategy isn’t a silver bullet. Your databases change, your processes change, your colleagues change…

So when is the last time you tried to restore a backup from your automated process? You can’t remember? Time to get it done than! You know what they say: A DBA is only as good as his last restore. So if you want to keep working as a DBA for your company, start preparing a test restore now.

 
And then…?
Once you’ve created a backup strategy, the hard work just begins. How are you implementing your backups? Are you planning on using the default SQL Server maintenance plans? Are you building something yourself with SQL Server Agent Jobs and SSIS packages? Maybe you want to buy a solution from a specific vendor you like or know? Well, what about a free solution?

If you’re looking for a cheap way out, building it yourself is a good option. But why not look at the completely free solution by the new MVP Ola Hallengren (Website | @olahallengren)?

It’s a solution used by many of our community members, and won a lot of prizes over the years. Not sure if it’s safe to use? Why don’t you look at the list of companies that use his solution, or read the blog post of Jonathan Kehayias (Blog | @SQLPoolBoy) about it.

Another great resource to start from is the TechNet page about backups. This contains a lot of information about the techniques behind the backup process, and the possible pitfalls you’ll encounter.

 
Conclusion
When creating a backup strategy, you need to take a lot of factors into account. What kind of hardware are you working with? Is the storage you need available? Is it possible to create a full backup every night, or only on weekends?

After building your (custom) solution, you need to spend time on tuning and maintaining it. Your databases aren’t static, and will change every second, every minute, every day. So keep changing your process to perform at it’s best, and in the end, you will create your own free time to spend on cool things.

Follow

Get every new post delivered to your Inbox.

Join 34 other followers