Backup and relax?

Keeping a good backup strategy is a must for every DBA and database developer. But just creating a backup isn’t enough. Perhaps you don’t have enough storage to keep full backups of your database for more than a week. Or taking a full backup of a database takes so long that it’s only possible on weekends. So what are your options?

 
RPO and RTO
Your whole backup strategy starts by determining the RPO (Recovery Point Objective) and RTO (Recovery Time Objective). Brent Ozar (Blog | @BrentO) wrote a great blog post about these terms, and explains what they mean.

Basically it means that you need to determine the maximum data loss that is allowed, and from there you start creating a backup strategy. So how do you determine these? This is what RPO and RTO look like if you visualize them:

 
Storage
Another thing you want to consider is the storage available for your backups. Most of the time the backups will be stored on a NAS (Network-attached storage), and not on the local server, so storage isn’t a big issue in that case.

I’ve also seen companies that created the backup on the local server, and copied it to a network location after completion. In my opinion that’s one more dependency you could avoid, but other than that it’s a valid option.

 
Backup types
SQL Server supports multiple backup options. They all have their pros and cons, and give you the ability to create a backup strategy that fits your needs. In this blog post, I’m assuming the database we work with uses the full recovery model.

 
Full backup
A full backup takes a backup of the entire database. With this backup file you’ll be able to recover the entire database, without the need for additional log files. Creating a full backup can take quite some time, depending on the size of the database. Let’s visualize this with an example:

 
Looking at the example you’ll see that every night a full backup is created. But on Friday night 8 PM the database crashes, and we need to recover from backup. The last full backup was taken 20 hours ago, so those 20 hours of data changes are lost.
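Taking the full backup itself is a single T-SQL statement. A minimal sketch, assuming a hypothetical database called SalesDB and a backup share at \\NAS01\Backup (replace both with your own names):

-- Full backup of the entire database
BACKUP DATABASE SalesDB
TO DISK = '\\NAS01\Backup\SalesDB_Full.bak'
WITH CHECKSUM, STATS = 10;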

 
Differential backup
If you have less time to spend on backing up your database every night, one of your options is to take a differential backup. A differential backup only backs up data that has changed since the last full backup. A differential backup can’t be created without taking a full backup first. If you try to create one without a full copy of the database, SQL Server will throw an error:

 
When you create a full backup, a backup chain is started. This means that SQL Server registers which LSN (Log Sequence Number) was added to the last backup. When you take the next backup, it will contain all transactions from the last LSN of the previous backup until the time of your new backup.

A backup chain can’t be started with a differential backup. Also, when you want to restore a differential backup, you need the full backup it’s based on. To put this into perspective, look at the visualization below:

 
At midnight on Monday we create a full backup of the database. Every following night we create a differential backup at midnight. On Friday at 8 PM the database crashes, and we need to restore a backup. All we need is the full backup from Monday morning, and differential backup 4. Although it takes less time to create a differential backup, you can see that this hasn’t helped you much: you still lost 20 hours of data.
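Creating the differential backup itself is the same BACKUP DATABASE statement, with the DIFFERENTIAL option added. A minimal sketch, using the same hypothetical names as before:

-- Differential backup: only the data changed since the last full backup
BACKUP DATABASE SalesDB
TO DISK = '\\NAS01\Backup\SalesDB_Diff.bak'
WITH DIFFERENTIAL, CHECKSUM, STATS = 10;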

 
Transaction Log backup
The last major type of backup is the transaction log backup. This backup contains all transactions that were executed after the last full or differential backup was created. This gives you the opportunity to perform a so-called “point-in-time recovery”.
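Taking a log backup is done with BACKUP LOG instead of BACKUP DATABASE. A minimal sketch, again with hypothetical names (restoring to a specific point in time is then done with RESTORE LOG and the STOPAT option):

-- Transaction log backup (requires the full recovery model)
BACKUP LOG SalesDB
TO DISK = '\\NAS01\Backup\SalesDB_Log.trn'
WITH CHECKSUM, STATS = 10;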

Just like a differential backup can’t be created without a full backup, a transaction log backup can’t be created without a full or differential backup first. So a transaction log backup can’t be used to start a backup chain either. Let’s take the same example, and add a transaction log backup every 15 minutes (the blue lines represent the transaction log backups):

 
If the database crashes at the same time as in the previous examples, your data loss is slimmed down from 20 hours to a maximum of 15 minutes. In order to recover your database, you need the full backup created on Monday, the differential backup created on Friday, and all transaction log backups created after the differential backup at midnight. So if the database crash occurs a few seconds before the next transaction log backup, the maximum data loss is 15 minutes. Again, without a full or differential backup you can’t create a transaction log backup:

 
Pitfalls
Whenever you create a database, back it up, and then throw away that backup file, you can still create a differential or transaction log backup. SQL Server doesn’t require the last full backup (or differential backup, in case of a transaction log backup) to still be present as a file. So remember to always check if there is a valid backup available, either on the server or on your backup location.
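One way to check is to ask SQL Server what it registered as the most recent backups, by querying the backup history in msdb. A sketch (note that the history only proves a backup ran, not that the file is still there or restorable):

SELECT	database_name,
	type,				-- D = full, I = differential, L = log
	MAX(backup_finish_date) AS last_backup
FROM	msdb.dbo.backupset
GROUP BY database_name, type
ORDER BY database_name, type;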

 
Backup compression
From SQL Server 2008 onward, you can use a feature called backup compression. Whether or not you compress your backups can make a big difference in performance. A compressed backup is smaller, so it requires less I/O when created, which can increase backup speed significantly. On the other hand, compressing a backup increases CPU usage. So it’s a tradeoff you need to consider. But in some cases, this could solve the problem of having a shortage of storage.
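Enabling compression is just another option in the WITH clause. A sketch with the same hypothetical names:

-- Compressed full backup: smaller file and less I/O, at the cost of extra CPU
BACKUP DATABASE SalesDB
TO DISK = '\\NAS01\Backup\SalesDB_Full.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;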

 
Files and Filegroups backup
Creating a file or filegroup backup can be practical when performance or database size is an issue for you. Perhaps taking a backup of your 500 GB database takes too long, and you need to consider other options.

You can back up all filegroups separately, but it’s also possible to combine a number of filegroups in a single backup. This makes it easier to balance your backups over several disks if you’d like to. But perhaps it’s easier to create a backup that consists of multiple files. This can be achieved by adding more destination files at the bottom of the “create a backup” GUI. SQL Server then balances the data across the files you added.
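Both variations are still just BACKUP DATABASE statements. A sketch, assuming a hypothetical filegroup called ARCHIVE and two local backup drives:

-- Backup of a single filegroup
BACKUP DATABASE SalesDB
	FILEGROUP = 'ARCHIVE'
TO DISK = '\\NAS01\Backup\SalesDB_Archive.bak'
WITH CHECKSUM;

-- Full backup striped across two destination files
BACKUP DATABASE SalesDB
TO	DISK = 'D:\Backup\SalesDB_1.bak',
	DISK = 'E:\Backup\SalesDB_2.bak'
WITH CHECKSUM;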

Adding more destination files to your backup can also increase performance. Jes Schultz Borland (Blog | @grrl_geek) wrote a great article about that, in which she tested several options to see what the impact on performance is.

 
Copy-only backup
One of the most important backup options (in my opinion) is the copy-only backup. This allows you to create an ad-hoc backup without breaking the backup chain.

A copy-only backup works independently from any previous backup or backup plan. A copy-only backup will not take the last LSN into account, nor register the LSN of the copy-only backup you’re creating. So if you have a backup plan in place, and you or one of your colleagues needs to create an ad-hoc backup, copy-only is the way to go.
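In T-SQL it’s nothing more than the COPY_ONLY option. A sketch:

-- Ad-hoc backup that doesn't touch the backup chain
BACKUP DATABASE SalesDB
TO DISK = '\\NAS01\Backup\SalesDB_CopyOnly.bak'
WITH COPY_ONLY, CHECKSUM, STATS = 10;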

 
Now can we relax?
The short answer is: NO! Your work has only just begun. Now that you have a backup strategy, you need to build it, test it, tune it, and cherish it. Creating the perfect backup strategy isn’t a silver bullet. Your databases change, your processes change, your colleagues change…

So when was the last time you tried to restore a backup from your automated process? You can’t remember? Time to get it done then! You know what they say: a DBA is only as good as his last restore. So if you want to keep working as a DBA for your company, start preparing a test restore now.

 
And then…?
Once you’ve created a backup strategy, the hard work just begins. How are you implementing your backups? Are you planning on using the default SQL Server maintenance plans? Are you building something yourself with SQL Server Agent Jobs and SSIS packages? Maybe you want to buy a solution from a specific vendor you like or know? Well, what about a free solution?

If you’re looking for a cheap way out, building it yourself is a good option. But why not look at the completely free solution by the new MVP Ola Hallengren (Website | @olahallengren)?

It’s a solution used by many of our community members, and it has won a lot of prizes over the years. Not sure if it’s safe to use? Take a look at the list of companies that use his solution, or read the blog post by Jonathan Kehayias (Blog | @SQLPoolBoy) about it.

Another great resource to start from is the TechNet page about backups. This contains a lot of information about the techniques behind the backup process, and the possible pitfalls you’ll encounter.

 
Conclusion
When creating a backup strategy, you need to take a lot of factors into account. What kind of hardware are you working with? Is the storage you need available? Is it possible to create a full backup every night, or only on weekends?

After building your (custom) solution, you need to spend time on tuning and maintaining it. Your databases aren’t static, and will change every second, every minute, every day. So keep improving your process to perform at its best, and in the end, you will create your own free time to spend on cool things.

T-SQL Tuesday #50 – Automation: yea or nay

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog about a specific subject.

This month the subject is “Automation”. If you want to read the opening post, please click the image below to go to the party-starter: Hemanth D. (Blog | @SqlChow).



 
Being a DBA, you want to automate as many processes as you can, in order to save time that you can spend on more important things. But have you ever considered that you might be over-automating your processes?

 
We’re safe! What can go wrong here?
At one of the companies I worked for, they thought they had everything sorted out. Indexes were rebuilt every day or every week (depending on the database), databases and log files were shrunk, databases were checked for corruption, backups were running, etc. They felt safe, knowing that if something happened they could handle any situation SQL Server would throw at them. It would blow up in their faces eventually…

One of the first things I checked was the backups. The backup job was running, but only a few databases were actually selected for backup. And the biggest database (500+ GB), which was pretty important, was skipped because it took too long to back up. And guess what: they didn’t EVER test recovering from a backup, because of a lack of disk space and time. And there you have it: a false sense of safety!

I don’t have to tell you not to shrink your databases and log files. Everybody knows that every time you shrink your database, a kitten dies… Or an index dies… Or the soul of your database… I’m not sure which one, but take your pick. It causes (and I quote Paul Randal (Blog | @PaulRandal) on this!): “*massive* index fragmentation”. Read more about that over at Paul’s blog. Besides that, if your next query needs more space in a data or log file, you’ll see extra wait time because of file growth.

The indexes were rebuilt every night on the important databases, and every weekend on less used databases. But they never checked whether the problem they had before was actually fixed by switching to this solution.

Also, the corruption check was run only on user databases. They had never heard of running a corruption check on system databases. The system databases were included in the backup process, but they never took the time to check if they could restore them, or whether they were backing up an already corrupted database.
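For the record, checking the system databases isn’t any harder than checking user databases; it’s just a DBCC CHECKDB per database. A sketch:

DBCC CHECKDB ('master') WITH NO_INFOMSGS;
DBCC CHECKDB ('model')  WITH NO_INFOMSGS;
DBCC CHECKDB ('msdb')   WITH NO_INFOMSGS;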

 
Focus on the important stuff
So instead of automating all your processes, maybe you should focus on what’s really important. You just automated your backup process. But does it run every time? Are the backups actually written to disk? Can you restore one of the backups you created?
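A quick first check is to verify that a backup file is at least readable, although that never replaces an actual test restore on another server. A sketch with a hypothetical file name:

-- Verifies the backup file is complete and readable (not a full restore test!)
RESTORE VERIFYONLY
FROM DISK = '\\NAS01\Backup\SalesDB_Full.bak'
WITH CHECKSUM;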

What I’m trying to say is, you can automate tasks whenever and wherever you like, but don’t forget to test them. Once you’ve automated something, plan regular tests to verify if the automated process runs the way you expect it to. And is the end result really the result you want and expect?

 
Don’t reinvent the wheel
Another tip: don’t reinvent the wheel. Plenty of people have encountered the same issue, and wrote or blogged about a solution. So before you build your own maintenance solution, or automate health status reports, check with your community members. Help can be found for almost every problem, but checking whether that solution fits your environment is up to you.

T-SQL Tuesday #45 – Follow the Yellow Brick Road

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog about a specific subject.

This month the subject is “Follow the Yellow Brick Road”. If you want to read the opening post, please click the image below to go to the party-starter: Mickey Stuewe (Blog | @SQLMickey).



 
When I read this month’s subject, I didn’t know what to write about, until I read one of the other blogs. So I’m sorry Mickey, but I’m using the subject this month as a guideline, and extending it a little bit. I’m glad I know Mickey, and I’m guessing she doesn’t mind me getting a little creative.

The writer of the post I read, talked about index fragmentation and data growth, and then I remembered a situation I encountered a while back.

When you come into a company, you’d really like to see (and maybe even expect) to walk into a well organized team. A team that has procedures that work for them, and team members that have agreed to solve problems in a specific way. But sometimes, that’s not the case…

 
Here you go!
On my first day as DBA at the company, I walked in, and after a few conversations booted up my computer. Once it was running, they gave me the name of a server and wished me good luck. No documentation, no explanation of the databases, and no mention of who worked on the server. The only thing they said when they “handed over the key” was: “Here you go! From now on it’s your responsibility to keep the environment running and healthy!”.

So, there I was… No documentation, no health check, no one to tell me what the status of the server was…

 
Taking that first step is always the hardest
So the first thing I did was check which databases were on the server. There were a lot: 70 in total. Then I checked the user permissions: most users were a member of the db_owner role. Great! Asking which users really needed all those permissions was useless. Everyone needed full access to everything.

So then I took the step to tell them that I was in charge of the server from that moment on, and that if they did something without my permission, I’d remove their access to the environment. Looking back, that was easier said than done.

But the first step that was really hard to accomplish was reporting on the server health. No one thought we needed that, and if something went wrong we would notice it in an instant. And because I knew we really did need it, I just cleared my schedule and created a few reports.

 
Being right is always a fun thing
A few weeks later, they finally saw I was right. One of the databases filled up a disk, and now we had issues. The only way we could see what went wrong was by looking at the data growth over the previous weeks in my reports. So now I could show them how wrong they were. It’s nice to be right once in a while!

 
Audit
This is where this month’s topic comes into play. A clear indication that you lack an audit trail is when there’s an issue with one of your processes, databases, or even something as small as a specific dataset, and you don’t know where the issue came from. Following your data flows from beginning to end is always a good idea. And if you work at a larger company, you’re (at least in the Netherlands) required to document your data flows for a financial audit every year.

But it’s not only your datasets that need audit trails. Your databases are in dire need of an audit trail as well. Because what happens if your database grows 50% in size over the course of a week, and you don’t track those kinds of numbers? Your disk fills up, your database stops working (if you haven’t got the space to extend your data files), and eventually your world becomes a dark and evil place if you can’t fix it.
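Even a tiny audit trail would have caught this. A minimal sketch that snapshots the size of every database into a hypothetical logging table; schedule it daily with a SQL Server Agent job and you can see the growth trend:

-- Hypothetical logging table
CREATE TABLE dbo.DatabaseSizeHistory
	(LogDate	DATE,
	 DatabaseName	SYSNAME,
	 SizeMB		INT);

INSERT INTO dbo.DatabaseSizeHistory (LogDate, DatabaseName, SizeMB)
SELECT	CONVERT(DATE, GETDATE()),
	DB_NAME(database_id),
	SUM(size) * 8 / 1024		-- size is stored in 8 KB pages
FROM	sys.master_files
GROUP BY database_id;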

 
Conclusion
Even if you don’t have a full front-to-back audit trail of your data and processes, at least try to monitor and audit the key points in your data flow. That helps you debug excessive data growth, and helps you when (for example) a user creates a new database without asking your permission. Even small audits help you out when you’re in a world of pain and trouble.

T-SQL Tuesday #44 – The second chance

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog about a specific subject.

This month the subject is “The second chance”. If you want to read the opening post, please click the image below to go to the party-starter: Bradley Ball (Blog | @SQLBalls).



 
This month’s topic isn’t easy for me. Even while I’m writing this, I’m still thinking about what it means to me personally. Second chances sound too good to be true. Doing something for a second time, and trying a different approach than the first time, in order to succeed…

Normally I try not to be a person that looks back at previous mistakes, but sometimes that’s easier said than done. But I must say, there are not a lot of things I regret in my life. Only one that’s really close to my heart, and that probably nobody knows about. But I won’t bother you with that story…

 
Deserving a second chance
People always say: everybody deserves a second chance. But I don’t completely agree with that. Everyone makes mistakes, and that’s not something to be ashamed of. But if you make the same mistake twice, you need to start wondering if there’s something you could have done to prevent it. But even then, you deserve a second chance.

The people that actually know me, know I’m a nice guy, I’m honest (sometimes even a little bit too honest), and normally you can’t get me angry. But if you screw the same things up over and over again, I’m not that friendly anymore. Let me explain that with an example.

 
No, that wasn’t me!
A while ago I worked with someone who thought he was really good at his job. Personally, I had some issues with that opinion, but I gave him the benefit of the doubt. On a number of occasions he screwed things up, and I thought he should have known that what he was doing was never ever going to work. But still, I was willing to give him that second chance. But then he got me angry. And you won’t like me when I’m angry.

There were a number of SQL Server Agent Jobs running, and they locked up some of our tables and databases. When I asked him to look at that, he said he didn’t run those jobs, and focused on his screen again. So I asked him again, nicely, to look at it. He replied with the same answer.

A little bit angry, I told him the jobs were started on the server, and that he was the only one who was logged on to the server. Then he “suddenly” remembered he had started the jobs, and said the locking wasn’t that bad. As a DBA, I took a deep breath, counted to 10, and waited for him to fix the issue. But if you’re that stubborn, clearly lying to me, and don’t even have the courage to tell me you screwed up, you don’t deserve a second chance in my opinion. At least be honest with yourself and your colleagues!

 
Honesty gets you a second chance
At this and previous companies I worked for, I always tried to teach the students and interns that they need to be honest and listen to people with experience. Even if things go wrong, and you’re the one to blame, at least tell the truth. For me, that’s the difference between fixing the issue together and moving on, or letting them take the fall all on their own. But this is also a lesson that was handed down to me by my colleagues a few years back. This is what happened to me, as I remember it:

When I started my first job in IT, I was offered a job as SQL Server Consultant. That meant that I was responsible for data conversions from different systems to our core system. When I took the job, I had never written a query before. But by listening to colleagues and my mentor (a good friend of mine who worked for the same company), I made it into a development team about 1.5 years after I started my first job.

That meant I was able to access the production system (yes, that’s where the problems began!). These permissions were given to me so I could solve data related issues in production. Until the day they asked me to update 5 rows in production. I checked and double checked the T-SQL statement I wrote, asked a colleague to take a look at it, and then took a break away from my computer so I could totally focus on the task when I got back.

I sat down again, looked at the query one last time, and pressed F5… One minute passed… Two minutes passed… And then the query finished… 50,000 rows affected… I slightly panicked, and noticed I had only selected the UPDATE, half of the “WHERE” clause, and no “BEGIN TRAN”… My heart started racing, and I picked up the phone and asked the system administrator (a good friend of mine, who worked at a different location) if he could restore the last backup for me, because I had screwed up. After some questions, and some explanation of my mistake, the last thing he said before he hung up the phone in anger was: “Which backup? The one that didn’t run for the last few weeks?”.

I didn’t know what to do. How could I ever fix this? Almost every record in the table had been updated, and there was no way of knowing what the old values of the updated records were. So it took all my courage to pick up the phone, and ring the system administrator again. All I heard on the other side of the phone was his evil laughter. Before I could ask him what was going on, he told me: “I’m glad you were honest with me. But don’t worry, I’m restoring the backup that was taken an hour ago. There’s no data lost”.

At that moment, I didn’t know what to think or feel. At first I wanted to slap him silly, but a few minutes later I wanted to thank him for his wonderful help. He probably saved my ass, and he never told anyone except my mentor (who also was my direct manager back then, and also a good friend of us both). A few days later, the three of us talked about it face to face, and eventually all laughed about the situation.

 
A wise lesson
But if I learned anything from that situation, besides never running an update without a transaction or “WHERE” clause, it is to be honest. Even though you might think the company will fire you for the mistake you made, it’s always better to tell them than to let them find out themselves. And that’s what I try to tell the students, interns, and junior colleagues I work with: be honest, and then you earn a second chance…
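The pattern I’ve used for ad-hoc updates ever since is boringly simple: wrap it in a transaction, check the row count, and only commit when it matches what you expect. A sketch with a hypothetical table and values:

BEGIN TRAN;

UPDATE	dbo.Policies
SET	Status = 'Cancelled'
WHERE	PolicyID IN (1, 2, 3, 4, 5);	-- the 5 rows you intended to change

-- Expecting exactly 5 rows; anything else and we back out
IF @@ROWCOUNT = 5
	COMMIT TRAN;
ELSE
	ROLLBACK TRAN;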

T-SQL Tuesday #42 – The Long and Winding Road

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog about a specific subject.

This month the subject is “The Long and Winding Road”. If you want to read the opening post, please click the image below to go to the party-starter: Wendy Pastrick (Blog | @wendy_dance).



After I skipped the last few T-SQL Tuesday posts, I saw the topic Wendy suggested for this month. That’s an interesting one, so I’m kind of back again…

One particular sentence that stuck with me was: “Let’s make these stories about the tech and how that has led you down a certain path”. A few months ago, I blogged about leaving my former employer. There I got a chance to discover a bunch of new technologies, and to discover what I really wanted. But the path started earlier.

During my first internship, I started working as a desktop support engineer and help desk employee. This meant I needed to install, replace and fix PC’s, phones, network, etc. After the first internship (6 months), I stayed with the company as a part-time employee. Then I decided to stay there for my second internship of 6 months, which I filled working as a system administrator.

When I was still busy with exams and finishing my education, the company asked me to stay with them. They had an interesting role in mind for me: SQL Server consultant. At that time, I had never written a single query (except a few trials in my database classes at school), but I accepted the offer. The learning curve was pretty steep, but I managed with the help of my wonderful colleagues. My first experiences with SQL Server were very interesting, and it really triggered me to learn more about it, even though it was SQL Server 2000 back then.

After 2 years I got the opportunity to start as a .NET developer within the same company. Because several colleagues left, there was a high demand for software engineers. So with no knowledge of ASP and VB, I started to discover this new world. After a few months, I moved from ASP to ASP.NET, from VB to C#, and eventually worked with WCF, Silverlight, NHibernate, etc.

After 2.5 years of being a software engineer, I came back to the database world. I can’t explain why, but I missed something in my daily job that I had before. When searching for a new job, I found a company that seemed to fit my ideas of a great company and a huge challenge: multinational, young team, great colleagues, lots of learning opportunities, etc. There I started out as a BI- and SQL Server developer, and I found what I was missing in my previous job: data!

One thing I’ve learned after this wide variety of jobs, is that I love databases and data. In my current job as a DBA I can combine the 2 things I love the most: data and technology. As a DBA I’m not only responsible for keeping the databases alive, but also for data quality, database performance, etc.

So this is the ultimate job for me, at least at the moment. But if I ever change jobs, I’m definitely not leaving the SQL Server world! Not only because I love working with databases and data, but also because of the wonderful community. At first I didn’t know what to think about the “SQL Family”, but now I know it practically IS a family. The people have the same issues as you, the same interests, and they love to help you out. So I’m never going to leave that behind again!

Without pain, without sacrifice, we would have nothing…

The last few days flew by for me. After 2.5 years of working for my previous employer, I decided it was time for a new challenge. In the past period I’ve learned a lot, and had the opportunity to learn and research a lot of new things and techniques.

In 2010 I started working for the company, and began as a Data Warehouse (DWH) developer. Before my first day on the job, I had pretty much no experience with a DWH. That was quite a challenge, especially because there was no documentation at all. So I needed to find out everything myself, because there wasn’t a real DWH developer before I started there.

This also gave me a chance to dive into Reporting solutions. The company ran a Reporting Services machine with tons of Management-, Marketing-, Sales- and Development-reports. These reports varied from showing turnover information, to showing the IT department the (partial) morning check.

Eventually I ended up building reports, cubes, and developing and maintaining the DWH. This was my daily job for almost a year, next to helping out my team, consisting of about 10 to 15 .NET developers.

After about a year, management decided to let an external company develop our DWH. That gave me more time to focus on building reports, and doing more development work for our test and production environments. That year I started working with spatial data in SQL Server. This was one of the coolest things I got to do in the 2.5 years I spent there.

The hardest thing to give up wasn’t the company or the job, but it was the team. It was a pretty young team of developers, and most of the guys (and the one gal we had) were pretty awesome. There was always someone who could help you out, and the team members had no problems with working late to help a colleague out. The team was amazing!

But like I said, after more than 2 years, it was time for something new. While reading a blog post by SQLRockstar (Blog | @SQLRockstar), I read a beautiful quote that convinced me I had made the right decision:

“Without pain, without sacrifice, we would have nothing”

This is a quote from the movie Fight Club. It tells us that nothing comes without sacrifice. You can’t achieve anything without taking risks, and sticking out your neck. This is what I’m about to do…

I can’t tell you yet what my next challenge will be, but I’m very excited to start my new job. I’m hoping to write a blog post next week, telling you more about that… :)

Enriching your dataset – What are your technical options?

In my previous post I tried to explain how to enrich your dataset, so you can start using spatial data. You’ve seen that there are free options, or you can buy a bunch of files with the information you need.

This time I’ll try to explain the technical options you have. The options all use the Google Geocoding API, because it’s free, and it’s fairly accurate. But this will also work for the files you buy from (for example) GfK GeoMarketing, which I did business with once.

Write your own application
One of the easiest and quickest options for me was to write an application to fetch my data from the Google API. I’m not a die-hard programmer, but I know my way around C# as far as I need to. So with a little help from my colleagues, I managed to write a Windows Forms application that calls the Geocoding API. The results of the API call are either just shown on screen (in a GridView), or exported to Excel (using LinqToExcel and ExcelExporter).

Another option I intend to build in, is the export from and to a SQL Server database. But because this is a project I work on in my own time, this could take a couple of weeks.

CLR
One of the other options I’ve found is a CLR assembly that calls the API. The CLR code you write is basically a .NET assembly that you load into SQL Server. It’s more complicated than that, but to keep this story moving, I’ll leave it at that.

One of the many downsides of using a CLR is your local DBA. Every DBA I’ve encountered in my life told me: “Don’t use a CLR, EVER!!! Because that’s the biggest security hole you can open up on a SQL Server”. To be completely honest, I’m not that familiar with CLRs, but I guess it isn’t as simple as that. There might be some pros and cons regarding that…

SSIS Package
In my quest to find all possible options, I found this great article by Donabel Santos (Blog | @sqlbelle). In her article she describes how you can create an SSIS package that fetches a Lat/Long for every record in the dataset you use as input.

Talking about this with Koen Verbeeck (Blog | @Ko_Ver) on Twitter made me realize that for normal businesses, this might be the best option. In every company I know, there’s always a server running SQL Server Integration Services that you can use to execute such a package.

So, what’s your choice?
Looking at the options above, I’m guessing the option you choose depends on the company you work for. I guess that smaller companies would choose the SSIS package over building a tool. But if your company has a development department with a bunch of software developers, writing your own tool might be a better option. In that case writing your own tool gives you more flexibility, because not only your DBA can debug an error in an SSIS package, but every developer can debug the tool they’ve written.

If you’ve got some .NET/C# knowledge, and you have the time to dive into CLRs, that might be your best option. CLR code is loaded into SQL Server, and is (as far as I can tell) blazing fast. So if you’re looking for performance, and don’t want external tools or packages to fetch your data, go for it!

T-SQL Tuesday #37 – Join me in a Month of Joins

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog about a specific subject.

This month the subject is “Join me in a month of joins”. If you want to read the opening post, please click the image below to go to the party-starter: Sebastian Meine (Blog | @sqlity).



When I read this month’s invitation by Sebastian, I thought about a situation from last week. A developer asked me to review a bunch of T-SQL queries and procedures that one of the other colleagues wrote, because they didn’t work. The first thing I noticed was the readability of the scripts. I’ve seen some bad ones in my life, but these were just monstrous!

Thinking about the situation, I’m guessing it’s just a lack of knowledge about databases. They don’t see what the connection is between data sets. They don’t know what specific joins do, and so they use what they see everyone else use: just JOIN. But they don’t realize that this implicitly means INNER JOIN to the engine.

One of the biggest issues in the script was the fact that a MERGE was used instead of an UPDATE FROM. I’ll try to explain this by using an example of a car factory. The code is exactly the same as the code I was asked to debug, except the objects are renamed.

MERGE INTO Factory.dbo.Stock
USING #TMP_NewDelivery D
	ON D.Brand = Factory.dbo.Stock.Brand
WHEN MATCHED
	AND D.PartID = Factory.dbo.Stock.PartID
	THEN
		UPDATE SET DeliveredAmount = D.DeliveredAmount
OUTPUT
	deleted.PartID,
	$action,
	GETDATE(),
	inserted.PartID
INTO @Logging;

One of the issues with this code that I noticed first was the fact that it only updates values. So why didn’t they use an UPDATE FROM? That by itself isn’t too bad, except the join condition isn’t declared once, but twice: in the USING clause, and in the WHEN MATCHED part. The issue is that the ON clause only joins on Brand (let’s say Seat), and that the PartID is added at a later stage. In the end, the query started updating all rows with the same PartID (let’s say Engine). So instead of updating the stock of Seat Engines, it updated the stock for all Engine parts.
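For reference, this is roughly what the same logic looks like as an UPDATE ... FROM, with the join on both Brand and PartID in one place (a sketch based on the renamed objects above, not the actual production fix):

UPDATE	S
SET	S.DeliveredAmount = D.DeliveredAmount
FROM	Factory.dbo.Stock S
JOIN	#TMP_NewDelivery D
	ON  D.Brand  = S.Brand
	AND D.PartID = S.PartID;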

And unfortunately I couldn’t do anything with the logging data that was generated by the script. Instead of storing it in a permanent table, the logging information wasn’t used at all. It was captured in a table variable (why they used a table variable, I don’t know), and this was never saved to any other object. So why use precious CPU cycles to “store” information you don’t use?

Looking at this reminded me of something I tend to forget: the database is often some side-track for developers. They quickly write a query that isn’t that good and isn’t that fast, but it does the trick. At least, that’s what they think!

Please start thinking in collections and sets if you work with SQL Server, and don’t use a cursor for everything. If you don’t know the difference between a LEFT, RIGHT and INNER JOIN, please ask for help. Your DBA or SQL developer won’t make fun of you for asking. And if you don’t ask for help, please don’t be mad if we use your code as an example! ;)

Are nested Inserts possible?

Two weeks ago I got an interesting question. Two of my colleagues thought of a funny way to tease me, by asking me this question:

If we have an insert into table X, can we nest another insert into the query to fill table Y with the identity generated by insert X?

After thinking about this for a few minutes, I told them I thought this was possible. Eventually I found a few spare minutes, and came up with this solution. In all honesty, this isn’t completely my own solution. A few forums I came across mentioned this option, and I took those ideas and created my own solution.

So first, let’s create the 2 tables we need for this example:

DECLARE @T1 TABLE
	(ID INT IDENTITY(1,1),
	 String VARCHAR(100))

DECLARE @T2 TABLE
	(ID_REF INT,
	 String VARCHAR(100),
	 Action VARCHAR(100))

So I’ve created T1 (table X in the question), and T2 (table Y). Let’s assume that T1 contains your product data, and for every product you insert, you want to insert a log-record into T2.

This can’t be done with actual nested inserts. If you want to do this, the easiest way is to create a Stored Procedure (SP) that uses SCOPE_IDENTITY() to retrieve the generated ID, and inserts that into your second table. But because an SP isn’t always the best way to do this, I wanted to see if I could transform it into a single statement.
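For completeness, this is a sketch of that SCOPE_IDENTITY() approach, using the same two table variables in a single batch (in a stored procedure you’d do exactly the same with permanent tables):

INSERT INTO @T1 (String)
VALUES ('This is a string');

DECLARE @NewID INT = SCOPE_IDENTITY();	-- identity generated by the insert above

INSERT INTO @T2 (ID_REF, String, Action)
VALUES (@NewID, 'This is a string', 'Inserted via SCOPE_IDENTITY()');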

Since SQL Server 2008 we have the opportunity to use the MERGE statement. What the MERGE does is synchronize two tables by inserting, updating and deleting rows in the destination table, based on a set of rules and/or joins.

So, what would this statement look like? After some trial and error, I created this statement:

MERGE INTO @T1 T1
USING (SELECT -1 AS ID, 'This is a string' AS String) Q
	ON Q.ID = T1.ID
WHEN NOT MATCHED BY TARGET
	THEN INSERT (String) VALUES ('This is a string...')
OUTPUT
	inserted.ID, inserted.String, CONCAT('<<<<< Inserted in T1 @ ', CONVERT(DATE, GETDATE()) ,' >>>>>')
INTO @T2;

As you can see the destination table is T1. This is the first table the record is inserted into, and the table that generates the identity. Because we only want to insert the record, and not update or delete anything, I only created a “WHEN NOT MATCHED BY TARGET” clause.

Because the ID columns don’t match, the record gets inserted into T1. After that, I use the OUTPUT clause of the merge statement to insert the same record (but with the generated identity) into T2. As a reference, I also insert the action-description that contains a date.

So as you can see, you can use nested inserts in SQL Server, only via another statement. But remember, this is just written to prove it’s possible in a single statement, and not for production usage. So if you decide to use this in production or on your system, consider your other options first!


Code samples:
Are nested Inserts possible.sql

T-SQL Tuesday #35 – Soylent Green

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog about a specific subject.

This month the subject is “Soylent Green”. If you want to read the opening post, please click the image below to go to the party-starter: Nick Haslam (Blog | @nhaslam).



This month’s question was to write down our most horrifying discovery from our work with SQL Server. If you work with SQL Server long enough, you will encounter some strange situations, one way or the other…

My first experience with SQL Server was at my first official IT job. Back then I worked for an insurance company, and there I was offered the opportunity to become a “Conversion Specialist”. This meant that I visited customers, advised them about their data and our software, and converted and imported their data into our main system. When I started the job, I’d never written a single T-SQL statement. So the learning curve was pretty steep, but after some sleepless nights of studying, I got the hang of it. And during this job, I encountered my first (of many) horrifying experiences…

In this company, the main application took its data from one (!!!) SQL Server 2000 database. The system contained all the data the company had, and it was a “rebuilt” DOS application written in Delphi (at that time the company worked with Delphi and .NET). In order to store all of their data in one data model (yeah, I know!), they created a “flexible and dynamic” model… In one table…

The data was stored in one table (it wants to remain anonymous, so from now on we’ll call him “Foo”), and Foo contained more than 200 columns (if I recall correctly). Every time we inserted a record into Foo, SQL Server calmly mentioned that the limit of 8,000 bytes (max. per record) was exceeded. How they fixed that? I still haven’t got a clue…

Every “object” stored in Foo contained properties, or a collection of properties (which obviously ended up in 1 record for each item in the collection). But as I mentioned, they were stored “dynamically”. So if you wanted to retrieve an object “Tree”, then you needed columns 15, 18, 20 and 52. When retrieving an object “Bird”, you needed columns 15, 18, 25 and 2550 for the same properties.

But I must honestly admit: I left the company after six years with tears in my eyes. They offered me a lot of opportunities, and the colleagues were awesome!

Another example I encountered in a production environment (at a different company) was an issue with currency calculations. The product data and the currency rates were loaded from 2 different data sources. To combine these in one record (and calculate the turnover), they used a CASE statement in the script that ran in the Data Warehouse. But when I took over the Data Warehouse, they forgot to mention one thing…

If a product was sold for 100 Mexican pesos (with current exchange rates that’s about € 6.00 or $ 7.80), and no exchange rate from pesos to dollars was present, the script ended up in the ELSE clause. This clause multiplied the amount by 1, so as “not to mess up the data”. And without anyone noticing, 100 Mexican pesos turned into $ 100! It actually took a while for people to notice this (including me!).
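The pattern looked roughly like this (a hedged reconstruction; the table and column names are made up, only the ELSE branch mirrors what actually happened):

SELECT	s.ProductID,
	s.Amount * CASE WHEN r.ExchangeRate IS NOT NULL THEN r.ExchangeRate
	                ELSE 1	-- "not to mess up the data": 100 pesos silently becomes $ 100
	           END AS TurnoverUSD
FROM	dbo.Sales s
LEFT JOIN dbo.ExchangeRates r ON r.CurrencyCode = s.CurrencyCode;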

And I’m probably not the only one with these experiences, so I’m glad Nick asked us to share them with the rest of the #SQLFamily! And if you want to read more of these horrifying experiences, let me know. I might write another blog post about this, because this feels like therapy! And it’s another chance for me to make Rob Volk (a great guy, and a great example to me!) proud, by using the “Evil” tag again! ;)
