Become a T-SQL Hero with SQL Prompt

Since 1999, Red Gate Software has produced ingeniously simple and effective tools for over 500,000 technology professionals worldwide. From their HQ in Cambridge UK, they create a number of great tools for MS SQL Server, .NET, and Oracle. The philosophy of Red Gate is to design highly usable, reliable tools that solve the problems of DBAs and developers.

Every year Red Gate selects a number of active and influential community members (such as popular blog writers and community site owners), as well as SQL and .NET MVPs who are experts in their respective fields, to be part of the Friends of Red Gate (FORG) program. I'm proud to announce that I'm part of the 2014 FORG selection. This post is part of a series of posts in which I try to explain and show you why the tools of Red Gate are so loved by the community.



 
What SSMS misses
The tool that Microsoft provides you with when you install SQL Server is pretty nice. It's nicely designed (even though I've heard other opinions), it's stable, and it does what it should do: it allows you to administer your servers. But in my opinion, that's not the only thing it should do. If you take a look at Visual Studio as an example, that environment contains many more options that help developers do their job. And remember, SQL Server Management Studio (SSMS) is actually a Visual Studio instance with a different layout (just check the Ssms.exe.config)…

So why doesn't SSMS have a schema compare option, like Visual Studio has? Visual Studio is no longer the environment used only by developers who work with ASP.NET and C#; it has evolved into much more over the last few years. It's now also the tool for working with Data Quality Services (DQS) and SQL Server Integration Services (SSIS). So let's talk about some other features that SSMS misses in my opinion, and let's see how SQL Prompt can fill that gap.

 
IntelliSense
SSMS ships with IntelliSense by default, but this isn't the implementation I would like to see. It misses a few vital features. For example, the fact that SSMS IntelliSense doesn't take relations between objects into account is one of its biggest shortcomings. One of the companies that created a tool to fix that is Red Gate. If you install SQL Prompt, you get IntelliSense 2.0, or IntelliSense on steroids if you like.

Once you've installed SQL Prompt, it gives you suggestions when you write a JOIN clause. This means it scans column names, and traces primary and foreign key relationships on the tables you are joining. A join suggestion based on keys can be recognized by the little key symbol in front of it:

 
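To give you an idea of what that looks like in practice, here is a small sketch of my own (the Customers/Orders schema below is completely hypothetical). With the foreign key in place, SQL Prompt can suggest the full ON clause the moment you type the JOIN:

CREATE TABLE dbo.Customers
  (CustomerID INT IDENTITY(1,1) PRIMARY KEY,
   CustomerName VARCHAR(50))

CREATE TABLE dbo.Orders
  (OrderID INT IDENTITY(1,1) PRIMARY KEY,
   CustomerID INT FOREIGN KEY REFERENCES dbo.Customers(CustomerID))

/* When you type the JOIN below, the ON clause is suggested
   based on the primary-/foreign key relationship */
SELECT *
FROM dbo.Orders O
JOIN dbo.Customers C ON C.CustomerID = O.CustomerID
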
Object discovery
Whenever you're working in a database and writing queries, there comes a point where you can't remember a column name or datatype. In SSMS you need to navigate the object explorer to the object (let's say a table), and generate a create script, or click through to the table to get to the column list. SQL Prompt allows you to hover your mouse over an object, and see some extra information:

 
If you click on the popup, you’ll get another popup window with the creation script (by default), or a summary of the object:

 
Scripting options
Whenever you need to script an object, or want to see the contents of, for example, a stored procedure, you need to navigate to the object in your object explorer. With SQL Prompt, you can also use the context menu to script objects. Just right-click an object you referenced in your query, and choose the "Script Object as ALTER" option:

 
This will generate an ALTER script for the object you selected. This makes it a lot easier to see the contents of a stored procedure or view, and change it when needed.

 
Useful functions
The last feature I want to show you is the SQL Prompt menu itself. This gives you access to another set of useful tools and functions. For example, how do you format your T-SQL query? SQL Prompt can do that for you with a few mouse clicks, or with a hotkey combination. Another great feature is "Find Unused Variables and Parameters". This saves you a lot of time when you try to find out which declared variables you no longer use in a very large query. All of these options can be found in the SQL Prompt menu:

 
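To sketch what that feature catches, here's a contrived example of my own (the dbo.Products table is hypothetical): in the batch below, @Unused is declared but never referenced, which is exactly the kind of dead code "Find Unused Variables and Parameters" highlights.

DECLARE @MinPrice DECIMAL(10,4) = 20.00
DECLARE @Unused INT /* declared, but never referenced: SQL Prompt flags it */

SELECT *
FROM dbo.Products
WHERE Price >= @MinPrice
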
If you want, you can also export your formatting style for all your colleagues, so your entire department or company formats queries according to the same layout. You can find out more about this in the SQL Prompt menu, under Options -> Format -> Styles. You can export your formatting options as a .sqlpromptstyle file, or import one.

 
Is it worth it?
If you ask me, my answer is: yes! Even though it'll cost you about €285,- (or $390,-), it's definitely worth it. It saves you a lot of time, and it adds a lot of useful (and needed) features to SSMS.

If you want to try it out, just go to Red-Gate.com, or the product site for SQL Prompt. You can download a trial there that contains all features, for a limited time.

 
If you want to read more about this topic, don’t forget to check out these blog posts:

- Julie Koesmarno: Clean And Tidy SQL With SQL Prompt
- Mickey Stuewe: Becoming a SQL Prompt Power User
- Chris Yates: SQL Prompt – The Power Within

T-SQL Tuesday #56 – Assumptions

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is "Assumptions". If you want to read the opening post, please click the image below to go to the party-starter: Dev Nambi (Blog | @DevNambi).



 
This month's topic is about assumptions. A few years back, I worked in a team that consisted mainly of .NET developers. Every time we mentioned "I think so…", "I assume it works like this…" or "I think we should…", one of them used the quote: "Assumption is the mother of all f*ck ups!", which is a quote from the movie Under Siege 2: Dark Territory. But he was right. The moment you assume something, it's going to blow up in your face in the end.

 
I tested it, and it works
Working for larger organizations should mean they are better prepared for certain things than smaller organizations. But I've seen large organizations that were badly prepared, or just plain unprepared. They assume their processes work, or that they will never encounter failure at all.

One of the companies I worked for took backups every night. Full backups. Of databases that were between 100 GB and 500+ GB… And they never tested a restore… Why? They used the default maintenance functionality with "Verify Backup Integrity" enabled, and they had never needed to restore a database before. So they only took backups because management wanted that. They didn't understand why, because their processes never failed, and would never fail in the future.

But one day, it wasn't their processes that failed. A LUN went offline during the ETL process, and SQL Server naturally detected that. SQL Server put a database into suspect mode, because of database corruption. And because there were no usable backups, they needed to move to plan B: process 2+ years of data (stored in XML files) all over again.

Eventually I solved it and recovered the database without the need for a restore, but it scared them. They now saw why they needed backups, and why they needed to test restores on a regular basis. But they forgot about it after a few days, and we never got the time to change the maintenance processes or test any restore. After that, I made the best choice possible in my opinion: I found myself a new challenge.

 
If you don’t know what you’re talking about…
Another example of assumptions I have seen a lot over the years is people explaining things to other people without any proper knowledge of the subject. I can recall a conversation between me and an intern. He was a .NET developer, and had some questions about how a T-SQL feature worked. A junior BI developer started laughing when the intern asked his question. "What a stupid question, everybody knows the answer to that!" he said. Kind of irritated by that, I asked him to provide the answer to the question. He didn't want to. I asked him again: "You answer the question, because you laughed about it, and I want to hear the answer from you."

He started to stutter, and he explained the functionality all wrong. When I explained it the right way, the .NET intern thanked me, and walked away with his new knowledge, ready to bring it into practice. The BI developer wanted to continue the discussion. "You're all wrong! That feature doesn't work that way!". I calmly told him that I used this feature on a daily basis, and that he was wrong. The discussion went on a little longer, but I ended it by telling him: "I've been doing this for a number of years now, and working with SQL Server is what I do. You just started, and wrote your first query a few months ago. If you find any resources that prove me wrong, I'll be happy to quit my job. Until then, please don't explain T-SQL to other people if you don't know what you're talking about." To this day, he has never gotten back to me on this discussion.

 
I don’t need to check that!
Another great assumption you'll see in several companies is IT people who trust their own automation a little too much. The rule in IT is that if you need to do something more than once or a couple of times, you should automate it. Automation is a good thing, and it can save you time. But who checks that your process doesn't fail? You don't want to build a system that checks another system for you. One of the things I've seen is a developer who created an automated process that checked a log table on a database server, and mailed new errors to the developer. At first glance, it's a perfect solution: you don't have to monitor the log table yourself, because an automated process does that for you.

At one moment, an application seemed to fail. It threw exceptions, and the end users weren't able to do anything with the application. The developer was called, and he told the users to contact the system administrator, because it must have been a server or hardware problem. The system administrator called the developer after a few minutes, and told him the server and hardware were in perfect condition. The developer insisted his software wasn't failing, because he hadn't received any errors by email. But after a quick check, the developer came to the conclusion that his automated process itself had failed. The developer lost a lot of credibility because of this attitude. As you see, this is another example of an assumption that went wrong.

 
Never stop asking questions
One of the most important things I wanted to show you with this blog post is that if you don't know the answer to a question, don't be afraid to ask someone. The same goes for processes, tools, functionality, or any other question you want to ask. If people mock you for asking questions, they are the ones who are wrong. You're just trying to learn and grow, so don't feel bad about yourself!

Using FOR XML to create arrays, JSON and JavaScript

Every new project brings its own challenges and pitfalls. New projects also give you a chance to be creative in finding solutions, and try new things. This keeps it interesting, and allows you to learn new things. And sometimes, the things you avoided for all these years, turn out to be your best option.

 
A missing feature in SQL Server
For a project, we needed to generate JSON or JavaScript from T-SQL. Even though the whole world seems to use JSON at this point, there is no feature to generate JSON from T-SQL in SQL Server. There are more than enough people who would like such a feature (just look at the Connect item made for this). Maybe they'll put it in the vNext of SQL Server, who knows. But at this moment, we need to generate the JSON ourselves.

 
Generating the data array
First of all, we need to generate an array or list. The problem with this is the number of rows. For example, if you only have 2 rows per product, you can join the table onto itself, and concatenate the columns like that. But what if you have more rows per product, or you don't know how many rows per product will be returned?

One of the ways to resolve this is by using the FOR XML clause. You can use it in a correlated subquery (or a CROSS APPLY) to create an array/list of data, grouped by another column.

So let's create an example using airlines. When you fly to another state, country or continent, you book a flight with an airline. Those airlines have a name, a departure airport code, and of course a ticket price. Let's put this data in a table:

CREATE TABLE #TMP_AirlineTickets
  (ID INT IDENTITY(1,1),
   Airline VARCHAR(10),
   Departure_Airport_Code VARCHAR(5),
   Price DECIMAL(10,4))


INSERT INTO #TMP_AirlineTickets
  (Airline, Departure_Airport_Code, Price)
VALUES
  ('BA', 'RTM', 10.00),
  ('KLM', 'AMS', 125.00),
  ('BA', 'LHR', 15.00),
  ('KLM', 'BCN', 50.00),
  ('KLM', 'BHX', 75.00)

 
When we query this data, we could easily use a sub-select. But because I want my query to be readable and well performing, I used the FOR XML clause:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
What happens in this query? The FOR XML clause creates an XML for the airline that is selected in the outer query. So a query like this is executed:

SELECT
  SUB.Departure_Airport_Code
FROM #TMP_AirlineTickets SUB
WHERE SUB.Airline = 'KLM'
ORDER BY SUB.Departure_Airport_Code ASC
FOR XML PATH('')

 
The result from that query is an XML column:

 
Because the CONCAT produces a column without a name, FOR XML generates no element tags around the values, and the XML nodes disappear. This gives us the possibility to use the result as an array:

SELECT
  CONCAT(',', SUB.Departure_Airport_Code)
FROM #TMP_AirlineTickets SUB
WHERE SUB.Airline = 'KLM'
ORDER BY SUB.Departure_Airport_Code ASC
FOR XML PATH('')

 
The result of this statement looks like this:

 
The next step is to use this in a query that creates an XML column for all airlines in the table:

SELECT
  AT.Airline,
  (SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ) AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The result is an array of airport departure codes:

 
As you can see, this creates the array/list for all airlines in the table. With STUFF I remove the first character, because I don't want my result to start with a comma. This could also be done with a RIGHT function that uses LEN – 1 to strip off the leading character, but this looks better in my opinion:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The result looks the same, except the first comma is gone:

 
One of the best things about creating demos like this for a blog post is that you can make them as unrealistic as you want. What I mean by that is that I'm making it easy to generate the XML, in order to keep the examples clear and understandable. But what if you want to use this in the real world? You'll probably see datasets that aren't as easy to use as the demo above: the strings you want to concatenate are filled with special characters, etc. Luckily there's a solution for that. Rob Farley (Blog | @rob_farley) wrote a really nice blog post about it.
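
If the concatenated strings can contain special characters like & or <, the usual fix (one sketch of the technique Rob describes) is adding the TYPE directive and extracting the value from the XML, so the characters aren't entitized along the way:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH(''), TYPE
        ).value('.', 'VARCHAR(MAX)'), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
GROUP BY AT.Airline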

 
So now we have our array/list ready, but how do we convert this to JSON or JavaScript? The main difference (sorry if I insult my developer friends with this statement) is that in JSON, keys and string values need to be enclosed in double quotes ("), while JavaScript object literals are more lenient about that. In the following example, I'll create a JSON object. But because just airport codes isn't what we need, I'll adjust the query so it adds the prices to the object as well. This is done with an additional CONCAT in the XML select, which concatenates the airport code and the price into 1 string:

SELECT
  AT.Airline,
  CONCAT('{"ap":[',
         STUFF(
               (SELECT
                    CONCAT(',{"', SUB.Departure_Airport_Code, '":', SUB.Price, '}')
                FROM #TMP_AirlineTickets SUB
                WHERE SUB.Airline = AT.Airline
                ORDER BY SUB.Departure_Airport_Code ASC
                FOR XML PATH('')
                ), 1, 1, '')
         , ']}') AS JSON
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The GROUP BY can be replaced by a DISTINCT. But either one of those is needed to make sure you’re only returning unique records:

 

 
By parsing the second result in the query in an online tool, you can see that the result is valid JSON:

 

 
With a few small tweaks to the script above, you can just as easily generate JavaScript objects or arrays from T-SQL. And if you just want to use FOR XML to generate plain arrays, this works perfectly as well.
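
For example, to produce an array with quoted values (my own variation on the queries above), you can concatenate the double quotes in as well; FOR XML leaves double quotes alone, so they survive the trip:

SELECT
  AT.Airline,
  CONCAT('[',
         STUFF((SELECT
                   CONCAT(',"', SUB.Departure_Airport_Code, '"')
                FROM #TMP_AirlineTickets SUB
                WHERE SUB.Airline = AT.Airline
                ORDER BY SUB.Departure_Airport_Code ASC
                FOR XML PATH('')
                ), 1, 1, '')
         , ']') AS Array_Quoted_Airport_Codes
FROM #TMP_AirlineTickets AT
GROUP BY AT.Airline

For KLM, this returns ["AMS","BCN","BHX"].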

 
Why would you want to use this?
Generating JSON or JavaScript might come in handy for web developers, who can get data from a database and use it directly on their website. By using the FOR XML clause to generate JSON, you make sure your queries are flexible enough to generate valid JSON, even if your dataset grows. Generating arrays like this can also come in handy for developers who need data from their databases. Instead of creating an array in code, they can now use the array that is returned from the database.

So even though I tried to work my way around using FOR XML, it turned out to be the best option imaginable!

Write readable and high-performance queries with Window Functions

Back in SQL Server 2005, we gained a new and powerful feature in our bag of T-SQL tricks: Window Functions, and they were extended considerably in SQL Server 2012. The heart of a Window Function is the OVER clause, which allows you to determine partitions or "groups" in your query before applying another function. In practice, this means you can create groups in your dataset that can be molded further by applying functions to those groups.

In the past, I’ve written a number of posts about Window Functions in SQL Server:

- Row_Number: Unique ID in select statement
- Calculating Running Totals
- Removing duplicates in a dataset

But there’s more!

 
Finding missing numbers in sequence
How many times have you looked at a table and noticed that one of the values in a sequence is missing? Or a date range in a table where someone deleted a few records? All of a sudden, your year doesn't have 365 days (or 366 in a leap year, or 365.242199 days if you take the astronomical average), but 315 days. There go your management reports that are calculated on a per-day average. So how do you find the missing dates, without having to write a cursor or create a full-blown date table like your BI colleagues do? You guessed right: a query with a Window Function!

Let’s start with declaring a table, and insert a sequence with missing values:

DECLARE @Sequence TABLE
  (Number INT)


INSERT INTO @Sequence
  (Number)
VALUES
  (1),
  (10),
  (7),
  (4),
  (2),
  (8),
  (5)

 
So how can we get the “nearest numbers” from that table with just a single select statement?

SELECT
  LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value,
  Number,
  LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value
FROM @Sequence AS S
ORDER BY Number ASC

 
LAG and LEAD are standard T-SQL functions from SQL Server 2012 onwards. These functions give you the opportunity to access the previous or next row, without the need for a so-called "self-join". So what you see is the number, the value preceding it (LAG), and the value following it (LEAD). In this case, number 2 is preceded by number 1, and followed by number 4.
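
And with LEAD in place, actually listing the gaps is only a small extra step (my own addition; run it in the same batch as the declaration above). Keep the rows where the next number is more than 1 higher, and report the missing range:

SELECT
  S.Number + 1 AS GapStart,
  S.NextNumber - 1 AS GapEnd
FROM (SELECT
        Number,
        LEAD(Number, 1, NULL) OVER (ORDER BY Number) AS NextNumber
      FROM @Sequence) AS S
WHERE S.NextNumber - S.Number > 1

For the sample data, this returns the missing numbers 3, 6 and 9.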

 
The lemonade stand
Now let's look at another example. How about you? When you grew up, you wanted to save money for a new mobile phone, right? In my case it was either a Discman, a Walkman, or a stereo set. But let's stick with the modern equivalent of the Walkman for now: the MP3 player. To earn money for the MP3 player, our fictitious friend, who is called Joe, decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his earnings, and puts them into a table:

DECLARE @Profit TABLE
  (DayNumber INT,
   Sales DECIMAL(10,2))


INSERT INTO @Profit
  (DayNumber, Sales)
VALUES
  (1,  6.90),
  (2,  4.17),
  (3,  2.69),
  (4,  7.26),
  (5,  2.93),
  (6,  8.98),
  (7,  7.25),
  (8,  5.88),
  (9,  1.51),
  (10, 7.97),
  (11, 3.44),
  (12, 3.76),
  (13, 9.96),
  (14, 0.92),
  (15, 8.28),
  (16, 6.05),
  (17, 9.40),
  (18, 4.03),
  (19, 9.14),
  (20, 7.25),
  (21, 0.06),
  (22, 9.12),
  (23, 7.39),
  (24, 6.57),
  (25, 4.54),
  (26, 0.09),
  (27, 4.42),
  (28, 9.53),
  (29, 5.09),
  (30, 0.89)

 
So as you can see, he earns quite a lot of money this way! But because he's eager to buy his new MP3 player, he wants to see his daily totals, and the amount he still needs to buy his new toy. And because Joe is a really smart guy, he doesn't want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:

DECLARE @Goal DECIMAL(10,2) = 150.00


SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
He declares a "goal" variable, which is set to the amount he needs for his new gadget. Then, for every row, we calculate the running total of all sales from the first day (UNBOUNDED PRECEDING) through the current day (CURRENT ROW), and subtract it from the goal. After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he's made so far:

SELECT
  DayNumber,
  Sales,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AverageSales
FROM @Profit

 
So where does it stop?
So now that we've seen the SUM and AVG options, what do we have left? How far can we take this? Thinking about it, how about a daily check whether we hit a lowest or highest sales amount? We can do this with the MIN and MAX functions in the same kind of query:

SELECT
  DayNumber,
  Sales,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale
FROM @Profit


SELECT
  DayNumber,
  Sales,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit

 
Now Joe can save his daily sales in the evening, and check if he had a very good, or a very bad day.

 
Not only for the average Joe…
So how can the business profit from all this? In some cases, Window Functions make it easier to produce a certain resultset. In some cases they even give you a whole new way to output data with a well-performing (single) query, which was impossible until now. So if you're running SQL Server 2012 or higher (the version that introduced window frames and LAG/LEAD), start using (or at least start exploring) Window Functions right away!

 

To generate the random floats, I’ve used the generator of FYIcenter.com

If you want to read more about this topic, don’t forget to check out these blog posts:

- Julie Koesmarno: ABC Classification With SQL Server Window Function
- Mickey Stuewe: A Date At The End of The Month
- Chris Yates: Windows functions who knew

How SQL Search saves you time

Since 1999, Red Gate Software has produced ingeniously simple and effective tools for over 500,000 technology professionals worldwide. From their HQ in Cambridge UK, they create a number of great tools for MS SQL Server, .NET, and Oracle. The philosophy of Red Gate is to design highly usable, reliable tools that solve the problems of DBAs and developers.

Every year Red Gate selects a number of active and influential community members (such as popular blog writers and community site owners), as well as SQL and .NET MVPs who are experts in their respective fields, to be part of the Friends of Red Gate (FORG) program. I'm proud to announce that I'm part of the 2014 FORG selection. This post is part of a series of posts in which I try to explain and show you why the tools of Red Gate are so loved by the community.



 
Red Gate? No thank you!
One of the biggest prejudices about the tools from Red Gate is: you have to sell one of your kidneys in order to afford one of them. I agree with you, some of the tools Red Gate sells are pretty expensive. Especially when you need to buy them yourself. But what if I told you they pay for themselves in the long run? You don't believe me? Okay, let's start off with a free tool to convince you.

 
How SQL Search can save your bacon
As a DBA and SQL Server developer, one of your biggest challenges is to memorize your whole environment, and know every line of T-SQL in it by heart. I'm sorry? Oh, you don't do that? Well, that doesn't really come as a surprise. And if you thought: "Hey! I'm doing that too!", stop wasting your time! No DBA, wherever you look in the world, will EVER remember all of the T-SQL scripts, stored procedures, views, etc., that can be found in his or her environment. So where do they get their information from? What would you say if I told you that you could search through all objects in a specific database, or on a specific instance?

So how do you do that? Were you planning on writing your own script to search through the system views, like sys.columns for column names and sys.sql_modules for stored procedure text? There is an easy way out, you know!
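
If you do insist on the DIY route, a rough sketch would look something like the queries below (searching for "Department" in column names and module definitions):

/* Columns whose name contains the search term */
SELECT OBJECT_NAME(C.object_id) AS Object_Name,
       C.name AS Column_Name
FROM sys.columns C
WHERE C.name LIKE '%Department%'

/* Objects (procedures, views, functions) whose definition contains it */
SELECT OBJECT_NAME(M.object_id) AS Object_Name
FROM sys.sql_modules M
WHERE M.definition LIKE '%Department%'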

 
Installing and testing
In order to create a test case that you can repeat on your own SQL Server, I'm using the AdventureWorks2012 database. That's a free example database that you can download from CodePlex. I chose to download the backup file, which I restored on my local SQL Server.

The installer for SQL Search can be found on the Red Gate site. Once you've installed SQL Search, a button is added to SQL Server Management Studio (SSMS):

 
So, what do you actually get? If you click the button in SSMS, you’ll see a new window, that’ll act as a result pane. The top of the new screen looks like this:

 
This screen contains all the options you need to find objects either on your server, or in a specific database. You see a textbox on the left, where you fill in your search term. The next few options can be used as filters. You can search on an exact match (checkbox), on a specific object type only (dropdown list), and in a specific database (dropdown list). The last dropdown list is the instance you want to search on. This list is populated with the open instance connections from your object explorer.

 
Test case
So how does this work in practice? One of your colleagues comes up to you, and asks you which objects in your database are related to department data. You could search for documentation to answer that question, or you could let SQL Search give you the answer. So, let's search AdventureWorks2012 for objects that are related to, or contain, department data. The result looks like this:

 
As you can see, there are 18 objects related to your search term (the object count is visible at the bottom right of the search results). Some objects are shown multiple times, because there are multiple matches on that object. Take the "vEmployeeDepartment" view, for example: the name of the view contains the search term, one of its columns is called Department, and the text of the view (the create script) contains the search string as well.

But how does this work in real-life situations? How many times do you get the question from your colleagues how many objects are related to a specific table or column? As a DBA you probably get this question more often than you would like. Your developers want to rebuild an application, or add a new feature to it, but they're not sure if the database change they'll make will break other applications or processes.

It's also possible to use the tool to your own advantage. For example, when you want to know which objects can update a specific employee record. You just search for both terms, and you'll see that there are 3 stored procedures that update employee information:

 
Please hold on, while I search for the object…
…is a sentence you never have to use again when working with SQL Search. Whenever you've found the object you were looking for, you can just double-click it, or use the button between the result pane and the script pane:

This will look up the object in the object explorer. So you never have to search for an object manually after you've found it. Just let SQL Search do all the hard work.

 
Red Gate? They’re AWESOME!!!
Hopefully you've changed your mind about Red Gate tools after reading this article. This is one of the tools I personally use on a daily basis. Even when there is documentation within your company, you still need to find it. It's printed and lying around somewhere, or on an intranet or SharePoint. You know where to find it, except when you REALLY need it!

SQL Search is also a tool you (in my opinion) really need when you're new to a company. You'll see a lot of different databases with different purposes, and maybe even different DBAs responsible for each specific database. Using SQL Search gives you a great advantage when you need to have a chat with the DBA. You'll step into their office with a little more knowledge of the database, without reading endless documents, cryptic ERDs, and folders full of unnecessary documentation.

 
Feedback
When you start using the tool, don't forget to thank the people at Red Gate. They LOVE to hear your feedback, either in a tweet, on the Red Gate forums, or by contacting support. You could also send me an email or tweet, or leave a comment at the bottom of this post. I would love to answer your questions (as far as I can), or pass them on to Red Gate.

If you want to read more about SQL Search, don’t forget to check out these blog posts:

- Julie Koesmarno: SQL Tools Review: SQL Search
- Mickey Stuewe: On a SQL Quest using SQL Search by Red Gate
- Chris Yates: Headache + Pain Red Gates SQL Search

T-SQL Tuesday #51 – Place Your Bets

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is “Place Your Bets”. If you want to read the opening post, please click the image below to go to the party-starter: Jason Brimhall (Blog | @sqlrnnr).



 
When I read about this month's T-SQL Tuesday topic, the first thing that came to mind was things that you know will go wrong sooner or later. When you encounter a situation like this, you immediately know it can't last forever. You want to fix it the moment you see it, but there's no money, or there's no time. But they promise you: in a few weeks you can take all the time you need. Well, that'll never happen. Not until things go wrong and you get to clean up the mess. Sounds familiar? Yes, we've all seen this, or will see it sooner or later.

 
With great power comes great responsibility
Just imagine this with me. One of your colleagues asks you to look at a problem he's having with a script someone in your company wrote. You probably solve it while he's standing right next to you. He watches you solve the problem, and when it's solved, he walks away with a thousand-yard stare in his eyes. You don't really think about it when it happens, but it'll come back to you…

A few weeks later, around 10 AM while you're still having your first coffee of the day, the same developer asks you to look at "his script". Wait, what?! Yes, he watched you work your magic, and that funny language of "Es-Que-El" seemed easy to learn. So he bought himself a "SQL Server for Dummies", learned all he needed to know in a single weekend, and wonders why it took you so long to learn it. From now on, he can write his own scripts, so he doesn't need you anymore. Except for this one last time.

Opening the script scares you: it's a cursor. But in your frustration and amazement you "fix" the broken script by refactoring the select statement in the cursor. Because the cursor only collects data, you add a "TOP 10" clause to the select statement, and run the script as a test. Nice, it finishes in 25 seconds. "It will only consume 500 rows" is the last thing you hear him say. You send the guy off, so you can continue your own work.

Later in the day, around 4 PM, you meet the same guy at the coffee machine. He starts a discussion about how he needs a new PC, because the script YOU wrote is slow (see where this is going…?). It's been running for about 4 hours now, while it should only collect about 500 records. I know what you think: that's impossible. You walk with him to his desk, stop the script, and look at his code. That isn't the query you looked at this morning. Asking your colleague about it explains it all: he "slightly refactored" the script, because he didn't need all those weird statements to get his results. Well, after a fiery discussion of a few minutes, you explain to him that he DOES need the "FETCH NEXT" in the query, because without it the cursor kept running the same statement for just the first record of the select statement you declared for the cursor.

So this funny "Es-Que-El" language isn't that easy to learn after all. There's a beautiful quote about that, and I'm not sure who said it: "T-SQL is easy to learn, but hard to master". So putting all your money on one horse, in this case buying yourself a single book, isn't a good idea.

 
Putting your money on one color
Another great example is a company that had a wonderful Business Intelligence environment. They used the whole nine yards: SQL Server, SSIS, SSAS, SSRS, etc. The downside of that you ask? It was all hosted on 1 physical machine, on a single SQL Server instance. Oh, and it was running low on disk space, and there was no room in the chassis to put in extra disks. That’s right: it was like juggling burning chainsaws with only one hand. Or an interesting challenge, if you will.

Eventually we hosted a few databases on NAS volumes. At that point, I was told the databases we moved were less important. Pro tip: never EVER trust them when they say that!!! They forgot to tell me that the biggest of the moved databases wasn't in the backup plan (a 500 GB database takes a long time to back up), and the last backup was made over a year ago. Surprise: one night the network card failed for maybe only a microsecond, and SQL Server thought the LUN was offline or the disk had crashed. So SQL Server marked the database as corrupt, and the datafiles as unavailable. After a few hours, a reboot of the server fixed it, and SQL Server could see the disk volumes again. So the database was saved after all.

But you see where I’m going with this? You never know when things go wrong, and putting all your money on one color when playing roulette isn’t the best idea. If the hardware of your single server fails, you fail.

 
Next, Next, Finish?
But the biggest example I can give you of a badly placed bet are companies that work with SQL Server, but don't hire a DBA. Have you ever worked for a company that works with Oracle? Every single company that works with Oracle has a dedicated Oracle DBA. But have you ever wondered why that isn't the case when a company works with SQL Server?

Thinking about it, I guess this is because a successful SQL Server installation is only a few "Next, Next, Finish" mouse clicks away. So if the installation is that easy, every developer or person with IT experience can probably administer it, right? They couldn't be more wrong. You know that, I know that, every SQL Server professional knows that, but try to convince other people of that fact.

So the worst bet you can place, and this is how I write myself back to the subject of this month, is not hiring a professional to manage your data and data stores. You wouldn't let your local baker fix your car just because he wrote some books about cars, right? So why do you let a developer with basic knowledge near your SQL Server? Just because real DBAs cost money? Yes, we do cost some serious money. But in the end, at least when you hire a GOOD DBA, they will make you money. You don't think so? What does a DBA cost per hour? And how much money do you lose when your servers are down for just an hour?

SSIS – Remove empty rows from Excel import

From the first time I used SSIS, I loved it. In most cases it's easy to create a package, easy to understand, and even readable for people who don't "speak fluent SQL". But what if you want to perform an easy task, and the result isn't what you expect?

 
Formatting an Excel sheet
One of the most basic tasks you can create in SSIS is importing an Excel sheet. Most of the time this works like a charm. But in my case, I wanted to filter out some rows from the workbook.

The business delivers an Excel sheet that needs to be imported into the database. But because they don't have the technical knowledge we have, they don't know how important the format of the file is. They sent us this file (I've created a smaller sample, so it's easier to read and understand):

 
The first thing you'll notice as a data professional is the 2 empty rows in the sheet. Besides that, we have an employee without a name. You know this is going to cause problems when you see it. These errors are easy to spot in this example, but imagine these 2 rows hidden in a dataset of 50.000 or more rows. So even though they might have ended up there accidentally, your process is going to fail.

 
When you add an “Excel Source” to your package, and you look at the preview of that import, you immediately see the problem:

 
Table structure
In order to determine what columns can be left blank, and what columns can’t be NULL, I looked at the table structure:

CREATE TABLE ResultSSIS
  (ID INT IDENTITY(1, 1),
   FullName VARCHAR(50) NOT NULL,
   Department VARCHAR(50) NULL,
   EmployeeNumber INT NOT NULL)

 
So in the dataset, FullName and EmployeeNumber are mandatory, and Department is optional. With this in mind, I started to work on a way to exclude those rows.

 
Import without excluding
The first thing I tried was importing the file, to see what the results were. Because I knew the data wasn't correct, I didn't want to import the Excel sheet into a SQL Server database just yet. So as a destination, I used the "Recordset Destination" in SSIS. Importing the data into this in-memory table also allowed me to use the "data viewer" to see the imported data, without the need to truncate a table after each run. You can enable the "data viewer" by right-clicking the import connector (the arrow between controls), and clicking "Enable Data Viewer":

 
If you run the SSIS package in debugging mode, you’ll see the data that is imported in a pop-up window:

 
As you can see in the screenshot above, the records with NULL values in it are included in this import. So which records do we want to exclude, based on our table structure?

 
So of the 6 records in the Excel sheet, we want to exclude 3 from our import because of NULL values. But how do we do that? The easiest way to solve it is to import the data into a temp table, delete the NULL records, and insert the remaining records into the destination table. But what if that isn't possible, and you want to filter the records in your import? I've chosen to use the "Conditional Split".

 
Conditional Split
You don't have to rebuild your whole package when you want to exclude records with the "Conditional Split". You can just add this control, at least in this case, in between your source file and your destination. If you open the control, you can add an expression that is used to filter records. In my case, I wanted to exclude the rows with an empty "FullName" and "EmployeeNumber":

 
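A condition along these lines would do the trick (note that this is SSIS expression syntax, not T-SQL, and my own approximation of what's in the screenshot above):

ISNULL(FullName) || ISNULL(EmployeeNumber)

Rows matching this condition are the ones to throw away; everything else falls through to the default output, which is the output we'll connect to the destination.
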
When connecting your "Conditional Split" to your destination, SSIS will ask you which output of the "Conditional Split" it should use. To output the entire set without the empty rows, choose the "Conditional Split Default Output":

 
When you run your package with the extra “Conditional Split” (and you enable Data Viewer again), you’ll see the filtered output of the “Conditional Split”. The 3 NULL records are excluded like expected:

 
Conclusion
SSIS is easy to use, and yet a really powerful tool. Even after you've built your process in SSIS, it's not always necessary to rebuild the whole package. Sometimes you can save the day with just a minor change. That's the power of SSIS!

T-SQL Tuesday #49 – Wait for it…

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is “Wait for it…”. If you want to read the opening post, please click the image below to go to the party-starter: Robert Davis (Blog | @SQLSoldier).



 
Explaining to developers how SQL Server works is something we all do. Maybe not on a daily basis, but you're asked questions like "why is my index not working", or "what's the best way to add multiple columns to a table". And most of the time, these questions lead to a whole other bunch of questions you need to answer. And there's one question we're all asked more than once: "why is my query running slow?". So where do you start explaining?

 
Wait Types
There are lots and lots of wait types that can be found in SQL Server. In SQL Server 2005 there are 230 different wait types, 475 in SQL Server 2008 and 491 in SQL Server 2008 R2. In SQL Server 2012 they added another 197 new ones to the list. The wait types can be found by running this query:

SELECT wait_type
FROM sys.dm_os_wait_stats
ORDER BY wait_type ASC

 
These wait types can tell you what SQL Server is doing to execute your statement, and what the possible delays are. I’m not going to sum up all the wait types, but here’s a short list of common wait types you’ll see on your SQL server:

 
SOS_SCHEDULER_YIELD – yielding processor time
LCK_M_* – waiting for a lock
OLEDB – waiting on the OLEDB provider (linked servers, Full-Text Search)
WRITELOG – writing the transaction log to disk
RESOURCE_SEMAPHORE – waiting for a query memory grant
CXPACKET – query parallelism
PAGEIOLATCH_* – latch on a memory address while data is retrieved from disk
LAZYWRITER_SLEEP – system process waiting to start

 
All these different wait types can indicate a problem with your statement or your server. Some are merely informative, while others point to a real issue. But what I really want to show you is how you can find these wait types.

 
DIY or just ask for help…
One of the ways to find the wait types on your SQL Server is to dive into the seemingly endless list of DMVs. You could use the "sys.dm_exec_requests" and "sys.dm_os_waiting_tasks" DMVs to find what you want, or you could take the easy way out: sp_WhoIsActive by Adam Machanic (Blog | @AdamMachanic).
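
Just to give you an idea of the DIY route, here's a rough sketch (using sys.dm_exec_requests and sys.dm_exec_sql_text) that lists every active request, its current wait type, and the statement text:

SELECT
  R.session_id,
  R.status,
  R.wait_type,
  R.wait_time,
  T.text AS query_text
FROM sys.dm_exec_requests R
CROSS APPLY sys.dm_exec_sql_text(R.sql_handle) T
WHERE R.session_id <> @@SPID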

Adam (also the party starter of T-SQL Tuesday) wrote a no less than brilliant script to find problems on your server. But how does it work?

 
Installing sp_WhoIsActive
The “installation” of sp_WhoIsActive couldn’t be easier. You just need to download the script, and run it. This creates a stored procedure in the database of your choice. Usually I just create it in the master database. But if you have a DBA database with useful scripts, it’s okay to create it there.

 
Running it for the first time
The stored procedure can be executed without any parameters. That way, you use the default options. Just run the statement shown below:

EXEC master.dbo.sp_WhoIsActive

 
If you need it, or just like to see more information, you can also configure the procedure with a lot of parameters. If you want to see all the options you can configure, just set the documentation parameter to 1 (true):

EXEC master.dbo.sp_WhoIsActive
  @help = 1

 
The options
If you start using sp_WhoIsActive more and more, you’ll get your own set of favorite options. It all depends on the purpose you’re using the procedure for. Most of the time, I use it to determine why queries run slow, or why the performance of the SQL server is so low.

The information sp_WhoIsActive retrieves gives you a good indication of what SQL Server is doing, or what queries are bugging each other. I’ll list my favourite options below:

First, I set @show_own_spid to 1, so I can see my own query in the resultset.

The second option I love is @get_plans. This shows you the execution plans of the running queries:

 
Another great parameter to set is @get_outer_command. That way, you won’t just see the query running at the moment, but also the outer-command of the query (in the example below, the INSERT INTO the temp table is executed from within the stored procedure you see in the right column):

 
To see which transaction logs are used when running your query, set @get_transaction_info to 1:

 
Information regarding locks can be found, by setting @get_locks to 1:

 
If you click the XML, you’ll see which locks are granted, pending or denied:

 
The last option I’d like to set, is @get_additional_info. This will show you more information regarding the connection settings, session variables, etc:

 
Clicking the XML shows you the properties I mentioned above:

 
So this is what the query looks like, the way I personally like to use it:

EXEC master.dbo.sp_WhoIsActive
  @show_own_spid = 1,
  @get_plans = 1,
  @get_outer_command = 1,
  @get_transaction_info = 1,
  @get_locks = 1,
  @get_additional_info = 1

 
Conclusion
Wait types are your keys to open the door of the next level of SQL Server. Not all wait types are that easy to read and understand, but there are plenty of resources to be found online. For example, just take a look at the rest of the posts today. Most of the posts for T-SQL Tuesday can be found on Twitter, when you search for #TSQL2sDay.


I want to say thanks to the employees at Coeo for the easy explanation of some of the wait types!

Incremental updates with Change Data Capture

When designing a database or an ETL process, for example loading your production data into a reporting environment, you always start your design with performance in mind. In the beginning of the project, your scripts and ETL processes run blazing fast. But after a few months in production, the entire project grinds to a halt. So how do you fix that problem, without a complete redesign of your applications and database? One of the many solutions is an easy one: load your data incrementally into the destination tables.

 
Change Data Capture
Incremental data loading can be a hard nut to crack. It's not always an option, but it might be a good starting point. One of the ways to start loading your data incrementally is by using a key in your database as a reference. If your table has a column called "Modified Date" that is updated every time the record is updated, you could use that: every night, when the process runs, you just add the records that were modified after the last successful run. But what if you don't have that possibility? Change Data Capture (CDC) is an easy way out.
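
To illustrate that key-based approach before we dive into CDC, here's a minimal sketch (all table and column names below are hypothetical):

/* Hypothetical source, destination and watermark table; the idea is what counts */
DECLARE @LastRun DATETIME =
  (SELECT MAX(LastLoadDate) FROM dbo.ETL_LoadLog)

INSERT INTO dbo.Reporting_Orders
  (OrderID, Amount, ModifiedDate)
SELECT OrderID, Amount, ModifiedDate
FROM dbo.Orders
WHERE ModifiedDate > @LastRun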

 
CDC is a way to record inserts, updates and deletes on a specific table, without the need to write the triggers yourself. CDC reads the transaction log, and captures all changes made to the specific table. These changes are stored in an associated change table that is created by CDC.

Below I'm going to show you how to set up your first table with CDC. If you would like to know more about CDC, this TechNet article is a good place to start.

 
Create an example
To show you the basics of CDC, let's start with creating a table called TestCDC in a database called Sandbox:

USE Sandbox
GO

CREATE TABLE dbo.TestCDC
  (ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
   Descr VARCHAR(50) NULL)
GO

 
Once you've created the table, turn on CDC at the database level, by executing the system stored procedure created for that purpose:

EXEC sys.sp_cdc_enable_db

 
There is also a system stored procedure to enable CDC at the table level. You need to enable CDC manually, and separately, for every table that needs it:

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @role_name = NULL

 
If the SQL Server Agent is running on your machine or server, you’ll see this confirmation (I’ll explain later why SQL Server Agent is needed):

 
If the Agent isn’t running, you’ll see this warning:

 
If you ran the enable table statement, you will see that SQL Server created the system objects needed to track changes in the table:

 
Because CDC uses 2 SQL Server Agent jobs to capture changes and clean up the change tables, you need to run the Agent to start the data capture. If the jobs aren't running, SQL Server won't capture any changes made:

 
Start data changes
In order to see what happens when you change data, let’s insert some records:

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description')

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description too...')

 
And let’s update one of those 2 inserted records:

UPDATE dbo.TestCDC
SET Descr = 'UPD - ' + Descr
WHERE ID = 2

 
Now, let’s check the content of both the original table, and the change table:

/* Original table */
SELECT * FROM dbo.TestCDC

/* Change table */
SELECT * FROM cdc.dbo_TestCDC_CT

 
If you run both queries, you’ll see the resultset below:

 
The records in the CDC change table allow you to update the data in your reporting environment. You could query them yourself, by retrieving all the changes since your last update. You can also use the functions that return those changes for you, for example cdc.fn_cdc_get_net_changes_<capture_instance>. You can read more about this system function here.
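
As a minimal sketch (the function name is derived from our capture instance, and net changes are available here because the table has a primary key), the boundary LSNs come from sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn:

DECLARE @From_LSN BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_TestCDC')
DECLARE @To_LSN   BINARY(10) = sys.fn_cdc_get_max_lsn()

/* One row per changed record, with the net effect of all changes in the range */
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_TestCDC(@From_LSN, @To_LSN, 'all')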

 
Cleaning up after an update
Now that you've updated your reporting environment, it's wise to clean up your CDC data. You could drop the records yourself with a DELETE statement. Another option is to use the system procedure made for that: "sys.sp_cdc_cleanup_change_table". You can clean up your data using the following SQL statement:

DECLARE @Last_LSN VARBINARY(10) =
  (SELECT MAX(cdc.dbo_TestCDC_CT.[__$start_lsn]) FROM cdc.dbo_TestCDC_CT)

EXEC sys.sp_cdc_cleanup_change_table
  @capture_instance = 'dbo_TestCDC',
  @low_water_mark = @Last_LSN,
  @threshold = 5000

 
The query will retrieve the last LSN (Log Sequence Number), and remove everything that happened before that.

 
Cleanup of CDC
If you want to completely remove CDC (because we’re done testing), you can disable it on the table level by running the query below:

EXEC sys.sp_cdc_disable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @capture_instance = 'dbo_TestCDC'

 
This statement cleans up all the objects that were created when CDC was enabled on the table, but it only stops CDC for that specific table. The fastest way to disable CDC on all tables in the database is to disable CDC at the database level, by running the query below:

EXEC sys.sp_cdc_disable_db

 
Conclusion
Loading data always takes time, and many factors are important: your database size, your frequency of changes, your ETL process, etc. The time it costs you to move data can be reduced by rewriting your process to use incremental loads. CDC is one of the many ways to achieve this. It works out of the box, and doesn't require you to build any process yourself. But maybe your environment needs a custom process to operate the way you want it to. Not every feature in SQL Server is a so-called silver bullet, but sometimes it comes darn close to one…

T-SQL Tuesday #48 – Cloud Atlas

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is “Cloud Atlas”. If you want to read the opening post, please click the image below to go to the party-starter: Jorge Segarra (Blog | @SQLChicken).



 
In the last few years, "the cloud" has become more and more important in our lives. Not only in IT, or for database- and data-professionals, but also in our personal lives. Take a look around you. How many people do you still see carrying around a USB drive to store their data? Now do the same, and count the people who use a cloud solution for their data, like Dropbox, SkyDrive (if we're still allowed to call it that…), or Copy.com.

So everyone is storing their data in the cloud now. From personal information like a copy of a passport, to text files with people's password lists. So without jumping to conclusions just yet, I guess we trust the companies that hold our data, right…?

 
Trust
But now comes the hard (and controversial) part: we trust them with our personal data, but not with our corporate data. It's okay to store your passwords and private documents in the cloud, but it's unthinkable to store business data there!

So where is that distrust coming from? It probably has something to do with the whole NSA-thing. There, I said it! Without completely going off-topic, I would like to explain something about this statement.

My personal opinion is that people in the Netherlands are different from the rest of the world when it comes to their privacy. They don't care if the ISP is monitoring web traffic. They know it's being monitored, but they accept that as a fact. When it comes to downloading games, music or movies, they think they're entitled to that. But when it comes to government agencies monitoring the corporate data they put in the cloud, they draw the line.

 
Are you… the one…?
In the past few years, the discussion about on-premises versus off-premises data has intensified. People try to convince each other with arguments, and think the other side is completely wrong.

A while ago, I encountered my first "cloud company". I've done some consulting for them, and they've set themselves the goal of moving to the cloud within the next few years. The biggest advantages they see are cost, scalability and administration. And I fully agree with them.

 
Why use a cloud solution
Choosing a WASD (Windows Azure SQL Database) solution makes it easier to maintain your environment. You don't have to monitor the hardware, or move to another server if your hardware fails or dies. This is all taken care of by Microsoft.

Looking at the cost of a cloud solution is easy: it saves you money. When you run on-premises servers, you need a data center, electricity, a maintenance team, etc. When you use a cloud solution, you only pay for the hardware you need. And if you're done with it, you can just shut down the machine you were working on.

The same goes for scalability. For example, if you need to run a billing process, you could "spawn" twice as many cloud machines. This makes scalability a piece of cake. And again, when you're done, just get rid of the machines you don't use anymore. This makes it easier for companies to run big processes in less time.

 
Trying it out
The only time I've used WASD was on the machine that Jamie Thomson (Blog | @jamiet) made available to the SQL family (read about it here). This was later taken over by Red Gate, but I'm not sure it's still available.

But if you want to try it out, just create your own AdventureWorks on Azure. You can download the scripts here, and start your Azure trial here.
