Source control: Using Visual Studio Online in SSMS

As database professionals, we’re all aware of the importance of backups. We make sure the backup process runs as expected, and (hopefully) we restore a backup every now and then to test if the restore process works. But what about source control for your scripts? Most of us don’t use source control in our daily job. But source control is a much a part of the backup process as the actual database backups.

So looking for an easy way out, I focused on Visual Studio Online (VS Online). This is a free online source control system, that you can use once you created an account. You can login on the website with your Microsoft Live account, enter some information (like a username, etc), and you’re ready to go!

But how do you go from writing a query in SQL Server Management Studio (SSMS), to checking in your .sql code files into VS Online? Let’s take a look.

 
Installing TFS tools
In order to use VS Online, you need to install some extra things on your machine. The first thing you need to download is the “Team Explorer for Microsoft Visual Studio 2013″. This installs the team explorer, that you need to get things from and check thing into VS online. This install requires a reboot, so please take that into account!

The second and last installation is the “Microsoft Visual Studio Team Foundation Server 2013 MSSCCI Provider”. This installs the provider, that takes care of the communication with VS Online.

For this installation, it’s important that you download the right version. If you’re running the 32-bit version of SSMS, you need to download the 32-bit version of the provider. If you install the 64-bit version, it won’t work (and trust me on this, I’ve made that mistake before!).

 
VS Online
At this point, I’m assuming you have an account for VS Online. If you log in to the website, and you go to your account url ([AccountName].visualstudio.com), you see the “Create your first team project” page. On this page, you need to create a project before you can check in any files:

 
Configure source control in SSMS
The next step is to configure SSMS, to use the source control provider you just installed. After the installation, you get a new menu in SSMS for source control:

In this menu, click on “Open from Source Control”. In the window that opens, click “Servers…”:

Click “Add…” to add a TFS server or URL:

Now you need to enter your account URL in the textbox:

Once you’ve done that a login screen pops up. Log in with your Microsoft Live account (the same you used to create your VS Online profile), and you’re authenticated:

At this point, your source control is added, and you’ll see it in the overview screen:

If you close the windows that are shown, you end up in your source control project overview:

 
Creating a SQL Server Scripts project/solution
Now that we’ve installed the source control providers, it’s time to create our SQL Scripts project. In SSMS, click on File => New => Project (or Ctrl+Shift+N), and choose “SQL Server Scripts”:

Now that you have a project, it might be handy to show what’s in the solution (if you don’t have that on your screen already). You can open the “Solution Explorer” by clicking on View => Solution Explorer (or Ctrl+Alt+L).

 
Checking in your solution
If you want to check-in your solution or project, you can right-click it in the Solution Explorer, and choose “Add Solution to Source Control”:

 
Another way of doing it, is by clicking on File => Source Control => Add Solution to Source Control. In the next window you are asked for the source control server (click okay if your VS Online URL is selected), and you see the project explorer:

Click on “MySQLProject”, and click OK. Now the check-in screen pops up, where you can “tag” your check-in:

Congratulation! You just checked in your first code in VS Online!

 
Pitfalls and difficulties
In all honesty, there are some downsides in using this, but those are mainly issues in SSMS. For example, you can’t create folders in your SQL Script project. You have to work with the default “Queries” folder you get by default. So there’s no chance you can add a “Finance”, “Maintenance” or “DBA” folder to your project for example.

A work-around for this is creating multiple projects in the solution. So you’ll end up with 3 projects in 1 solution. The nice thing is that you can check-in or check-out 1 single project in your solution, or you can just work on the entire solution at once. This gives you the possibility to work on the solution on your own, or with a group of people.

Another pitfall is the process. If you’re not used to working with queries in source control, it’s easy to forget to check-in your changes at the end of the day. But that’s just something you need to get used to, and need to deal with in your own way. If you’d like to check-in multiple times a day, go ahead and do that. Do you want to check-in your changes at the end of the day only, it’s fine as well.

 
Conclusion
Using source control is great, especially when you have a lot of scripts you need to maintain. I’ve tried to create zip-archives, version numbering my .sql files, different directories, etc. But there’s always a moment that you forget to save your script in an archive, or your computer crashes, and all your work is gone. Using source control prevents these issues (if you use it as intended of course!). VS Online is a really good source control platform (it’s basically TFS online), and it’s free for use. One of the mayor advantages is that VS Online / TFS is fully compatible with SSMS and Visual Studio (which you both use as database professional).

Are you still skeptical? Maybe you should just try it out for a week, to see if it helps you in your daily job. And you know what, maybe you’ll like it. And even better: maybe it’ll someday save your life!

On which port is SQL Server running?

Earlier this week, a good friend asked me the simple question: “How can I check on which port my SQL Server is currently running?” There are multiple ways, so let’s take a look at the ways you can find out the answer!

 
SQL Server Error Log
It’s a misconception that only error messages are logged in the SQL Server error log. There are also informational messages logged, which you can use to find the port on which SQL Server is currently running:

xp_readerrorlog 0, 1, N'Server is listening on'

 
The result:

 
SQL Server Configuration Manager
The SQL Server configuation manager is a tool which is installed alongside SQL Server. In this tool, you can change for example the TCP/IP settings of your SQL Server. It also shows you the current port on which SQL Server is running:

 
DMV
You can also query the system DMV’s:

SELECT DISTINCT 
    local_tcp_port
FROM sys.dm_exec_connections
WHERE local_tcp_port IS NOT NULL

or

SELECT local_tcp_port
FROM   sys.dm_exec_connections
WHERE  session_id = @@SPID

 
Command prompt
Finding the used port via a command prompt requires some more information. First, you need to find the process ID (PID) that SQL Server is using. The quickest way is to use the Task Manager in Windows. Once you have the PID, you can run the commmand below:

“netstat -ano | findstr [PID]”

The result:

 
Registry
The port number can also be found in the registry. Just remember, that the folder you see in the screenshot below contains my instance name (SQL2014DEV). Change this to your instance name!

 
You can also use the “master.dbo.xp_regread” stored procedures to read this registry key from T-SQL:

DECLARE @TCPPort NVARCHAR(5),
        @RegKeyName VARCHAR(8000);

SET @RegKeyName = CONCAT('Software\Microsoft\Microsoft SQL Server\',
                         @@SERVICENAME,
                         '\MSSQLServer\SuperSocketNetLib\TCP')

EXEC xp_regread
  @rootkey = 'HKEY_LOCAL_MACHINE',
  @key = @RegKeyName,
  @value_name = 'TcpPort',
  @value = @TCPPort OUTPUT;

SELECT @TCPPort;

 
Event Viewer
SQL Server also logs the port in the Windows Event Viewer. If you open the application log, and you filter on eventid 26022, you’ll see the port that is in use:

 
PowerShell
After I posted this blog, Johan Bijnens (@alzdba) sent me a message: I forgot the PowerShell option!

#tcpport.ps1
#Evaluates the SQL Server instances on a Windows server and returns the TCP port number used by each instance
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO")  | Out-Null
$m = New-Object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') 'HOME'
$m.ServerInstances | ForEach-Object { $m.Name + '\' + $_.Name + ', ' +
       $m.ServerInstances[$_.Name].ServerProtocols['Tcp'].IPAddresses['IP1'].IPAddress.IPAddressToString + ':' +
       $m.ServerInstances[$_.Name].ServerProtocols['Tcp'].IPAddresses['IPAll'].IPAddressProperties['TcpDynamicPorts'].Value
       }

 
Thanks Johan for reminding me on this! The full code can be found on MSDN.

SQL Sentry Plan Explorer: You can’t live without it

Every data professional out there will run into slow running queries, or performance issues you can’t explain at some point. At that moment, it’s difficult to explain the problem without looking at an execution plan. SQL Server Management Studio (SSMS) has build-in functionality to look at these execution plans. But this isn’t always as useful as we would like it to be. But there is a great free tool that’ll help you with query-tuning and pinpointing the issue in bad performing queries.

 
Download
SQL Sentry Plan Explorer is free, and available on the website of SQL Sentry. Even though it says it’s a trial version, it won’t expire after a certain period. The only thing that’s “trial” in this version, is that some functionality is blocked in the free version. But all the good stuff is available in the free version.

 
Integration in SSMS
When you start the install, the install doesn’t ask you to shut down SSMS. But I recommend you do. If you don’t close SSMS, you won’t see the SSMS add-in menu. It will show after the setup is finished, and you start a new instance of SSMS.

 
Creating a query, and opening it in Plan Explorer
As an example, I’ve created a really bad query on the Adventureworks2012 database:

USE AdventureWorks2012
GO


DECLARE @MinPrice INT = -1;


WITH Shipping AS
(
SELECT
  PV.ProductID AS ProductID,
  UM.Name AS ShippingPer,
  CASE
    WHEN UM.Name = 'Each' THEN PV.StandardPrice
    WHEN UM.Name = 'Dozen' THEN PV.StandardPrice / 12
    ELSE @MinPrice
  END AS ShippingCostPerUnit
FROM Purchasing.ProductVendor AS PV
INNER JOIN Production.UnitMeasure AS UM ON UM.UnitMeasureCode = PV.UnitMeasureCode
)


SELECT
  P.ProductID,
  P.ProductNumber,
  P.Name,
  S.ShippingCostPerUnit,
  Quantity.TotalQuantity,
  P.ListPrice,
  dbo.ufnGetProductListPrice(P.ProductID, GETDATE()) AS XYZ,
  Locations.TotalLocations,
  P.ListPrice + S.ShippingCostPerUnit AS TotalCostProduct,
  Quantity.TotalQuantity * P.ListPrice AS TotalValueStock,
  ((Quantity.TotalQuantity * P.ListPrice) / Locations.TotalLocations) AS AverageValuePerLocation
FROM Production.Product AS P
INNER JOIN Shipping AS S ON S.ProductID = P.ProductID
CROSS APPLY
(
  SELECT SUM(Quantity) AS TotalQuantity
  FROM Production.ProductInventory
  WHERE ProductID = P.ProductID
  GROUP BY ProductID
) AS Quantity
CROSS APPLY
(
  SELECT COUNT(LocationID) AS TotalLocations
  FROM Production.ProductInventory --WITH(INDEX(0))
  WHERE ProductID = P.ProductID
) AS Locations
WHERE P.ListPrice <> 0
ORDER BY P.ProductID, P.ProductNumber, P.Name, TotalLocations ASC

 
If you run this query in SSMS, and you include the actual execution plan (Ctrl + M), it will show you the execution plan in a separate result window. In this window, you’ll have the option to right-click, and choose “View with SQL Sentry Plan Explorer”:

 
If you click this, you’ll open Plan Explorer, and it will show you the execution plan:

 
So, is that all?
I can almost hear you think: So what’s the difference between Plan Explorer and the default SSMS windows, besides the fancy colors? Just take a look at all the extra opportunities you get with Plan Explorer. For example, how does your join diagram look? Can you pull that from SSMS? No? Well I can do that with Plan Explorer:

 
Your most expensive operation in the query? Yes, you could do that by looking at the percentages shown in your queryplan. But can you show me why they are that expensive? Again, I can do that with Plan Explorer:

 
Can you do you job without it?
If I ask myself this question, I think I can honestly answer this with: yes. Yes, I can do my job without it. But this makes it SO much easier to pinpoint the problem, and to get a quick overview of the query performance. Normally I look at the queryplan in SSMS first, and then immediately open up a Plan Explorer window, to take a closer look at the problems.

So if you write queries on a daily basis, and you’re responsible for, or interested in, qery performance: download it today, and try it out yourself. I’ll promise you, you won’t regret downloading it!
If you want to read more about SQL Sentry Plan Explorer, don’t forget to check out these blog posts:

Julie Koesmarno: Analysing Execution Plans With SQL Sentry Plan Explorer
Mickey Stuewe: On sabbatical
Chris Yates: SQL Sentry Plan Explorer – Don’t Leave Home Without It

Using FOR XML in a CROSS APPLY

A few blog posts ago, I showed how easy it is to generate JSON, JavaScript or arrays with the FOR XML clause. But in the examples, the output is transformed to text. Selecting this output is the same as a normal select. But what if you really want to output XML? Does this work the same? Let’s create the same example table again, and rebuild it to output XML.

 
Creating resources
Just as in my last blog post, let’s use the airline example:

CREATE TABLE #TMP_AirlineTickets
  (ID INT IDENTITY(1,1),
   Airline VARCHAR(10),
   Departure_Airport_Code VARCHAR(5),
   Price DECIMAL(10,4))


INSERT INTO #TMP_AirlineTickets
  (Airline, Departure_Airport_Code, Price)
VALUES
  ('BA', 'RTM', 10.00),
  ('KLM', 'AMS', 125.00),
  ('BA', 'LHR', 15.00),
  ('KLM', 'BCN', 50.00),
  ('KLM', 'BHX', 75.00)

 
When you write a query with a CROSS APPLY on this table, it works like it’s supposed to:

SELECT AT.Airline, CA.Departure_Airport_Code
FROM #TMP_AirlineTickets AT
CROSS APPLY
	(
		SELECT Departure_Airport_Code
		FROM #TMP_AirlineTickets A
		AND A.Airline = AT.Airline
	) CA
GROUP BY AT.Airline, CA.Departure_Airport_Code

 
There is a DISTINCT or GROUP BY needed in this query, to return only the unique records. As we’ll see later on, DISTINCT doesn’t work for this query, so a GROUP BY is used. This results in a dataset where every Airline code is joined to every airport they fly to:

 
Generating XML in CROSS APPLY
But in the previous post, we got a string as output. But now, we want to return XML instead of a normal resultset. But what happens if we put the FOR XML clause in this query? If we do so, it throws an exception:

SELECT AT.Airline, CA.Departure_Airport_Code
FROM #TMP_AirlineTickets AT
CROSS APPLY
	(
		SELECT Departure_Airport_Code
		FROM #TMP_AirlineTickets A
		AND A.Airline = AT.Airline
		FOR XML PATH('')
	) CA
GROUP BY AT.Airline, CA.Departure_Airport_Code

 

 
Apparently column 1 of the CROSS APPLY result has no column name. Looking at the result that comes from the CROSS APPLY, the column name is generated automatically:

 
Giving a column alias to the column in the select doesn’t work. But you can also provide a column list to a cross apply:

SELECT AT.Airline, CONVERT(XML, CA.Airport_XML) AS Airport_XML
FROM #TMP_AirlineTickets AT
CROSS APPLY
    (
        SELECT Departure_Airport_Code AS DAC
        FROM #TMP_AirlineTickets A
        AND A.Airline = AT.Airline
        FOR XML PATH('')
    ) CA (Airport_XML)
GROUP BY AT.Airline, CA.Airport_XML

 
By adding a column list to the CROSS APPLY, we can now use the column name Airport_XML in the outer select. By doing that, we get the airline name, and the departure airport code list:

 
Returning XML
So when you want to generate XML in a query, it’s possible to do that with a CROSS APPLY. It’s just a easy as writing a normal query, but it makes you a lot more flexible when you need to output XML from SQL Server. With this, there is no need for CLR’s or external applications to generate the XML you need, but you can just do it with T-SQL.

Using OUTPUT clause to track changes

Whenever something changes in a database, you would like to know what has changed. This change might brake your database or application later, so all schema or server changes are monitored. But how about data changes? Do you monitor those? You don’t have to feel bad if you don’t, because most people don’t track those changes. But you can track these changes in a very easy way, by using the output clause in your query. Let’s take a look at how that works.

 
Create resources
As example, let’s take a look at a bank account. If you have a bank account, there’s some money in it, and you spend money which will be subtracted from your account balance. First of all, let’s make two tables: one for the account balance, and one for logging the changes:

CREATE TABLE #Account
  (ID INT IDENTITY(1,1),
   Balance FLOAT)


CREATE TABLE #Logging
  (DEL_ID INT,
   DEL_Balance FLOAT,
   UPD_ID INT,
   UPD_Balance FLOAT,
   Balance_Difference FLOAT)

 
To keep it simple, the account table has just one column, and that column contains the current balance of the account. Let’s insert a balance to start with:

INSERT INTO #Account
  (Balance)
VALUES
  (250.25)

 
If we run an update query now, the account balance will change. But do we still know what the original balance was? Not without logging. So let’s see how the output clause can change this:

UPDATE #Account
SET Balance = 125.00
OUTPUT Deleted.ID,
       Deleted.Balance,
       Inserted.ID,
       Inserted.Balance,
	   Inserted.Balance - Deleted.Balance
INTO #Logging

 
The balance table now only contains the new value. But by using the output clause, that logged the changes we made, we can check the logging table, and find the old value. When looking at the output clause, you can use the temporary objects called Deleted and Inserted. The Deleted table contains the old values (before the change), and the Inserted contains the new values (the values after the change). When checking the logging table, you’ll see both the old and the new values:

 
So as you can see, the logging table contains both the old and the new version of the record. And this not only works for update queries, but also for Inserts, Deletes and Merge-statements. This gives you the opportunity to log changes in the queries you think need it. Just think about what you want to log, create a log table, and add an output clause to your query. It’s not always the case, but with this, it’s as easy as that.

 
If you want to read more about CRUD, don’t forget to check out these blog posts:

Julie Koesmarno: On sabbatical
Mickey Stuewe: Going Beyond The INSERT Statement
Chris Yates: C-R-U-D The Basics

Using FOR XML to create arrays, JSON and JavaScript

Every new project brings its own challenges and pitfalls. New projects also give you a chance to be creative in finding solutions, and try new things. This keeps it interesting, and allows you to learn new things. And sometimes, the things you avoided for all these years, turn out to be your best option.

 
A missing feature in SQL Server
For a project, we needed to generate JSON or JavaScript from T-SQL. Even though the whole world seems to use JSON at this point, there is no feature to generate JSON from T-SQL or SQL Server. There are more than enough people who would like such a feature (just look at the Connect item made for this). Maybe they’ll put it in the vNext of SQL Server, who knows. But at this moment, we need to generate the JSON ourselves.

 
Generating the data array
First of all, we need to generate an array or list. The problem with this, is the number of rows. For example, if you only have 2 rows per product, you can join the table onto itself, and concatenate the columns like that. But what if you have more rows per product, or you don’t know how many rows per product are returned?

One of the ways to resolve this, is by using the FOR XML clause. You can use this in a CROSS APPLY, to create an array/list of data, grouped by another column.

So let’s create an example, by using airlines. When you fly to another state, country or continent, you’ll book a flight with an airline. Those airlines have a name, a departure airport code, and of course a ticket price. Let’s put this data in a table:

CREATE TABLE #TMP_AirlineTickets
  (ID INT IDENTITY(1,1),
   Airline VARCHAR(10),
   Departure_Airport_Code VARCHAR(5),
   Price DECIMAL(10,4))


INSERT INTO #TMP_AirlineTickets
  (Airline, Departure_Airport_Code, Price)
VALUES
  ('BA', 'RTM', 10.00),
  ('KLM', 'AMS', 125.00),
  ('BA', 'LHR', 15.00),
  ('KLM', 'BCN', 50.00),
  ('KLM', 'BHX', 75.00)

 
When we query this data, we could easily use a sub-select to query. But because I want my query to be readability, and good performing, I used the FOR XML clause:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
What happens in this query? The FOR XML clause creates an XML for the airline that is selected in the outer query. So a query like this is executed:

SELECT
  SUB.Departure_Airport_Code
FROM #TMP_AirlineTickets SUB
WHERE SUB.Airline = 'KLM'
ORDER BY SUB.Departure_Airport_Code ASC
FOR XML PATH('')

 
The result from that query is an XML column:

 
By concatenating a character to that XML column, the XML nodes are removed. This gives us the possibility to use this as an array:

SELECT
  CONCAT(',', SUB.Departure_Airport_Code)
FROM #TMP_AirlineTickets SUB
WHERE SUB.Airline = 'KLM'
ORDER BY SUB.Departure_Airport_Code ASC
FOR XML PATH('')

 
The result of this statement looks like this:

 
The next step is to use this in a query that creates an XML column for all airlines in the table:

SELECT
  AT.Airline,
  (SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ) AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The result is an array of airport departure codes:

 
As you can see, this creates the array/list for all airlines in the table. With STUFF I replace the first character, because I don’t want my result to start with a comma. This can be done with a RIGHT clause, that USES LEN – 1 to strip of the first character on the left, but this looks better in my opinion:

SELECT
  AT.Airline,
  STUFF((SELECT
            CONCAT(',', SUB.Departure_Airport_Code)
         FROM #TMP_AirlineTickets SUB
         WHERE SUB.Airline = AT.Airline
         ORDER BY SUB.Departure_Airport_Code ASC
         FOR XML PATH('')
        ), 1, 1, '') AS Array_Airport_Codes
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The result looks the same, except the first comma is gone:

 
One of the best things when creating demos like this for a blog post, is that you can make it as unrealistic as you want. What I mean by that, is I’m making it easy to generate XML, in order to keep the examples clear and understandable. But what if you want to use this in the real world? You’re probably seeing datasets that aren’t as easy to use as the demo above. The strings you want to concatenate are filled with special characters, etc. Luckily there’s a solution for that. Rob Farley (Blog | @rob_farley) wrote a really nice blog post about that.

 
So now we have our array/list ready, but how do we convert this to JSON or JavaScript? The main difference (sorry if I insult my developer friends with this statement), is that in JavaScript everything needs to be enclosed in double quotes (“). This isn’t needed if you create JSON. In the following example, I’ll create a JSON object. But because just airport codes isn’t what we need, I’ll adjust the query so it’ll add the prices to the object as well. This is done by an additional CONCAT in the XML select, which concatenates the airport code and the price into 1 string:

SELECT
  AT.Airline,
  CONCAT('{"ap":[',
         STUFF(
               (SELECT
                    CONCAT(',{"', SUB.Departure_Airport_Code, '":', SUB.Price, '}')
                FROM #TMP_AirlineTickets SUB
                WHERE SUB.Airline = AT.Airline
                ORDER BY SUB.Departure_Airport_Code ASC
                FOR XML PATH('')
                ), 1, 1, '')
         , ']}') AS JSON
FROM #TMP_AirlineTickets AT
WHERE 1 = 1
GROUP BY AT.Airline

 
The GROUP BY can be replaced by a DISTINCT. But either one of those is needed to make sure you’re only returning unique records:

 

 
By parsing the second result in the query in an online tool, you can see that the result is valid JSON:

 

 
By adding some double quotes in the script above, you can easily generate JavaScript from T-SQL. But if you just want to use the FOR XML to generate arrays, this will also work perfectly.

 
Why would you want to use this?
Generating JSON or JavaScript might come in handy for webdevelopers, that can get data from a database, and use that directly on their website. By using the FOR XML clause to generate JSON, you make sure your queries are flexible enough to generate valid JSON, even if your dataset grows. Generating arrays like this can also come in handy for developers that need data from their databases. Instead of creating an array in code, they can now use the array that is returned from the database.

So even though I tried to work my way around using FOR XML, it turned out to be the best option thinkable!

Write readable and high-performance queries with Window Functions

In SQL Server 2008, we gained a new and powerful feature in our bag of T-SQL tricks: Window Functions. The actual Window Function is the OVER clause, that allows you to determine partitions or “groups” in your query, before applying another function. In practice, this means you can create groups in your dataset, that can be molded further by applying functions to that groups.

In the past, I’ve written a number of posts about Window Functions in SQL Server:

Row_Number: Unique ID in select statement
Calculating Running Totals
Removing duplicates in a dataset

But there’s more!

 
Finding missing numbers in sequence
How many times did you look at a table, and you noticed that one of the values in a sequence is missing? Or a date range in a table, where someone deleted a few records. So all of a sudden, your year doesn’t have 365 days (or 366 in a leap year, or 365.242199 days which is the years average), but 315 days. There go your management reports that are calculated on a per day average. So how do you find the missing dates, without having to write a cursor or create a full blown date table like your BI colleagues do? You guessed right: a query with Window Function!

Let’s start with declaring a table, and insert a sequence with missing values:

DECLARE @Sequence TABLE
  (Number INT)


INSERT INTO @Sequence
  (Number)
VALUES
  (1),
  (10),
  (7),
  (4),
  (2),
  (8),
  (5)

 
So how can we get the “nearest numbers” from that table with just a single select statement?

SELECT
  LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value,
  Number,
  LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value
FROM @Sequence AS S
ORDER BY Number ASC

 
The LAG and LEAD are standard t-sql functions from sql server 2012 on. These functions give you the opportunity to access the previous or next row, without the need for a so called “self-join”. So what you see is the number, the record preceding that value (LAG) and the following value. So in this case, number 2 is preceded by Number 1, and followed by Number 4.

 
The lemonade stand
Now let’s look at another example. How about you? When you grew up, you wanted to save money for a new mobile phone right? In my case it was either a disc man, a Walkman, or a stereo set. But let’s stick with the modern equivalent of the Walkman for now: the MP3 player. So to earn money for the MP3 player, our fictitious friend who is called Joe, decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his
earnings, and puts it into a table:

DECLARE @Profit TABLE
  (DayNumber INT,
   Sales DECIMAL(10,2))


INSERT INTO @Profit
  (DayNumber, Sales)
VALUES
  (1,  6.90),
  (2,  4.17),
  (3,  2.69),
  (4,  7.26),
  (5,  2.93),
  (6,  8.98),
  (7,  7.25),
  (8,  5.88),
  (9,  1.51),
  (10, 7.97),
  (11, 3.44),
  (12, 3.76),
  (13, 9.96),
  (14, 0.92),
  (15, 8.28),
  (16, 6.05),
  (17, 9.40),
  (18, 4.03),
  (19, 9.14),
  (20, 7.25),
  (21, 0.06),
  (22, 9.12),
  (23, 7.39),
  (24, 6.57),
  (25, 4.54),
  (26, 0.09),
  (27, 4.42),
  (28, 9.53),
  (29, 5.09),
  (30, 0.89)

 
So as you can see, he earns quite a lot of money this way! But because he’s eager to buy his new MP3 player, he wants to see his day totals, and the amount he needs to buy his new toy. But because Joe is a really smart guy, he doesn’t want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:

DECLARE @Goal DECIMAL(10,2) = 150.00


SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
He declares a “goal” variable, that is set to the amount he needs for his new gadget. So for every row, we calculate the sum of that day, plus all the sales he made from the first day (UNBOUNDED PRECEDING) and today (CURRENT ROW). After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he’s made so far:

SELECT
  DayNumber,
  Sales,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
So where does it stop?
So now that we’ve seen the SUM and AVERAGE option, what do we have left? How far can we take this? Thinking about it, how about a daily checkup if we hit a lowest or highest Sales amount? We can do this with the MIN and MAX option on the same query:

SELECT
  DayNumber,
  Sales,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale
FROM @Profit


SELECT
  DayNumber,
  Sales,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit

 
Now Joe can save his daily sales in the evening, and check if he had a very good, or a very bad day.

 
Not only for the average Joe…
So how can the business profit from all this? In some cases, Window Functions make it easier to output a certain resultset. In some cases it even gives you a whole new way to output data with a well performing (single) query, that was impossible until now. So if you’re running SQL Server 2008 or higher, start using (or at least start exploring) Window Functions right away!

 

To generate the random floats, I’ve used the generator of FYIcenter.com

If you want to read more about this topic, don’t forget to check out these blog posts:

Julie Koesmarno: ABC Classification With SQL Server Window Function
Mickey Stuewe: A Date At The End of The Month
Chris Yates: Windows functions who knew

Follow

Get every new post delivered to your Inbox.

Join 67 other followers