Write readable and high-performance queries with Window Functions
April 14, 2014 2 Comments
In SQL Server 2008, we gained a new and powerful feature in our bag of T-SQL tricks: Window Functions. The actual Window Function is the OVER clause, that allows you to determine partitions or “groups” in your query, before applying another function. In practice, this means you can create groups in your dataset, that can be molded further by applying functions to that groups.
In the past, I’ve written a number of posts about Window Functions in SQL Server:
But there’s more!
Finding missing numbers in sequence
How many times did you look at a table, and you noticed that one of the values in a sequence is missing? Or a date range in a table, where someone deleted a few records. So all of a sudden, your year doesn’t have 365 days (or 366 in a leap year, or 365.242199 days which is the years average), but 315 days. There go your management reports that are calculated on a per day average. So how do you find the missing dates, without having to write a cursor or create a full blown date table like your BI colleagues do? You guessed right: a query with Window Function!
Let’s start with declaring a table, and insert a sequence with missing values:
DECLARE @Sequence TABLE (Number INT) INSERT INTO @Sequence (Number) VALUES (1), (10), (7), (4), (2), (8), (5)
So how can we get the “nearest numbers” from that table with just a single select statement?
SELECT LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value, Number, LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value FROM @Sequence AS S ORDER BY Number ASC
The LAG and LEAD are standard t-sql functions from sql server 2012 on. These functions give you the opportunity to access the previous or next row, without the need for a so called “self-join”. So what you see is the number, the record preceding that value (LAG) and the following value. So in this case, number 2 is preceded by Number 1, and followed by Number 4.
The lemonade stand
Now let’s look at another example. How about you? When you grew up, you wanted to save money for a new mobile phone right? In my case it was either a disc man, a Walkman, or a stereo set. But let’s stick with the modern equivalent of the Walkman for now: the MP3 player. So to earn money for the MP3 player, our fictitious friend who is called Joe, decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his
earnings, and puts it into a table:
DECLARE @Profit TABLE (DayNumber INT, Sales DECIMAL(10,2)) INSERT INTO @Profit (DayNumber, Sales) VALUES (1, 6.90), (2, 4.17), (3, 2.69), (4, 7.26), (5, 2.93), (6, 8.98), (7, 7.25), (8, 5.88), (9, 1.51), (10, 7.97), (11, 3.44), (12, 3.76), (13, 9.96), (14, 0.92), (15, 8.28), (16, 6.05), (17, 9.40), (18, 4.03), (19, 9.14), (20, 7.25), (21, 0.06), (22, 9.12), (23, 7.39), (24, 6.57), (25, 4.54), (26, 0.09), (27, 4.42), (28, 9.53), (29, 5.09), (30, 0.89)
So as you can see, he earns quite a lot of money this way! But because he’s eager to buy his new MP3 player, he wants to see his day totals, and the amount he needs to buy his new toy. But because Joe is a really smart guy, he doesn’t want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:
DECLARE @Goal DECIMAL(10,2) = 150.00 SELECT DayNumber, Sales, @Goal - SUM(Sales) OVER(ORDER BY DayNumber ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded FROM @Profit
He declares a “goal” variable, that is set to the amount he needs for his new gadget. So for every row, we calculate the sum of that day, plus all the sales he made from the first day (UNBOUNDED PRECEDING) and today (CURRENT ROW). After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he’s made so far:
SELECT DayNumber, Sales, AVG(Sales) OVER(ORDER BY DayNumber ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded FROM @Profit
So where does it stop?
So now that we’ve seen the SUM and AVERAGE option, what do we have left? How far can we take this? Thinking about it, how about a daily checkup if we hit a lowest or highest Sales amount? We can do this with the MIN and MAX option on the same query:
SELECT DayNumber, Sales, MIN(Sales) OVER(ORDER BY DayNumber ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale FROM @Profit SELECT DayNumber, Sales, MAX(Sales) OVER(ORDER BY DayNumber ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale FROM @Profit
Now Joe can save his daily sales in the evening, and check if he had a very good, or a very bad day.
Not only for the average Joe…
So how can the business profit from all this? In some cases, Window Functions make it easier to output a certain resultset. In some cases it even gives you a whole new way to output data with a well performing (single) query, that was impossible until now. So if you’re running SQL Server 2008 or higher, start using (or at least start exploring) Window Functions right away!
To generate the random floats, I’ve used the generator of FYIcenter.com
If you want to read more about this topic, don’t forget to check out these blog posts: