Speeding up your backups

Last week, I watched the SQLRally session of Pieter Vanhove (Blog | @Pieter_Vanhove), where he talked about Advanced Backup and Restore. He mentioned striped backups (taking a backup of a database, divided over a number of files). After watching the video, I started to wonder what made the biggest difference: the number of files, the number of disks, compression.

 
Test setup
In order to test this, I restored a copy of the AdventureWorks2012 database, that you can download from msftdbprodsamples.codeplex.com. But because this database is only 200MB, taking a backup would only take a second. In order to make this a bit more interesting, I used a script Adam Machanic (Blog | @AdamMachanic) wrote. This script creates 2 new objects in the AdventureWorks database (dbo.bigProduct and dbo.bigTransactionHistory), which results in a database of 2.8GB. You can download the script here.

 
What matters most?
In order to test the differences in performance, I’ve tested multiple combinations:

– Multiple files on single disk
– Multiple files on 2 disks
– Both options above, with and without backup compressions

After running all the different tests, I’ve added the results to an Excel sheet:

 
The results you see are in milliseconds. The analysis on these numbers is a bit difficult, so let’s put these numbers in a graph:

 

 
As you can see, the number of files (when looking at a minimum of 2 files) isn’t the biggest difference. The number of disks, and compression vs no compression make the biggest difference. In my case, I tested it on 2 SSD’s, but you may have more disks in your server. Or better yet, you are able to take backups on multiple LUN’s on your storage, and the LUN’s use multiple disks. The more spindles you can use, the faster your backup will become.

But there’s also a downside to this. When you want to restore a backup, you need all the files to do that. So when you restore a normal backup, you only need 1 file. If you’re going to stripe your backup over 16 files for example, you need all 16 files to perform a restore. So basically, the chance of having a corrupt backup file is 16x as high, compared to a normal (single file) backup. For every advantage, there’s always a disadvantage…

If you want to read more about backups, don’t forget to check out these blog posts:

Julie Koesmarno: On sabbatical
Mickey Stuewe: Transaction Log Backups for the Accidental DBA
Chris Yates: Backups – They Are Needed, Who Knew?

Avoid a big SSISDB by logging less…?

Last week I blogged about how logging can grind your SSIS to a halt. After posting the blog, I got a really interesting reaction from Koen Verbeeck (Blog | @Ko_Ver):

I knew there were different logging levels in SSIS, but I couldn’t recall the difference in levels. So I discussed about the difference with Koen. At first glance we thought that the biggest difference is the fact that performance logging only logs warnings and errors. So basically, you decrease the amount of messages logged, which should results in a slightly better performance. But is that really the case?

 
Different logging levels
In SSIS there are 4 different logging levels. So what are the options, and what’s the difference between them? You can read all about it on MSDN, but the short version is:

None: Logging is turned off
Basic (default value): All evens are logged, except custom- and diagnostic events
Performance: Only performance statistics, OnError and OnWarning events are logged
Verbose: All events are logged

 
The next questions is: where to change these options. For example, how do I change my logging level to performance?

 
SSISDB / Catalog
When you want to change the logging level for all your deployed SSIS packages, you could change the setting on your catalog. You can do that by opening the “Integration Services Catalogs” on your instance, right-click on your catalog, and choose an option for “Server-wide Default Logging Level”:

 
If you change this, the logging level for all packages in this catalog will change. But maybe you don’t want that, and you only want it for a specific package.

 
SQL Server Agent Job
Another option is do configure this in your SQL Server Agent job:

 
If you use this option, the logging level will change for the package that is executed in the job step. So no permanent change, but only for the duration of the job.

 
At package execution
Another option is to do it at package runtime:

 
Personally I don’t use that option of executing packages, but you might. The same applies as the change on the SQL Server Agent job, this will only change the option for the duration of the execution, so no permanent change.

 
How to determine your best choice
Making a good decision is difficult in this case. When something goes wrong, you want to log everything. But when things run smoothly, it’s a waste of time to log every event that passes by in SSIS. And how do you determine the level of logging you need, without knowing what data is in your SSISDB? If you want to analyze that, you could use a query like this:

SELECT
  EventCounts.EventName,
  EventCounts.NumberOfEvents,
  CONVERT(FLOAT,(CONVERT(FLOAT,[NumberOfEvents]) / SUM(NumberOfEvents) OVER ()) * 100) AS 'Percentage'
FROM
  (
    SELECT
      EM.event_name AS 'EventName',
      COUNT(*) AS 'NumberOfEvents'
    FROM SSISDB.catalog.event_messages AS EM
    WHERE EM.event_name IS NOT NULL
    GROUP BY EM.event_name
  ) EventCounts
ORDER BY EventCounts.EventName ASC

 
This shows you the type of events stored in your SSISDB, the amount of events, and a percentage over the whole dataset. This can help you determine the logging level you need in your specific case.

 
But here’s the catch…
Performance logging doesn’t actually make your packages run faster… Koen sent me this blog post from Matt Masson (Blog | @mattmasson). In his blog post, he explains what events are logged at the specific levels. And this is what he says about performance level:

The Performance log level should be used when you are doing benchmarking and performance tuning for your packages. While it actually logs less messages to the [catalog].[operation_messages] view than Basic, it captures a lot more events internally to analyze the performance of the data flow components. As a result, there is a bit more overhead during execution – packages run with Basic will actually run a little faster than Performance (in this case Performance means “give me all of the performance details”, not “run my packages as fast as you can”).

 
Conclusion
Even though SSIS is easy to use, there are some pros and cons. The longer I work with SSIS (and that’s not on a daily basis), the more of these pitfalls I discover. And even though they can be fixed pretty fast, it normally takes you time to figure out the problem, because I’m not that familiar with the inner-workings of SSIS. And even when you think you have found a solution for your problem, SSIS just works slightly different than expected. I’m just glad I have friends like Koen to help me out when needed! Thanks again Koen!

SSISDB: Why bigger isn’t always better

Two weeks ago I encountered some strange issues with the SSISDB. Packages and processes started failing, and all we could find is this error message:

 
The image above is an extract of the standard SSMS “All Executions” report. You can find this by right-clicking your SSISDB, click on Reports, then Standard Reports, and choose the “All Executions” report.

While the packages started failing, we couldn’t find a cause. Deploying new packages wasn’t possible either. There were no events logged in the report, in the Windows event viewer, or in the SQL Server error log. So there was no starting point to further investigate this issue.

 
Finding the cause
Checking the size of the SSISDB showed some possible cause of the issue: it was 72GB big! This seemed odd, because we don’t store a lot of packages in the SSISDB. I knew SSISDB contained some logging information, but I didn’t knew how much. Checking the “Disk usage per table” report (another standard report in SSMS), the problem became a bit more obvious:

 

 
So basically there were over 67 million records in the database! This shouldn’t have happened, because the log retention should be set so 14 days. But we quickly found the problem:

 

 
The retention was set to 365 days. This could only mean the the SSISDB was re-created, and this setting wasn’t changed after the deploy. This resulted in a lot of extra logging data in the database. Because of the amount of record, the standard “SSISDB Maintenance job” ran for 40+ minutes, instead of a maximum of 8 minutes (which it normally needs to clean up logging). But setting the retention to 14 days and running the maintenance job would just result in an endless running job. So how could you prevent that?

 
Grabbing the bigger hammer
In most cases, grabbing a bigger hammer to solve the problem isn’t your best option. In this case, I didn’t see another way. After some searching, I found this blog post about truncating the log tables in SSISDB. Below you find the copy of the script. The credits for this script go to Ibrahim Naji (Blog | @thinknook).


CREATE PROCEDURE [internal].[cleanup_server_retention_window_truncateall]
AS  
     
SET NOCOUNT ON
     
DECLARE @enable_clean_operation bit
DECLARE @retention_window_length INT
     
DECLARE @caller_name nvarchar(256)
DECLARE @caller_sid  varbinary(85)
DECLARE @operation_id BIGINT
     
EXECUTE AS CALLER
    SET @caller_name =  SUSER_NAME()
    SET @caller_sid =   SUSER_SID()
REVERT
          
     
BEGIN TRY
    SELECT @enable_clean_operation = CONVERT(bit, property_value) 
        FROM [catalog].[catalog_properties]
        WHERE property_name = 'OPERATION_CLEANUP_ENABLED'
         
    IF @enable_clean_operation = 1
    BEGIN
        SELECT @retention_window_length = CONVERT(INT,property_value)  
            FROM [catalog].[catalog_properties]
            WHERE property_name = 'RETENTION_WINDOW'
                 
        IF @retention_window_length <= 0 
        BEGIN
            RAISERROR(27163    ,16,1,'RETENTION_WINDOW')
        END
             
        INSERT INTO [internal].[operations] (
            [operation_type],  
            [created_time], 
            [object_type],
            [object_id],
            [object_name],
            [STATUS], 
            [start_time],
            [caller_sid], 
            [caller_name]
            )
        VALUES (
            2,
            SYSDATETIMEOFFSET(),
            NULL,                     
            NULL,                     
            NULL,                     
            1,      
            SYSDATETIMEOFFSET(),
            @caller_sid,            
            @caller_name            
            ) 
        SET @operation_id = SCOPE_IDENTITY() 


        -- Remove all [internal].[executions] dependancies
        TRUNCATE TABLE [internal].[executable_statistics]
        TRUNCATE TABLE [internal].[execution_component_phases]
        TRUNCATE TABLE [internal].[execution_data_statistics]
        TRUNCATE TABLE [internal].[execution_data_taps]
        TRUNCATE TABLE [internal].[execution_parameter_values]
        TRUNCATE TABLE [internal].[execution_property_override_values]


        -- Remove all [internal].[event_message_context] dependancies
        TRUNCATE TABLE [internal].[event_message_context]

        -- Remove all non-dependant tables
        TRUNCATE TABLE [internal].[operation_os_sys_info]
        TRUNCATE TABLE [internal].[operation_permissions]
        TRUNCATE TABLE [internal].[validations]
        TRUNCATE TABLE [internal].[extended_operation_info]

        -- Deal with [internal].[event_messages] and [internal].[operation_messages]
        ALTER TABLE [internal].[event_message_context] DROP CONSTRAINT [FK_EventMessageContext_EventMessageId_EventMessages]
         
        TRUNCATE TABLE internal.event_messages
         
        ALTER TABLE [internal].[event_message_context]  WITH CHECK ADD  CONSTRAINT [FK_EventMessageContext_EventMessageId_EventMessages] FOREIGN KEY([event_message_id])
        REFERENCES [internal].[event_messages] ([event_message_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[event_messages] DROP CONSTRAINT [FK_EventMessages_OperationMessageId_OperationMessage]
         
        TRUNCATE TABLE [internal].[operation_messages]

        ALTER TABLE [internal].[event_messages]  WITH CHECK ADD  CONSTRAINT [FK_EventMessages_OperationMessageId_OperationMessage] FOREIGN KEY([event_message_id])
        REFERENCES [internal].[operation_messages] ([operation_message_id])
        ON DELETE CASCADE

        -- Deal with [internal].[executions]

        ALTER TABLE [internal].[executable_statistics] DROP CONSTRAINT [FK_ExecutableStatistics_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_component_phases] DROP CONSTRAINT [FK_ExecCompPhases_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_data_statistics] DROP CONSTRAINT [FK_ExecDataStat_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_data_taps] DROP CONSTRAINT [FK_ExecDataTaps_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_parameter_values] DROP CONSTRAINT [FK_ExecutionParameterValue_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_property_override_values] DROP CONSTRAINT [FK_ExecutionPropertyOverrideValue_ExecutionId_Executions]

        TRUNCATE TABLE [internal].[executions]

        ALTER TABLE [internal].[execution_property_override_values]  WITH CHECK ADD  CONSTRAINT [FK_ExecutionPropertyOverrideValue_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[execution_parameter_values]  WITH CHECK ADD  CONSTRAINT [FK_ExecutionParameterValue_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[execution_data_taps]  WITH CHECK ADD  CONSTRAINT [FK_ExecDataTaps_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[execution_data_statistics]  WITH CHECK ADD  CONSTRAINT [FK_ExecDataStat_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE
         
        ALTER TABLE [internal].[execution_component_phases]  WITH CHECK ADD  CONSTRAINT [FK_ExecCompPhases_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE
         
        ALTER TABLE [internal].[executable_statistics]  WITH CHECK ADD  CONSTRAINT [FK_ExecutableStatistics_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE
         

        -- Deal with [internal].[operations]
        DECLARE @deleted_ops TABLE(operation_id BIGINT, operation_type SMALLINT)

        DELETE --TOP (@delete_batch_size)
        FROM [internal].[operations] 
        OUTPUT DELETED.operation_id, DELETED.operation_type INTO @deleted_ops
        WHERE operation_id != @operation_id

             
             
        DECLARE @execution_id BIGINT
        DECLARE @sqlString              nvarchar(1024)
        DECLARE @key_name               [internal].[adt_name]
        DECLARE @certificate_name       [internal].[adt_name]
             
             
        DECLARE execution_cursor CURSOR LOCAL FOR 
            SELECT operation_id FROM @deleted_ops 
            WHERE operation_type = 200
             
        OPEN execution_cursor
        FETCH NEXT FROM execution_cursor INTO @execution_id
             
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @key_name = 'MS_Enckey_Exec_'+CONVERT(VARCHAR,@execution_id)
            SET @certificate_name = 'MS_Cert_Exec_'+CONVERT(VARCHAR,@execution_id)
            SET @sqlString = 'IF EXISTS (SELECT name FROM sys.symmetric_keys WHERE name = ''' + @key_name +''') '
                +'DROP SYMMETRIC KEY '+ @key_name
                EXECUTE sp_executesql @sqlString
            SET @sqlString = 'IF EXISTS (select name from sys.certificates WHERE name = ''' + @certificate_name +''') '
                +'DROP CERTIFICATE '+ @certificate_name
                EXECUTE sp_executesql @sqlString
            FETCH NEXT FROM execution_cursor INTO @execution_id
        END
        CLOSE execution_cursor
        DEALLOCATE execution_cursor

        END
END TRY
BEGIN CATCH
         
         
    IF (CURSOR_STATUS('local', 'execution_cursor') = 1 
        OR CURSOR_STATUS('local', 'execution_cursor') = 0)
    BEGIN
        CLOSE execution_cursor
        DEALLOCATE execution_cursor            
    END
         
    UPDATE [internal].[operations]
        SET [STATUS] = 4,
        [end_time] = SYSDATETIMEOFFSET()
        WHERE [operation_id] = @operation_id;       
    THROW
END CATCH
     
RETURN 0

 
After running this on the test environment, I found out it worked as I expected. I started the script on production (where the problem was found in the first place), and when it finished I changed the retention from 365 to 14 days:

 

 
After doing that, I ran the “SSISDB Maintenance job” (which ran fine), and an SSIS package as a test (worked fine as well). And looking at the record counts, it worked fine:

 

 
I’m not saying you should run this on your system, but I’m just saying this works fine on my system, and in my specific case. If you find any code on the internet, test it on a non-critical system first, before you run it in production!!!

 
Conclusion
Having error logging is a good thing. It can help you determine problems, and you can see if and how process grow over time. But too much logging for SSIS causes performance issues, prevents you from deploying new packages, and can even cause unexpected termination of SSIS packages. One way to prevent that is by changing the history retention of your SSISDB.

SQL Sentry Plan Explorer: You can’t live without it

Every data professional out there will run into slow running queries, or performance issues you can’t explain at some point. At that moment, it’s difficult to explain the problem without looking at an execution plan. SQL Server Management Studio (SSMS) has build-in functionality to look at these execution plans. But this isn’t always as useful as we would like it to be. But there is a great free tool that’ll help you with query-tuning and pinpointing the issue in bad performing queries.

 
Download
SQL Sentry Plan Explorer is free, and available on the website of SQL Sentry. Even though it says it’s a trial version, it won’t expire after a certain period. The only thing that’s “trial” in this version, is that some functionality is blocked in the free version. But all the good stuff is available in the free version.

 
Integration in SSMS
When you start the install, the install doesn’t ask you to shut down SSMS. But I recommend you do. If you don’t close SSMS, you won’t see the SSMS add-in menu. It will show after the setup is finished, and you start a new instance of SSMS.

 
Creating a query, and opening it in Plan Explorer
As an example, I’ve created a really bad query on the Adventureworks2012 database:

USE AdventureWorks2012
GO


DECLARE @MinPrice INT = -1;


WITH Shipping AS
(
SELECT
  PV.ProductID AS ProductID,
  UM.Name AS ShippingPer,
  CASE
    WHEN UM.Name = 'Each' THEN PV.StandardPrice
    WHEN UM.Name = 'Dozen' THEN PV.StandardPrice / 12
    ELSE @MinPrice
  END AS ShippingCostPerUnit
FROM Purchasing.ProductVendor AS PV
INNER JOIN Production.UnitMeasure AS UM ON UM.UnitMeasureCode = PV.UnitMeasureCode
)


SELECT
  P.ProductID,
  P.ProductNumber,
  P.Name,
  S.ShippingCostPerUnit,
  Quantity.TotalQuantity,
  P.ListPrice,
  dbo.ufnGetProductListPrice(P.ProductID, GETDATE()) AS XYZ,
  Locations.TotalLocations,
  P.ListPrice + S.ShippingCostPerUnit AS TotalCostProduct,
  Quantity.TotalQuantity * P.ListPrice AS TotalValueStock,
  ((Quantity.TotalQuantity * P.ListPrice) / Locations.TotalLocations) AS AverageValuePerLocation
FROM Production.Product AS P
INNER JOIN Shipping AS S ON S.ProductID = P.ProductID
CROSS APPLY
(
  SELECT SUM(Quantity) AS TotalQuantity
  FROM Production.ProductInventory
  WHERE ProductID = P.ProductID
  GROUP BY ProductID
) AS Quantity
CROSS APPLY
(
  SELECT COUNT(LocationID) AS TotalLocations
  FROM Production.ProductInventory --WITH(INDEX(0))
  WHERE ProductID = P.ProductID
) AS Locations
WHERE P.ListPrice <> 0
ORDER BY P.ProductID, P.ProductNumber, P.Name, TotalLocations ASC

 
If you run this query in SSMS, and you include the actual execution plan (Ctrl + M), it will show you the execution plan in a separate result window. In this window, you’ll have the option to right-click, and choose “View with SQL Sentry Plan Explorer”:

 
If you click this, you’ll open Plan Explorer, and it will show you the execution plan:

 
So, is that all?
I can almost hear you think: So what’s the difference between Plan Explorer and the default SSMS windows, besides the fancy colors? Just take a look at all the extra opportunities you get with Plan Explorer. For example, how does your join diagram look? Can you pull that from SSMS? No? Well I can do that with Plan Explorer:

 
Your most expensive operation in the query? Yes, you could do that by looking at the percentages shown in your queryplan. But can you show me why they are that expensive? Again, I can do that with Plan Explorer:

 
Can you do you job without it?
If I ask myself this question, I think I can honestly answer this with: yes. Yes, I can do my job without it. But this makes it SO much easier to pinpoint the problem, and to get a quick overview of the query performance. Normally I look at the queryplan in SSMS first, and then immediately open up a Plan Explorer window, to take a closer look at the problems.

So if you write queries on a daily basis, and you’re responsible for, or interested in, qery performance: download it today, and try it out yourself. I’ll promise you, you won’t regret downloading it!
If you want to read more about SQL Sentry Plan Explorer, don’t forget to check out these blog posts:

Julie Koesmarno: Analysing Execution Plans With SQL Sentry Plan Explorer
Mickey Stuewe: On sabbatical
Chris Yates: SQL Sentry Plan Explorer – Don’t Leave Home Without It

Write readable and high-performance queries with Window Functions

In SQL Server 2008, we gained a new and powerful feature in our bag of T-SQL tricks: Window Functions. The actual Window Function is the OVER clause, that allows you to determine partitions or “groups” in your query, before applying another function. In practice, this means you can create groups in your dataset, that can be molded further by applying functions to that groups.

In the past, I’ve written a number of posts about Window Functions in SQL Server:

Row_Number: Unique ID in select statement
Calculating Running Totals
Removing duplicates in a dataset

But there’s more!

 
Finding missing numbers in sequence
How many times did you look at a table, and you noticed that one of the values in a sequence is missing? Or a date range in a table, where someone deleted a few records. So all of a sudden, your year doesn’t have 365 days (or 366 in a leap year, or 365.242199 days which is the years average), but 315 days. There go your management reports that are calculated on a per day average. So how do you find the missing dates, without having to write a cursor or create a full blown date table like your BI colleagues do? You guessed right: a query with Window Function!

Let’s start with declaring a table, and insert a sequence with missing values:

DECLARE @Sequence TABLE
  (Number INT)


INSERT INTO @Sequence
  (Number)
VALUES
  (1),
  (10),
  (7),
  (4),
  (2),
  (8),
  (5)

 
So how can we get the “nearest numbers” from that table with just a single select statement?

SELECT
  LAG(Number, 1, 0) OVER (ORDER BY Number) AS LAG_Value,
  Number,
  LEAD(Number, 1, 0) OVER (ORDER BY Number) AS LEAD_Value
FROM @Sequence AS S
ORDER BY Number ASC

 
The LAG and LEAD are standard t-sql functions from sql server 2012 on. These functions give you the opportunity to access the previous or next row, without the need for a so called “self-join”. So what you see is the number, the record preceding that value (LAG) and the following value. So in this case, number 2 is preceded by Number 1, and followed by Number 4.

 
The lemonade stand
Now let’s look at another example. How about you? When you grew up, you wanted to save money for a new mobile phone right? In my case it was either a disc man, a Walkman, or a stereo set. But let’s stick with the modern equivalent of the Walkman for now: the MP3 player. So to earn money for the MP3 player, our fictitious friend who is called Joe, decides to start a lemonade stand. He needs to save up at least $150 to buy a new MP3 player. So every glass of lemonade he sells is accounted for, and at the end of the day he sums up all his
earnings, and puts it into a table:

DECLARE @Profit TABLE
  (DayNumber INT,
   Sales DECIMAL(10,2))


INSERT INTO @Profit
  (DayNumber, Sales)
VALUES
  (1,  6.90),
  (2,  4.17),
  (3,  2.69),
  (4,  7.26),
  (5,  2.93),
  (6,  8.98),
  (7,  7.25),
  (8,  5.88),
  (9,  1.51),
  (10, 7.97),
  (11, 3.44),
  (12, 3.76),
  (13, 9.96),
  (14, 0.92),
  (15, 8.28),
  (16, 6.05),
  (17, 9.40),
  (18, 4.03),
  (19, 9.14),
  (20, 7.25),
  (21, 0.06),
  (22, 9.12),
  (23, 7.39),
  (24, 6.57),
  (25, 4.54),
  (26, 0.09),
  (27, 4.42),
  (28, 9.53),
  (29, 5.09),
  (30, 0.89)

 
So as you can see, he earns quite a lot of money this way! But because he’s eager to buy his new MP3 player, he wants to see his day totals, and the amount he needs to buy his new toy. But because Joe is a really smart guy, he doesn’t want to do this with a lot of self-joins, and he wants his results fast. So looking at performance, what is the easiest way to query this data? How about this:

DECLARE @Goal DECIMAL(10,2) = 150.00


SELECT
  DayNumber,
  Sales,
  @Goal - SUM(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
He declares a “goal” variable, that is set to the amount he needs for his new gadget. So for every row, we calculate the sum of that day, plus all the sales he made from the first day (UNBOUNDED PRECEDING) and today (CURRENT ROW). After day 28 he has earned enough to buy his MP3 player. But now he wants to know what his average sales were. So he calculates the average of his sales, based on every sale he’s made so far:

SELECT
  DayNumber,
  Sales,
  AVG(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS MoneyNeeded
FROM @Profit

 
So where does it stop?
So now that we’ve seen the SUM and AVERAGE option, what do we have left? How far can we take this? Thinking about it, how about a daily checkup if we hit a lowest or highest Sales amount? We can do this with the MIN and MAX option on the same query:

SELECT
  DayNumber,
  Sales,
  MIN(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LowestSale
FROM @Profit


SELECT
  DayNumber,
  Sales,
  MAX(Sales) OVER(ORDER BY DayNumber
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS HighestSale
FROM @Profit

 
Now Joe can save his daily sales in the evening, and check if he had a very good, or a very bad day.

 
Not only for the average Joe…
So how can the business profit from all this? In some cases, Window Functions make it easier to output a certain resultset. In some cases it even gives you a whole new way to output data with a well performing (single) query, that was impossible until now. So if you’re running SQL Server 2008 or higher, start using (or at least start exploring) Window Functions right away!

 

To generate the random floats, I’ve used the generator of FYIcenter.com

If you want to read more about this topic, don’t forget to check out these blog posts:

Julie Koesmarno: ABC Classification With SQL Server Window Function
Mickey Stuewe: A Date At The End of The Month
Chris Yates: Windows functions who knew

T-SQL Tuesday #50 – Automation: yea or nay

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Automation”. If you want to read the opening post, please click the image below to go to the party-starter: Hemanth D. (Blog | @SqlChow).



 
Being a DBA, you want to automate as many processes as you can, in order to save time that you can spend on more important things. But have you ever considered over-automating your processes?

 
We’re safe! What can go wrong here?
At one of the companies I worked for, they thought they had everything sorted out. Indexes were rebuild every day or every week (depended on the database), databases and logfiles were shrinked, databases were checked for corruption, backups were running, etc. They felt safe, knowing that if something happened they could anticipate on any situation SQL Server would throw at them. It would blow up in their faces eventually…

One of the first things I checked were the backups. The backup job was running, but only a few databases were actually selected for backup. And the biggest database (500+ GB), which was pretty important, was skipped because it took too long to backup. And guess what, they didn’t EVER test recovering from a backup, because of a lack of disk space and time. And there you have it: a false sense of safety!

I don’t have to tell you not to shrink your database and logfiles. Everybody knows that every time you shrink your database, a kitten dies… Or an index dies… Or the soul of your database… I’m not sure which one, but take your pick. It causes (and I quote Paul Randal (Blog | @PaulRandal) on this!): “*massive* index fragmentation”. Read more about that over at Paul’s blog. Besides that, if your next query needs more space in a data- or logfile you'll see more wait time because of file growth.

The indexes were rebuild every night on the important databases, and every weekend on less used databases. But they never checked if the problem they had before was fixed when switching to this solution.

Also the corruption check was run only on user databases. They never heard of running a corruption check on system databases. The system database were in the backup process, but they never took the time checked if they could restore them or were running a backup of a corrupted database.

 
Focus on the important stuff
So instead of automating all your processes, maybe you should focus on what’s really important. You just automated your backup process. But does it run every time? Are the backups actually written to disk? Can you restore one of the backups you created?

What I’m trying to say is, you can automate tasks whenever and wherever you like, but don’t forget to test them. Once you’ve automated something, plan regular tests to verify if the automated process runs the way you expect it to. And is the end result really the result you want and expect?

 
Don’t reinvent the wheel
Another tip is: don’t reinvent the wheel. There are more people that encountered the same issue, and wrote about it or logged about a solution. So before you build your own maintenance solution, or automate health status reports, check with your community members. There’s can be found help for every problem, but the checkup on that solution is all on you.

T-SQL Tuesday #49 – Wait for it…

T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.

This month the subject is “Wait for it…”. If you want to read the opening post, please click the image below to go to the party-starter: Robert Davis (Blog | @SQLSoldier).



 
Explaining developers how SQL Server works is something we all do. Maybe not on a daily basis, but you’re asked questions like “why is my index not working”, or “what’s the best way to add multiple columns to a table”. And most of the time, these questions lead to whole other bunch of questions you need to answer. And the one question we all are asked more than once: “why is my query running slow?”. So where do you start explaining?

 
Wait Types
There are lots and lots of wait types that can be found in SQL Server. In SQL Server 2005 there are 230 different wait types, 475 in SQL Server 2008 and 491 in SQL Server 2008 R2. In SQL Server 2012 they added another 197 new ones to the list. The wait types can be found by running this query:

SELECT wait_type
FROM sys.dm_os_wait_stats
ORDER BY wait_type ASC

 
These wait types can tell you what SQL Server is doing to execute your statement, and what the possible delays are. I’m not going to sum up all the wait types, but here’s a short list of common wait types you’ll see on your SQL server:

 
SOS_SCHEDULER_YIELD
Yielding processor time

LCK_M_*
Waiting for a lock

OLEDB
Wait on the OLEDB provider (Linked servers, Full-Text Search)

WRITELOG
Writing transaction log to disk

RESOURCE_SEMAPHORE
Waiting for a query memory grant

CXPACKET
Query parallelism

PAGEIOLATCH_*
Latch on a memory address while data is retrieved from disk

LAZYWRITER_SLEEP
System process waiting to start

 
All these different wait types could indicate a problem with your statement or the server. Some are more informative, while others show you a real issue. But what I really would like to show you, is how you can find these wait types.

 
DIY or just ask for help…
One of the ways to find the wait types on your SQL server, is to dive into the seemingly endless list of DMV’s. You could use the “sys.dm_exec_requests” and “sys.dm_os_waiting_tasks” DMV’s to find what you want, or you could take the easy way out: sp_WhoIsActive by Adam Machanic (Blog | @AdamMachanic ).

Adam (also the party starter of T-SQL Tuesday) wrote a no less than brilliant script to find problems on your server. But how does it work?

 
Installing sp_WhoIsActive
The “installation” of sp_WhoIsActive couldn’t be easier. You just need to download the script, and run it. This creates a stored procedure in the database of your choice. Usually I just create it in the master database. But if you have a DBA database with useful scripts, it’s okay to create it there.

 
Running it for the first time
The stored procedure can be executed without any parameters. That way, you use the default options. Just run the statement shown below:

EXEC master.dbo.sp_WhoIsActive

 
If you need it, or just like to see more information, you can also configure the procedure with a lot of parameters. If you want to see all the options you can configure, just set the documentation parameter to 1 (true):

EXEC master.dbo.sp_WhoIsActive
  @help = 1

 
The options
If you start using sp_WhoIsActive more and more, you’ll get your own set of favorite options. It all depends on the purpose you’re using the procedure for. Most of the time, I use it to determine why queries run slow, or why the performance of the SQL server is so low.

The information sp_WhoIsActive retrieves gives you a good indication of what SQL Server is doing, or what queries are bugging each other. I’ll list my favourite options below:

First, I set @show_own_spid on, so I can see my own query in the resultset.

The second option I love is @get_plans. This shows you the execution plans of the running queries:

 
Another great parameter to set is @get_outer_command. That way, you won’t just see the query running at the moment, but also the outer-command of the query (in the example below, the INSERT INTO the temp table is executed from within the stored procedure you see in the right column):

 
To see which transaction logs are used when running your query, set @get_transaction_info to 1:

 
Information regarding locks can be found, by setting @get_locks to 1:

 
If you click the XML, you’ll see which locks are granted, pending or denied:

 
The last option I’d like to set, is @get_additional_info. This will show you more information regarding the connection settings, session variables, etc:

 
Clicking the XML shows you the properties I mentioned above:

 
So this is what the query looks like, the way I personally like to use it:

EXEC master.dbo.sp_WhoIsActive
  @show_own_spid = 1,
  @get_plans = 1,
  @get_outer_command = 1,
  @get_transaction_info = 1,
  @get_locks = 1,
  @get_additional_info = 1

 
Conclusion
Wait types are your keys to open the door of the next level of SQL Server. Not all wait types are that easy to read and understand, but there are plenty of resources to be found online. For example, just take a look at the rest of the posts today. Most of the posts for T-SQL Tuesday can be found on Twitter, when you search for #TSQL2sDay.


I want to say thanks to the employees at Coeo for the easy explanation of some of the wait types!

Follow

Get every new post delivered to your Inbox.

Join 65 other followers