T-SQL Tuesday #31 – Logging
June 12, 2012 Leave a comment
A few weeks ago I heard about T-SQL Tuesday. This is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.
When I read the subject of this month is logging, the first thing that came up in my mind was: what do you mean with logging? Logging application performance? Logging system health? Logging the outcome of a process? Some of these you use every day, and some of these you abuse every day.
Instead of writing about specific SQL server logging, or technical solutions or scripts, I decided to write about the function of logging. When to use logging, and how to use it is (in my opinion) just as important as the technical implementation of it.
Monitoring vs Logging
One of the projects I saw in the last couple of years shows an example of abuse. The company wanted to know if the products they sold where actually processed throughout the back-end systems. So they bought some big TV’s to show dashboard software they bought. Quickly, the dashboard were filled with numbers regarding product sales, process time needed from order to shipment, etc.
As soon as these dashboards were up, they started to alter them. More and more information was added to the dashboard, and soon after that the information-overkill was immense. Then the inevitable happened… Management started to use these numbers to monitor the health of the back-end systems! If the process timings were climbing, they ordered the IT department to check out the production environment for disturbances or bottlenecks.
This is a typical example of abusing logging, to monitor the complete production environment. Instead of creating “checkpoints” for their data-movements, they created a top level view that showed something went wrong, but only at the very last moment. So instead of concentrating on the smallest issue or bottleneck at first, they needed to drill through tons of information and errors to get to the actual issue.
What do you want to achieve?
As you can see in the example above, you need to ask yourself a question before building logging: what do you want to achieve with it? Does the business want to see the money flow from front-end application all the way to your database? Do you want to adjust business processes via the information you extract from logs (maybe in a sort of Data Warehouse)? Do you want to monitor the health of your systems (being a developer or a guy working in the IT department)?
Once you’ve established that, the next question is: what is the scope of the logging? Do you want to monitor up-time? Do you want to see the outcome of a process, or do you just want to log errors? It might even be both, telling you that the process finished with a certain status, so that you can store timings of certain processes.
The last question you want to answer is where to store your logging information. It’s easy to store your logging information into a SQL server database, so you can easily query the information you want. On the other hand, the information you store might slow down the rest of the SQL server because you share I/O cycles on the same physical disks. Then a file (for example CSV) on a different server might be interesting.
Another solution is to e-mail the error message to a certain mailbox or user, if the process finished unexpected. But if you want to log or store all messages, this might not be the best approach for your users and your Exchange server.
Production vs Logging
When logging SQL server messages, I’m always very careful what I log. A past experience learned me to ask myself one question in particular: Do you want to run production without logging, or do you want to log you’re not running production? What I mean by that is whenever you need to log timings or other process messages, you slow down the production environment. Whenever you see someone running SQL profiler on a production machine, you know he’s doing a bad job. In my opinion, this is only a valid option when you’ve used and tried all of your other options to find the bottleneck.
A good example of this I’ve experienced before. The company had some strange transactions on their database. So naturally, they wanted to know who constantly added SQL server users (with sysadmin permissions) to the SQL instance on the production machine.
One of the system engineers thought it was a good idea to run SQL profiler on the instance. The adding of the user occurred at random moments, so they decided to let profiler run for a while. After a few weeks (!!!) of constantly running profiler, another system administrator (that didn’t know that profiler ran from another machine) was wondering why the production machine was so slow. Eventually he asked me to help him, we looked at the machine, and turned off the profiler.
Curious about what the profiler logged, we checked out the database that was used as log destination. It was a table with millions and millions of records, and almost none of them was useful. So they had to make up a lot of lost time, while the person responsible for adding the users was still missing.
Eventually we solved the mystery; it was a senior developer who added the user. After he added the user for his application (it was used for billing once a month), he was pissed of that someone deleted his user again. So this vicious circle cost them a lot of time and frustration. Not only for them, but also time they could have spend making the company money, but instead used it for a ghost hunt.
Giving one answer to all questions is impossible. For every specific challenge you’ll encounter, you need a new solution (or at least partially). So whenever you’re thinking of creating a solution for all of your applications: forget about it! This is impossible! Tons of developers have tried this, and none of them found a perfect solution. Every case needs to be handled with care, and needs a specific solution or workaround.
When logging, try to log the least possible, with the biggest impact possible. This might sound like kicking down an open door, but it isn’t. It’s as basic as that. Remember the impact of your logging, and try to tune it down where possible. Keep it basic!