T-SQL Tuesday #50 – Automation: yea or nay

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is “Automation”. If you want to read the opening post, please click the image below to go to the party-starter: Hemanth D. (Blog | @SqlChow).



 
Being a DBA, you want to automate as many processes as you can, to save time you can spend on more important things. But have you ever considered the downside of over-automating your processes?

 
We’re safe! What can go wrong here?
At one of the companies I worked for, they thought they had everything sorted out. Indexes were rebuilt every day or every week (depending on the database), databases and log files were shrunk, databases were checked for corruption, backups were running, etc. They felt safe, knowing that if something happened they could handle any situation SQL Server would throw at them. It would blow up in their faces eventually…

One of the first things I checked was the backups. The backup job was running, but only a few databases were actually selected for backup. And the biggest database (500+ GB), which was pretty important, was skipped because it took too long to back up. And guess what: they didn’t EVER test recovering from a backup, because of a lack of disk space and time. And there you have it: a false sense of safety!

I don’t have to tell you not to shrink your databases and log files. Everybody knows that every time you shrink your database, a kitten dies… Or an index dies… Or the soul of your database… I’m not sure which one, but take your pick. It causes (and I quote Paul Randal (Blog | @PaulRandal) on this!): “*massive* index fragmentation”. Read more about that over at Paul’s blog. Besides that, if your next query needs more space in a data or log file, you’ll see extra wait time because of file growth.

The indexes were rebuilt every night on the important databases, and every weekend on less-used databases. But they never checked whether the problem they had before was actually fixed by switching to this solution.
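If you automate index rebuilds, it pays to check whether fragmentation is actually a problem to begin with. A minimal sketch of such a check, using sys.dm_db_index_physical_stats (the page_count threshold is just an example value), could look like this:

SELECT
  OBJECT_NAME(ips.object_id)         AS table_name,
  i.name                             AS index_name,
  ips.avg_fragmentation_in_percent,
  ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
INNER JOIN sys.indexes i
  ON i.object_id = ips.object_id
  AND i.index_id = ips.index_id
WHERE ips.page_count > 1000     -- small indexes usually aren't worth rebuilding
ORDER BY ips.avg_fragmentation_in_percent DESC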

The corruption check was also run only on user databases; they had never heard of running a corruption check on system databases. The system databases were included in the backup process, but they never took the time to check whether they could restore them, or whether they were backing up a corrupted database.
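Checking the system databases for corruption is as simple as running DBCC CHECKDB against them, just like you would for a user database. A minimal example:

DBCC CHECKDB ('master') WITH NO_INFOMSGS
DBCC CHECKDB ('model') WITH NO_INFOMSGS
DBCC CHECKDB ('msdb') WITH NO_INFOMSGS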

 
Focus on the important stuff
So instead of automating all your processes, maybe you should focus on what’s really important. You just automated your backup process. But does it run every time? Are the backups actually written to disk? Can you restore one of the backups you created?
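One way to answer those questions is to check the backup history that SQL Server keeps in msdb. This is only a rough sketch: it looks at full backups, and tells you nothing about whether the backup files are still on disk or actually restorable:

SELECT
  d.name                      AS database_name,
  MAX(b.backup_finish_date)   AS last_full_backup
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b
  ON b.database_name = d.name
  AND b.type = 'D'            -- D = full database backup
GROUP BY d.name
ORDER BY last_full_backup ASC

A database that shows NULL here has never had a full backup at all. That’s the kind of surprise you want to find before a disaster does.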

What I’m trying to say is, you can automate tasks whenever and wherever you like, but don’t forget to test them. Once you’ve automated something, plan regular tests to verify if the automated process runs the way you expect it to. And is the end result really the result you want and expect?

 
Don’t reinvent the wheel
Another tip: don’t reinvent the wheel. Plenty of people have encountered the same issue, and wrote or blogged about a solution. So before you build your own maintenance solution, or automate health status reports, check with your community members. Help can be found for almost every problem, but verifying that the solution works in your environment is still up to you.

2013 – a summary

Now that 2013 is almost over, everybody is starting to create lists. I’m not going to bother you with endless lists to sum up this year. You can find the most important things for this year below.

 
Blog posts
Last year I set myself a goal: write more blog posts in 2013 than in 2012. In 2012 I wrote 29 posts, which is about 2.4 per month. In 2013 I managed to write 33 blog posts (2.75 per month). It’s still not the amount I hoped for, but at least it’s better: roughly a 14% increase compared to last year.

 
Traffic
One of the advantages of writing more is the increase in website traffic. Compared to last year, I’m at about 4x more traffic. This is something I’m particularly proud of, because it’s a way to give something back to the community. It’s awesome to help people solve a problem, and I owe that to all of you who helped me in the past and still do.

 
SQL Saturday
The first SQL Saturday I attended: SQLSaturday #221 – Holland 2013. It was a wonderful event, where I’ve seen a lot of good sessions by some great speakers, and met a lot of people in person. Hopefully the next SQL Saturday will be as great as this one!

 
Consulting
One of the things on my list was working as a consultant. A couple of friends gave me the opportunity to try that, and I liked it very much! Hopefully 2014 will be the year that I can expand this, and work on more projects as a consultant.

 
Great people
But the greatest thing this year was meeting a lot of wonderful people from the SQL community. I met some of them online, and some of them in person. Thank you, especially the ones I now consider friends. You know I’m talking about you when you read this.

 
2014
Looking at 2014, there are a lot of cool things coming. Hopefully this is going to be a busy year, and a year in which I can give back more to the community as a speaker and author. At this moment I can’t talk about these big plans, but when I can, I’ll post them here.

I wish you all a healthy, happy and productive 2014!

T-SQL Tuesday #49 – Wait for it…

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is “Wait for it…”. If you want to read the opening post, please click the image below to go to the party-starter: Robert Davis (Blog | @SQLSoldier).



 
Explaining to developers how SQL Server works is something we all do. Maybe not on a daily basis, but you’re asked questions like “why is my index not working”, or “what’s the best way to add multiple columns to a table”. And most of the time, these questions lead to a whole bunch of other questions you need to answer. And there’s one question we’re all asked more than once: “why is my query running slow?”. So where do you start explaining?

 
Wait Types
There are lots and lots of wait types that can be found in SQL Server. In SQL Server 2005 there are 230 different wait types, 475 in SQL Server 2008 and 491 in SQL Server 2008 R2. In SQL Server 2012 they added another 197 new ones to the list. The wait types can be found by running this query:

SELECT wait_type
FROM sys.dm_os_wait_stats
ORDER BY wait_type ASC
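The same DMV also keeps track of how often, and for how long, SQL Server has waited on each wait type since the last restart. A minimal sketch to see where most of the wait time went (the list of excluded wait types is just an example, not a complete list of harmless waits):

SELECT TOP (10)
  wait_type,
  waiting_tasks_count,
  wait_time_ms,
  signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH',
                        'SLEEP_TASK', 'BROKER_TASK_STOP', 'CLR_AUTO_EVENT')
ORDER BY wait_time_ms DESC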

 
These wait types can tell you what SQL Server is doing to execute your statement, and what the possible delays are. I’m not going to sum up all the wait types, but here’s a short list of common wait types you’ll see on your SQL server:

 
SOS_SCHEDULER_YIELD – Yielding processor time

LCK_M_* – Waiting for a lock

OLEDB – Waiting on the OLEDB provider (linked servers, Full-Text Search)

WRITELOG – Writing the transaction log to disk

RESOURCE_SEMAPHORE – Waiting for a query memory grant

CXPACKET – Query parallelism

PAGEIOLATCH_* – Latch on a buffer page while data is retrieved from disk

LAZYWRITER_SLEEP – System process waiting to start

 
All these different wait types could indicate a problem with your statement or your server. Some are merely informative, while others point to a real issue. But what I really want to show you is how you can find these wait types.

 
DIY or just ask for help…
One way to find the wait types on your SQL Server is to dive into the seemingly endless list of DMVs. You could use the “sys.dm_exec_requests” and “sys.dm_os_waiting_tasks” DMVs to find what you want, or you could take the easy way out: sp_WhoIsActive by Adam Machanic (Blog | @AdamMachanic).
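If you prefer the do-it-yourself route, a minimal sketch based on those DMVs could look something like this; it shows the currently running requests, their wait types, and the statement they’re executing:

SELECT
  r.session_id,
  r.status,
  r.command,
  r.wait_type,
  r.wait_time             AS wait_time_ms,
  r.blocking_session_id,
  t.text                  AS query_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id <> @@SPID     -- exclude this query itself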

Adam (also the party starter of T-SQL Tuesday) wrote a no less than brilliant script to find problems on your server. But how does it work?

 
Installing sp_WhoIsActive
The “installation” of sp_WhoIsActive couldn’t be easier. You just need to download the script, and run it. This creates a stored procedure in the database of your choice. Usually I just create it in the master database. But if you have a DBA database with useful scripts, it’s okay to create it there.

 
Running it for the first time
The stored procedure can be executed without any parameters. That way, you use the default options. Just run the statement shown below:

EXEC master.dbo.sp_WhoIsActive

 
If you need it, or just like to see more information, you can also configure the procedure with a lot of parameters. If you want to see all the options you can configure, just set the documentation parameter to 1 (true):

EXEC master.dbo.sp_WhoIsActive
  @help = 1

 
The options
If you start using sp_WhoIsActive more and more, you’ll get your own set of favorite options. It all depends on what you’re using the procedure for. Most of the time, I use it to determine why queries run slow, or why the overall performance of the SQL Server is poor.

The information sp_WhoIsActive retrieves gives you a good indication of what SQL Server is doing, and which queries are getting in each other’s way. I’ll list my favourite options below:

First, I set @show_own_spid on, so I can see my own query in the resultset.

The second option I love is @get_plans. This shows you the execution plans of the running queries:

 
Another great parameter to set is @get_outer_command. That way, you won’t just see the query running at the moment, but also the outer-command of the query (in the example below, the INSERT INTO the temp table is executed from within the stored procedure you see in the right column):

 
To see transaction log write information for the running queries, set @get_transaction_info to 1:

 
Information regarding locks can be found, by setting @get_locks to 1:

 
If you click the XML, you’ll see which locks are granted, pending or denied:

 
The last option I’d like to set, is @get_additional_info. This will show you more information regarding the connection settings, session variables, etc:

 
Clicking the XML shows you the properties I mentioned above:

 
So this is what the query looks like, the way I personally like to use it:

EXEC master.dbo.sp_WhoIsActive
  @show_own_spid = 1,
  @get_plans = 1,
  @get_outer_command = 1,
  @get_transaction_info = 1,
  @get_locks = 1,
  @get_additional_info = 1

 
Conclusion
Wait types are your keys to open the door of the next level of SQL Server. Not all wait types are that easy to read and understand, but there are plenty of resources to be found online. For example, just take a look at the rest of the posts today. Most of the posts for T-SQL Tuesday can be found on Twitter, when you search for #TSQL2sDay.


I want to say thanks to the employees at Coeo for the easy explanation of some of the wait types!

Incremental updates with Change Data Capture

When designing a database or an ETL process, for example to load your production data into a reporting environment, you always start your design with performance in mind. In the beginning of the project, your scripts and ETL run blazing fast. But after a few months in production, the entire process grinds to a halt. How do you fix that problem without a complete redesign of your applications and database? One of the easier solutions: load your data into the destination tables incrementally.

 
Change Data Capture
Incremental data loading can be a hard nut to crack. It’s not always an option, but it is a good place to start. One way to load your data incrementally is by using a key in your database as a reference. If your table has a column called “Modified Date” that is updated every time the record is updated, you could use that: every night, when the process runs, you only pick up the records that were modified after the last successful run (a quick sketch of that idea follows below). But what if you don’t have that possibility? Change Data Capture (CDC) is an easy way out.
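As a hypothetical sketch of that “Modified Date” approach (the database, table and column names below are made up for illustration, and the watermark would normally be read from an ETL control table):

DECLARE @LastSuccessfulRun DATETIME2 = '2013-12-01T02:00:00'

/* Pick up everything that changed since the last successful run */
SELECT OrderID, CustomerID, OrderTotal, ModifiedDate
FROM ProductionDB.dbo.Orders
WHERE ModifiedDate > @LastSuccessfulRun

-- These rows would then be merged into the reporting table,
-- for example with a MERGE statement or an update/insert pair.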

 
CDC is a way to record inserts, updates and deletes on a specific table, without having to write the triggers yourself. CDC reads the transaction log, and captures all changes made to that table. These changes are stored in an associated change table that is created by CDC.

Below I’m going to show you how to set up your first table with CDC. If you would like to know more about CDC, this TechNet article is a good place to start.

 
Create an example
To show you the basics of CDC, let’s start with creating a table called TestCDC in a database called Sandbox:

USE Sandbox
GO

CREATE TABLE dbo.TestCDC
  (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
   Descr varchar(50) NULL)
GO

 
Once you’ve created the table, turn on CDC at the database level, by executing the system stored procedure created for that:

EXEC sys.sp_cdc_enable_db

 
There is also a system stored procedure to enable CDC at the table level. You need to enable CDC manually, and separately, for every table you want to track:

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @role_name = NULL
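To quickly verify that CDC is actually switched on for both the database and the table, you can check the catalog views (a small sketch for the example above):

/* Is CDC enabled on the database? */
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'Sandbox'

/* Is the table tracked by CDC? */
SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE name = 'TestCDC'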

 
If the SQL Server Agent is running on your machine or server, you’ll see this confirmation (I’ll explain later why SQL Server Agent is needed):

 
If the Agent isn’t running, you’ll see this warning:

 
If you ran the enable table statement, you will see that SQL Server created the system objects needed to track changes in the table:

 
Because CDC uses 2 SQL Server Agent jobs to capture changes and clean up the change tables, the Agent needs to be running to start the data capture. If the jobs aren’t running, SQL Server won’t capture any changes:

 
Start data changes
In order to see what happens when you change data, let’s insert some records:

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description')

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description too...')

 
And let’s update one of those 2 inserted records:

UPDATE dbo.TestCDC
SET Descr = 'UPD - ' + Descr
WHERE ID = 2

 
Now, let’s check the content of both the original table, and the change table:

/* Original table */
SELECT * FROM dbo.TestCDC

/* Change table */
SELECT * FROM cdc.dbo_TestCDC_CT

 
If you run both queries, you’ll see the resultset below:

 
The records in the CDC change table allow you to update the data in your reporting environment. You could query them yourself, by retrieving all the changes since your last update. You can also use the functions that return those changes for you, for example cdc.fn_cdc_get_net_changes_<capture instance>. You can read more about that system function here.
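For the example above, the capture instance is called “dbo_TestCDC”, so the generated function is cdc.fn_cdc_get_net_changes_dbo_TestCDC. A minimal sketch (this assumes net changes support was enabled, which is the default when the table has a primary key, as it does here):

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_TestCDC'),
        @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn()

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_TestCDC(@from_lsn, @to_lsn, 'all')

In a real ETL process you would store the @to_lsn you processed, and use it as the starting point for the next run.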

 
Cleaning up after an update
Now that you’ve updated your reporting environment, it’s wise to clean up your CDC data. You could delete the records yourself with a DELETE statement, but another option is using the system stored procedure “sys.sp_cdc_cleanup_change_table”. You can clean up the captured data using the following statement:

DECLARE @Last_LSN VARBINARY(10) =
  (SELECT MAX(cdc.dbo_TestCDC_CT.[__$start_lsn]) FROM cdc.dbo_TestCDC_CT)

EXEC sys.sp_cdc_cleanup_change_table
  @capture_instance = 'dbo_TestCDC',
  @low_water_mark = @Last_LSN,
  @threshold = 5000

 
The query will retrieve the last LSN (Log Sequence Number), and remove everything that happened before that.

 
Cleanup of CDC
If you want to completely remove CDC (because we’re done testing), you can disable it on the table level by running the query below:

EXEC sys.sp_cdc_disable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @capture_instance = 'dbo_TestCDC'

 
The statement will clean up all the objects that were created to enable CDC on that specific table, but it only stops CDC for that one table. The fastest way to disable CDC on all tables in the database is disabling CDC at the database level, by running the query below:

EXEC sys.sp_cdc_disable_db

 
Conclusion
Loading data always takes time, and there are many factors that matter: your database size, the frequency of changes, your ETL process, etc. The time it takes to move data can be reduced by rewriting your process to use incremental loads. CDC is one of the many ways to achieve this. It works out of the box, and doesn’t require you to build any process yourself. But maybe your environment needs a custom process to operate the way you want it to. Not every feature in SQL Server is a so-called silver bullet, but sometimes it comes darn close to one…

T-SQL Tuesday #48 – Cloud Atlas

T-SQL Tuesday is a recurring blog party, started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone who wants to can write a blog post about a specific subject.

This month the subject is “Cloud Atlas”. If you want to read the opening post, please click the image below to go to the party-starter: Jorge Segarra (Blog | @SQLChicken).



 
In the last few years, “the cloud” has become more and more important in our lives. Not only in IT, or for database and data professionals, but also in our personal lives. Take a look around you. How many people do you still see carrying around a USB drive to store their data? Now do the same, and count the people that use a cloud solution for their data, like Dropbox, SkyDrive (if we are still allowed to call it that…), or Copy.com.

So everyone is storing their data in the cloud now. From personal information like a copy of a passport, to text files with people’s password lists. So without jumping to conclusions just yet, I guess we trust the companies that hold our data, right…?

 
Trust
But now comes the hard (and controversial) part: we trust them with our personal data, but not our corporate data. It’s okay to store your passwords and private documents in the cloud, but it’s unthinkable that you store business data in the cloud!

So where is that distrust coming from? It probably has something to do with the whole NSA-thing. There, I said it! Without completely going off-topic, I would like to explain something about this statement.

My personal opinion is that people in the Netherlands are different from the rest of the world when it comes to their privacy. They don’t care if the ISP is monitoring web traffic. They know it’s being monitored, but they accept that as a fact. When it comes to downloading games, music or movies, they think they’re entitled to that. But when it comes to government agencies monitoring the corporate data they put in the cloud, they draw the line.

 
Are you… the one…?
In the past few years, the discussion about on-premises versus off-premises data has intensified. People try to convince each other with arguments, and think the other side is completely wrong.

A while ago, I encountered my first “cloud-company”. I’ve done some consulting for them, and they’ve set themselves the goal to move to the cloud within the next few years. The biggest advantages they see are costs, scalability and administration. And I fully agree with them.

 
Why use a cloud solution
Choosing a WASD (Windows Azure SQL Database) solution makes it easier to maintain your environment. You don’t have to monitor the hardware, or move to another server if your hardware fails or dies. This is all taken care of by Microsoft.

Looking at the cost of a cloud solution is easy: it saves you money. When you run on-premises servers, you need a data center, electricity, a maintenance team, etc. When you use a cloud solution, you only pay for the hardware you need. And if you’re done with it, you can just shut down the machine you were working on.

The same goes for scalability. If you need to run a billing process, for example, you could “spawn” twice as many cloud machines. This makes scaling a piece of cake. And again, when you’re done, just get rid of the machines you don’t use anymore. This makes it easier for companies to run big processes in a smaller amount of time.

 
Trying it out
The only time I’ve used WASD is on the machine that Jamie Thomson (Blog | @jamiet) made available to the SQL Family (read about it here). This was later taken over by Red Gate, but I’m not sure it’s still available.

But if you want to try it out, just create your own AdventureWorks on Azure. You can download the scripts here, and start your Azure trial here.

Removing duplicates in a dataset

As most of you know, “data is only as good as its weakest record”. Or was it “the possession of facts is knowledge, the use of data is wisdom”? Anyway, data quality is one of the most important aspects of a dataset. But in practice, it’s one of the first things that fails.

Data quality isn’t something you create overnight; it’s achieved over a longer period of time. And because data is volatile, its quality is volatile too. Who in your development or administration team is responsible for data quality? If you can’t think of anyone, you don’t have to be ashamed. In many companies data quality is underestimated.

 
Duplicates
One of the biggest mistakes I saw at most companies I worked for was data duplication. Developers created the database, forgot to put primary and foreign keys on the tables, and didn’t think of unique constraints. What do you get? Well, the least of your problems is data duplication: similar records being inserted twice or even more often, without any constraint to prevent it.

I’m not going to tell you how to prevent duplicates, because that’s a whole other story. Start looking at primary keys, unique constraints, and (in most cases) the overall design of your database.

 
Removing duplicates
The main question is: how do I get rid of duplicates, without throwing away too much data? First of all, let’s create a small table:

CREATE TABLE #T1
  (Descr VARCHAR(10))

 
Now let’s insert some data, including some duplicates:

INSERT INTO #T1
  (Descr)
VALUES
  ('Value 1'),
  ('Value 2'),
  ('Value 2'),
  ('Value 3'),
  ('Value 4'),
  ('Value 5'),
  ('Value 6'),
  ('Value 6'),
  ('Value 6'),
  ('Value 7'),
  ('Value 8')

 
If you try to remove the duplicates with a normal DELETE statement, you can only delete all copies of a duplicate record at once (see the example below). So how can we remove just the extra copies?
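To illustrate: this statement would throw away both rows containing “Value 2”, instead of just the extra one (don’t run it if you want to follow along with the rest of the example):

DELETE FROM #T1
WHERE Descr = 'Value 2'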

 
ROW_NUMBER to the rescue!
In order to delete one of the duplicates, we need to mark the “extra version” of the record. But how can we do this? The records are identical, right? In this case, we can use the ROW_NUMBER function to add a unique number to the records.

ROW_NUMBER returns a sequential number for every record in the result set, based on the grouping (PARTITION BY) and ordering you choose.

In the table we just created, you can see that “Value 2” is inserted twice, and “Value 6” is even inserted three times. We want to remove these duplicates, and keep only one record for each value.

Now we just add the ROW_NUMBER to our query:

SELECT
  Descr,
  ROW_NUMBER() OVER(PARTITION BY Descr
                    ORDER BY Descr)       AS RowNumber
FROM #T1

 
The result looks like this:

 
As you can see, every record now has a number that is unique within its partition (which is basically a group by) on the “Descr” column. The ORDER BY is mandatory, but doesn’t affect the result in this example.

 
Removing the duplicates
Because ROW_NUMBER can only be used in the SELECT list or the ORDER BY clause, we can’t use it directly in a DELETE statement. But we can use a CTE to delete the duplicates:

WITH DeleteCTE AS
  (SELECT
      Descr,
      ROW_NUMBER() OVER(PARTITION BY Descr ORDER BY Descr) AS RowNumber
   FROM #T1)


DELETE D
FROM #T1 T
INNER JOIN DeleteCTE D
  ON D.Descr = T.Descr
WHERE D.RowNumber <> 1

 
ROW_NUMBER gives every copy of a record a unique number, which we then use in the DELETE statement. We only want to keep one version, so we delete everything that has a “RowNumber” other than 1.

The result of this looks like this:

 
Conclusion
Removing duplicates is something we can do very easily, but it’s only a single step towards data quality. There are many more steps to follow before you have a trustworthy dataset that is reliable at any time of the day.

After removing the duplicates from your dataset, don’t forget to add primary and foreign keys and unique constraints to prevent this situation in the future. Because running a script like this every week isn’t going to solve your problems…
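For the example table, preventing new duplicates could be as simple as adding a unique constraint (a minimal sketch; in a real table a primary key or a more specific business key would be the better choice):

ALTER TABLE #T1
ADD CONSTRAINT UQ_T1_Descr UNIQUE (Descr)

/* Any new duplicate now fails with a constraint violation */
INSERT INTO #T1 (Descr) VALUES ('Value 1')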

Data paging using offset

With every new release of SQL Server, we get new features that make our lives as developers and administrators so much easier. A few days ago, I came across an old piece of code (SQL Server 2005, if I remember correctly) that I used to page data for a CMS I built when I was still a web developer.

The company I worked for needed a new website, and wanted an HTML-editor to edit content on the website. This content was stored in SQL Server, and was retrieved by the website. With a few business rules we decided which content was visible, and which content was hidden from the website.

One of the features of the website was a news feed. But because there were so many news articles, we wanted to show only the top 10 on the first page, and let the user click through to the next page for the older articles. So we needed to page the data: every page should show the next 10 articles.

 
Creating the sample data
In order to show you the problem and solution, we need to create a sample table:

CREATE TABLE dbo.Paging
  (ID INT IDENTITY(1,1),
   Title VARCHAR(50),
   Content VARCHAR(50))

 
The test data we need to insert looks the same for every record:

INSERT INTO dbo.Paging
  (Title, Content)
VALUES
  ('This is an article', 'This is the content')
GO 50

 
This script inserts 50 records into the Paging table, because GO 50 runs the insert batch 50 times.

 
The old way
In older versions of SQL Server you needed to build your own solution for this problem. Let’s assume you clicked the news feed button on the website, and you want to switch to page 2 of the results. The solution I built back then looked something like this:

DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10


SELECT TOP(@RowsToShow)
  ID ,
  Title ,
  Content
FROM dbo.Paging
WHERE ID NOT IN
  (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging)
ORDER BY ID ASC

 
In the query above, we skip the first 10 rows, and retrieve the next 10 rows after that. That means the website needs to keep track of which records were already shown, and which records it wants to see next. The easiest way to do this is by excluding the IDs you’ve already shown, and retrieving the next set of rows.

This means you get execution plans like this:

 
The new way
From SQL Server 2012 onwards, we can use a new feature called OFFSET. That feature allows us to “window” our dataset, and retrieve a small subset of the data without using a TOP and a subquery, like in the example above. The new query looks like this:

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

 
In the query above, you see an offset of 10 rows, and a fetch of 10 rows. This means that it skips the first 10 records, and retrieves the 10 records after that. But how can you get this to work with dynamic result sets and pages? This is one way to do it:

DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY

 
The offset is calculated by taking the @PageNumber parameter minus one, and multiplying that by the @RowsPerPage parameter. For page 2 that gives (2 - 1) * 10 = 10, so the first 10 records are skipped. If we didn’t subtract one, page 1 would already skip the first 10 records, and every page would be one page off.

After that, we use the @RowsPerPage in the FETCH NEXT clause to retrieve the number of rows we want (in this case the next 10). This results in a completely different execution plan:

 
As you can see, this has a lot less impact on SQL Server. This becomes really visible if you compare both in SSMS:

 
I/O Costs
Comparing query costs is nice when you quickly want to compare 2 different approaches to the same solution, but in the end it all comes down to I/O. So which query is the fastest solution, and how are we going to test that?

First of all, we need to measure the I/O calls. We do that by using a DBCC command called DROPCLEANBUFFERS. This allows us “to test queries with a cold buffer cache without shutting down and restarting the server”. SO DON’T USE THIS IN PRODUCTION!!!

So the complete test script looks like this:

SET NOCOUNT ON
SET STATISTICS IO ON


DBCC DROPCLEANBUFFERS

--==================================================
DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10


SELECT TOP(@RowsToShow)
  ID ,
  Title ,
  Content
FROM dbo.Paging
WHERE ID NOT IN
  (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging)
ORDER BY ID ASC
--==================================================

DBCC DROPCLEANBUFFERS

--==================================================
DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID ,
  Title ,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
--==================================================

So we clean the SQL Server buffers, run the first query, clean the buffers again, and run the second query. Now the effect of the new statement is really obvious, if you look at I/O costs:

 
So the old version of the query (with the sub-select) scans the table twice, and reads 51 pages from the cache. The new approach (with the OFFSET) scans the table only once, and reads only 1 page from the cache.

 
Conclusion
The fewer I/O calls SQL Server needs to retrieve the result from disk or cache, the faster your query will run. In this case, we’ve tuned the query from 51 page reads down to 1 page read. And we’ve only tested this on a table with 50 records; on bigger tables the difference will be even more significant. So that’s a lot of performance improvement from just one new piece of functionality. And there is a lot more out there.
