September 24, 2013
With every new release of SQL Server, we get new features that make our lives as developers and administrators much easier. A few days ago, I came across an old piece of code (SQL Server 2005, if I remember correctly) that I used to page data for a CMS I built when I was still a web developer.
The company I worked for needed a new website, and wanted an HTML editor to edit the content on it. This content was stored in SQL Server, and was retrieved by the website. With a few business rules we decided which content was visible, and which content was hidden from the website.
One of the features of the website was a news feed. Because there were so many news articles in the database, we needed to page the data: show the top 10 articles on the first page, and let the user click through to the next page for the rest of the articles.
Creating the sample data
In order to show you the problem and solution, we need to create a sample table:
CREATE TABLE dbo.Paging
  (ID INT IDENTITY(1,1),
   Title VARCHAR(50),
   Content VARCHAR(50))
The test data we need to insert looks the same for every record:
INSERT INTO dbo.Paging
  (Title, Content)
VALUES
  ('This is an article', 'This is the content')
GO 50
This script inserts 50 records into the Paging table (GO 50 tells SSMS to run the batch 50 times).
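If you want a quick sanity check before moving on (this check query is mine, not part of the original script), you can count the rows that were inserted:

SELECT COUNT(*) AS NumberOfRows
FROM dbo.Paging
-- Returns 50 if the GO 50 batch ran completely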
The old way
In older versions of SQL Server you needed to build your own solution to this problem. Let's assume you clicked the news feed button on the website, and we want to switch to page 2 of the results. The solution I built back then looked something like this:
DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10

SELECT TOP(@RowsToShow)
  ID,
  Title,
  Content
FROM dbo.Paging
WHERE ID NOT IN (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging ORDER BY ID ASC)
ORDER BY ID ASC
In the query above, you can see that we skip the first 10 rows, and retrieve the 10 rows after that. That means the website needs to keep track of which records were already shown, and which records you want to see next. The easiest way to do that is to select the IDs you've already shown, and retrieve the next set of rows after those. Note the ORDER BY in the subquery: without it, TOP returns an arbitrary set of rows, and the paging wouldn't be reliable.
This means you get execution plans like this:
The new way
From SQL Server 2012 onwards, we can use a new feature called OFFSET-FETCH. It allows us to "window" our dataset, and retrieve a small subset of the data without using a TOP and a subquery like in the example above. The new query looks like this:
SELECT
  ID,
  Title,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
In the query above, you can see an offset of 10 rows, and a fetch of 10 rows. This means it skips the first 10 records, and retrieves the 10 records after that. But how can you get this to work with dynamic result sets and pages? This is one way to do it:
DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID,
  Title,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY
The offset is calculated by taking the @PageNumber parameter minus one, so that page 1 starts at an offset of 0. If we didn't subtract one, the query would always skip the first 10 records, even on the first page. We then multiply that number by the @RowsPerPage parameter, to calculate how many rows to skip.
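To make the arithmetic concrete, this is how page numbers map to offsets and rows when @RowsPerPage is 10:

-- @PageNumber = 1  ->  OFFSET (1 - 1) * 10 = 0  ROWS  ->  rows 1-10
-- @PageNumber = 2  ->  OFFSET (2 - 1) * 10 = 10 ROWS  ->  rows 11-20
-- @PageNumber = 3  ->  OFFSET (3 - 1) * 10 = 20 ROWS  ->  rows 21-30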
After that, we use the @RowsPerPage in the FETCH NEXT clause to retrieve the number of rows we want (in this case the next 10). This results in a completely different execution plan:
As you can see, this has a lot less impact on SQL Server. This becomes really visible if you compare both in SSMS:
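On a real website you wouldn't paste this query into every page; you'd typically wrap it in a parameterized stored procedure, so the website only has to pass in a page number. A minimal sketch of what that could look like (the procedure name and defaults are my own, not part of the original solution):

CREATE PROCEDURE dbo.GetPagedArticles
  @PageNumber  INT = 1,
  @RowsPerPage INT = 10
AS
BEGIN
  SET NOCOUNT ON;

  -- Same windowed query as above, driven by the two parameters
  SELECT
    ID,
    Title,
    Content
  FROM dbo.Paging
  ORDER BY ID ASC
  OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
  FETCH NEXT @RowsPerPage ROWS ONLY;
END

The website can then retrieve page 2 with a single call:

EXEC dbo.GetPagedArticles @PageNumber = 2;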
Comparing query costs is nice when you quickly compare two different approaches to the same solution, but in the end it all comes down to I/O. So which query is the fastest, and how are we going to test that?
First of all, we need to measure the I/O calls. We do that by using a DBCC command called DROPCLEANBUFFERS. This allows us “to test queries with a cold buffer cache without shutting down and restarting the server”. SO DON’T USE THIS IN PRODUCTION!!!
So the complete test script looks like this:
SET NOCOUNT ON
SET STATISTICS IO ON

DBCC DROPCLEANBUFFERS

--==================================================

DECLARE @RowsToShow INT = 10,
        @RowsToSkip INT = 10

SELECT TOP(@RowsToShow)
  ID,
  Title,
  Content
FROM dbo.Paging
WHERE ID NOT IN (SELECT TOP(@RowsToSkip) ID FROM dbo.Paging ORDER BY ID ASC)
ORDER BY ID ASC

--==================================================

DBCC DROPCLEANBUFFERS

--==================================================

DECLARE @RowsPerPage INT = 10,
        @PageNumber INT = 2

SELECT
  ID,
  Title,
  Content
FROM dbo.Paging
ORDER BY ID ASC
OFFSET ((@PageNumber - 1) * @RowsPerPage) ROWS
FETCH NEXT @RowsPerPage ROWS ONLY

--==================================================
So we clean the SQL Server buffers, run the first query, clean the buffers again, and run the second query. Now the effect of the new statement is really obvious, if you look at I/O costs:
So the old version of the query (with the sub-select) scans the table twice, and reads 51 pages from the cache. The new approach (with the OFFSET) scans the table only once, and reads only 1 page from the cache.
The fewer I/O calls SQL Server needs to retrieve the result from disk or cache, the faster your query will run. In this case, we've tuned the query from 51 page reads down to 1, and that's on a table with only 50 records; on a real news table the difference will be even bigger. So that's a lot of performance improvement from just one new piece of functionality. And there is a lot more out there.