Published CDC article

A few years ago, I made a list of goals I wanted to work towards, and things I wanted to achieve. One of those things was speaking (which I achieved in January 2015). Another point on the list was to write an article for a published magazine. Today, I’m proud to announce I can cross that goal off my list as well!

In December 2014, I saw a tweet from Marcel Meijer (Blog | @MarcelMeijer ), asking who wanted to write an article for SDN Magazine. After thinking about it for a while, I decided to just go for it.

Earlier this week, I saw the magazine was available for download on the SDN website. I’m honored my article was published, and I can only hope the people who read the magazine like it.

If you read the article, I’d love to hear your opinion!

Advertisements

Restoring or Moving a database with CDC enabled

When you have CDC enabled on your database, and you want to move it on another instance or restore it somewhere, you need to take some extra steps to make sure CDC is transferred correctly. When asking myself the question how this works, I decided to do some testing.

 
Create test resources
In order to test this, we need to create some test resources. I’ve created a new database, added a table and enabled CDC:

CREATE DATABASE [CDC_DB]
 ON PRIMARY
(NAME = N'CDC_DB', FILENAME = N'C:\Databases\#Backup\CDC_DB.mdf')
 LOG ON
(NAME = N'CDC_DB_log', FILENAME = N'C:\Databases\#Backup\CDC_DB_log.ldf')
GO

USE CDC_DB
GO

CREATE TABLE dbo.Customer
  (CustomerID INT CONSTRAINT PK_Customer PRIMARY KEY IDENTITY(1,1),
   FirstName VARCHAR(50),
   LastName VARCHAR(50))
GO

EXEC sys.sp_cdc_enable_db
GO

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'Customer',
  @supports_net_changes = 1,
  @role_name = NULL,
  @index_name = 'PK_Customer'
GO


INSERT INTO dbo.Customer
  (FirstName, LastName)
VALUES
  ('John', 'Doe')
GO

Now that we have the resources, we can take a backup of the database:

BACKUP DATABASE CDC_DB TO DISK = 'C:\Databases\#Backup\CDC_DB.bak'

 
Restoring the backup
To show you the default restore doesn’t work, let’s restore the backup next to the original database:

RESTORE DATABASE CDC_DB_RESTORE FROM DISK='C:\Databases\#Backup\CDC_DB.bak'
WITH
   MOVE 'CDC_DB' TO 'C:\Databases\#Backup\CDC_DB.mdf',
   MOVE 'CDC_DB_log' TO 'C:\Databases\#Backup\CDC_DB_log.ldf'

If we now try to select data from the change table of CDC, we get an error message:

 
But CDC was in the old database, so why wasn’t it restored? If you look for a solution, Books Online / MSDN doesn’t say anything about this. But luckily other community members blogged about this before (see this and this article). There is an option you can add to your restore:

USE master
GO

DROP DATABASE CDC_DB_RESTORE
GO

RESTORE DATABASE CDC_DB_RESTORE FROM DISK='C:\Temp\#BackupTest\CDC_DB.bak'
WITH
   MOVE 'CDC_DB' TO 'C:\Temp\#BackupTest\CDC_DB.mdf',
   MOVE 'CDC_DB_log' TO 'C:\Temp\#BackupTest\CDC_DB_log.ldf'
, KEEP_CDC
GO

 
If you add “KEEP_CDC” to your backup statement, it will also restore CDC. But that’s not your only option.

 
Detach and Attach

Another option you have is to detach your database, copy the files to another location, and attach the database again. Let’s detach the database first:

USE master
GO
ALTER DATABASE [CDC_DB_RESTORE] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
EXEC master.dbo.sp_detach_db @dbname = N'CDC_DB_RESTORE'
GO

 
Now that’s done, we can copy the files to another directory if needed. You could also just attach it on the same instance:

USE master
GO
CREATE DATABASE [CDC_DB_RESTORE] ON 
( FILENAME = N'C:\Temp\#BackupTest\CDC_DB.mdf' ),
( FILENAME = N'C:\Temp\#BackupTest\CDC_DB_log.ldf' )
 FOR ATTACH
GO

 
This also allows you to use CDC like nothing happened. The database is still CDC enabled, the change tables are there, and the captured data is still in the change table. But there’s one thing missing when you move your CDC database to another instance: the capture and cleanup jobs.

 
CDC Jobs
So how do we recover the jobs? Script them from one instance, and create them on the other instance? There’s a better way to do that. You can just run the statement below to add the jobs to your instance:

USE CDC_DB_RESTORE
GO
EXEC sys.sp_cdc_add_job 'capture'
GO
EXEC sys.sp_cdc_add_job 'cleanup'
GO

 
And if you move your database, you can also clean your instance, and run the statement below to remove the leftover jobs:

USE CDC_DB_RESTORE
GO
EXEC sys.sp_cdc_drop_job @job_type = N'capture'
GO
EXEC sys.sp_cdc_drop_job @job_type = N'cleanup'
GO

 
It is possible, but…
Moving or restoring a database with CDC gives you a few challenges, and even though there are some issues, it’s not impossible. But before you start, you need to think about the approach you need to take, in order to move/restore a database without losing CDC. Depending on your environment and possibilities, you might want to use a specific option, or perhaps can’t use the option you would like. There is a way around, but I would recommend you to test it first, before you run this on production databases!

Incremental updates with Change Data Capture

When designing a database or ETL process, for example loading your production data into a reporting environment, you always start your design with performance in mind. In the beginning of the project, your scripts and ETL run blazing fast. But after a few months in production, the entire project grinds to a halt. But how do you fix that problem, without a complete redesign of your applications and database? One of the many solutions is an easy one: incrementally load your data into the destination tables.

 
Change Data Capture
Incremental data loading could be a hard nut to crack. It’s not always an option, but it might be a good point to start from. One of the ways to start loading your data incrementally, is by using the keys in your database as a reference. If your table has a column called “Modified Date”, and that is updated every time the record is updated, you could use that. Every night, when the process runs, you just add the records that were modified after the last successful process run. But what if you don’t have that possibility? Change Data Capture (CDC) is an easy way out.

 
CDC is a way to record inserts, updates and deletes on a specific table, without the need of writing the triggers yourself. CDC reads the transaction log, and captures all changes made to the specific table. These changes are stored in the associated change table, that is created by CDC.

Below I’m going to show you how to setup your first table with CDC.If you would like to know more about CDC, this TechNet article is a place to start.

 
Create an example
To show you the basics of CDC, let start with creating a table called TestCDC in the database called Sandbox:

USE Sandbox
GO

CREATE TABLE dbo.TestCDC
  (ID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
	 Descr varchar(50) NULL)
GO

 
Once you’ve created the table, turn on CDC at the database level, by execution the system stored procedure created to do that:

EXEC sys.sp_cdc_enable_db

 
There is also a system stored procedure to enable CDC on the table level. You need to enable CDC on tables manually, and separately for every table you need:

EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @role_name = NULL

 
If the SQL Server Agent is running on your machine or server, you’ll see this confirmation (I’ll explain later why SQL Server Agent is needed):

 
If the Agent isn’t running, you’ll see this warning:

 
If you ran the enable table statement, you will see that SQL Server created the system objects needed to track changes in the table:

 
Because CDC uses 2 SQL Server Agent jobs to capture and cleanup the change tables, you need to run the Agent to start the data capture. If the jobs aren’t running, SQL Server won’t capture any changes made:

 
Start data changes
In order to see what happens when you change data, let’s insert some records:

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description')

INSERT INTO dbo.TestCDC
  (Descr)
VALUES
  ('This is a description too...')

 
And let’s update one of those 2 inserted records:

UPDATE dbo.TestCDC
SET Descr = 'UPD - ' + Descr
WHERE ID = 2

 
Now, let’s check the content of both the original table, and the change table:

/* Original table */
SELECT * FROM dbo.TestCDC

/* Change table */
SELECT * FROM cdc.dbo_TestCDC_CT

 
If you run both queries, you’ll see the resultset below:

 
The records in the CDC change table allow you to update the data in your reporting environment. You could query them yourself, by retrieving all the changes since your last update. You can also use the procedures that return those changes for you, for example the cdc.fn_cdc_get_net_changes_. You can read more about the system function here.

 
Cleaning up after an update
Now that you’ve updated your reporting environment, it’s a wise thing to cleanup your CDC data. You could also drop the records yourself with a DELETE statement. Another option is using the system procedure for that: “sys.sp_cdc_cleanup_change_table”. You can clean your data using the following SQL statement:

DECLARE @Last_LSN VARBINARY(10) =
  (SELECT MAX(cdc.dbo_TestCDC_CT.[__$start_lsn]) FROM cdc.dbo_TestCDC_CT)

EXEC sys.sp_cdc_cleanup_change_table
  @capture_instance = 'dbo_TestCDC',
  @low_water_mark = @Last_LSN,
  @threshold = 5000

 
The query will retrieve the last LSN (Log Sequence Number), and remove everything that happened before that.

 
Cleanup of CDC
If you want to completely remove CDC (because we’re done testing), you can disable it on the table level by running the query below:

EXEC sys.sp_cdc_disable_table
  @source_schema = 'dbo',
  @source_name = 'TestCDC',
  @capture_instance = 'dbo_TestCDC'

 
The statement will cleanup all the objects that were created to enable CDC on that specific table. But the statement will only stop the CDC on the specific table. The fastest way to disable CDC on all tables in the database, is disabling CDC on the database level, by running the query below:

EXEC sys.sp_cdc_disable_db

 
Conclusion
Loading data always takes time, and there are many factors that are important: your database size, your frequency of changes, your ETL process, etc. The time it costs you to move data can be changed by rewriting your process to incremental loads. CDC is one of the many ways to achieve this. It works out of the box, and doesn’t require you to build any process yourself. But maybe your environment needs a custom process to operate the way you want it to. Not every feature in SQL Server is a so called silver bullet, but sometimes it comes darn close to one…