Replication: Snapshot Agent fails on date conversion

This article was recently published on dev.getroadmap.com:

 
 

In the previous post I wrote, I explained how to set up replication from an on-premise SQL Server instance to an Azure SQL database. While doing this, I came across a very strange issue (or maybe even a bug) when setting up replication.

 
The problem child
After working on reproducing the issue for a day, and trying to reduce the issue to a small-scale problem, I came to the conclusion that the problem was (probably) caused by a single primary key on a table in the database:

CREATE TABLE dbo.BuggedTable
	([Day] DATE NOT NULL,
	 SomeId VARCHAR(50) NOT NULL,
	 Amount INT NOT NULL,
	 CONSTRAINT PK_BuggedTable PRIMARY KEY CLUSTERED 
		([Day] ASC,
		SomeId ASC
		)
	)
GO

 
Creating publication & subscriber
The setup of the publication and subscriber wasn’t that difficult. As I said before, there are a few things you need to configure differently than you would for SQL Server to SQL Server replication.

So I won’t talk you through the whole process again, but refer you to the articles instead.

 
Generate initial snapshot
Once the publication and subscription are in place, it’s time to generate the initial snapshot. The snapshot agent prepares the snapshot that contains the schema and data needed to initialize the subscriber(s):

 
In some cases it takes a while, but in the end, I found this “warning” on my screen:

 
When you dig into this by opening the Agents tab in the Replication Monitor, you see the actual error:

 
The complete error states:

Error messages:
Message: Query for data failed
Stack: at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.ThrowNativeBcpOutException(CConnection* pNativeConnectionWrapper)
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.BcpOut(String strBcpObjectName, String strBcpObjectOwner, String strBaseBcpObjectName, Boolean fUnicodeConversion, String strDataFile, String strLoadOrderingHint, String strWhereClause, Boolean useTableLockHint, Int32 bcpFileFormatVersion)
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.BcpOutThreadProvider.DoWork(WorkItem workItem)
at Microsoft.SqlServer.Replication.WorkerThread.NonExceptionBasedAgentThreadProc()
at Microsoft.SqlServer.Replication.AgentCore.BaseAgentThread.AgentThreadProcWrapper() (Source: MSSQLServer, Error number: 0)
Get help: http://help/0
Message: Conversion failed when converting date and/or time from character string.
Stack: (Source: MSSQLServer, Error number: 241)
Get help: http://help/241

 
First I thought that the DATE column being part of the primary key was the problem. But then again, another table in a different database had the combination of a DATE and a UNIQUEIDENTIFIER as a primary key. So that couldn’t be it.

 
Digging deeper
During my investigation into this issue, I found the following:

– There was enough space on the disk for the snapshot
– Another database with a DATE column in the primary key (PK) worked (even though the combination was DATE + UNIQUEIDENTIFIER, instead of DATE + VARCHAR)
– Adding “-UseInprocLoader” to the Snapshot Agent job didn’t work

So after digging a bit deeper, and trying to reduce the issue to a small-scale problem, I found:

– Reproduced the issue in an empty (new) database, with just the failing table as the single object in the database –> STILL FAILS
– Changed the object in the test DB to use a newly created IDENTITY(1,1) column as PK, and added a Unique Constraint on the old PK columns (see the sketch after this list) –> WORKS
– Made the DATE column part of a different PK (together with the IDENTITY column), with no Unique Constraint –> WORKS
– Changed the PK to just the VARCHAR column –> WORKS
– Changed the PK to just the DATE column –> WORKS
– Set the database on my local instance (SQL 2016 Dev) to compatibility level 110, just like on test –> WORKS
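
To give you an idea of the second test above, here is a minimal sketch of the changed table definition, reconstructed from the original table. The IDENTITY column name and the constraint names are just examples, not the actual names used in the test:

CREATE TABLE dbo.BuggedTable
	(Id INT IDENTITY(1,1) NOT NULL,	-- new surrogate key (name is an example)
	 [Day] DATE NOT NULL,
	 SomeId VARCHAR(50) NOT NULL,
	 Amount INT NOT NULL,
	 CONSTRAINT PK_BuggedTable PRIMARY KEY CLUSTERED (Id ASC),
	 CONSTRAINT UQ_BuggedTable_Day_SomeId UNIQUE ([Day] ASC, SomeId ASC)
	)
GO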

After testing a lot of different variables in the test setup, I found out that it’s probably an old bug that wasn’t properly patched when the SQL Server engine was upgraded to a newer version. Let me elaborate on that:

– The bug is reproducible on the test server, which is an upgraded engine from SQL 2012 or 2014 to SQL 2016 RTM
– The bug is reproducible on the production server, which is an upgraded engine from SQL 2014 to SQL 2016 RTM
– The bug is not reproducible on a clean install of SQL 2014
– The bug is not reproducible on a clean install of SQL 2016 RTM
– The bug is not reproducible on a clean install of SQL vNext CTP

 
Finding a work-around
Because I couldn’t find a work-around for this issue myself, I asked Justin Langford (Blog | @JustinLangford) from Coeo for help. He pointed me to an article that describes a few possible work-arounds.

For me the row filter did the trick:

 
As described in the article mentioned above, adding the row filter disables the BCP partitioning for this article (table), and the snapshot agent completed without any problem:
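
I added the filter through the publication properties in SSMS, but if you prefer to script it, a rough sketch with replication’s own stored procedures could look like this. Run it in the publication database; the publication, filter and view names below are placeholders, and a dummy filter clause like 1 = 1 is enough:

-- Placeholder publication/filter/view names; adjust to your own environment
EXEC sp_articlefilter
	@publication = N'MyPublication',
	@article = N'BuggedTable',
	@filter_name = N'FLT_BuggedTable',
	@filter_clause = N'1 = 1',
	@force_invalidate_snapshot = 1,
	@force_reinit_subscription = 1;

EXEC sp_articleview
	@publication = N'MyPublication',
	@article = N'BuggedTable',
	@view_name = N'VW_BuggedTable',
	@filter_clause = N'1 = 1',
	@force_invalidate_snapshot = 1,
	@force_reinit_subscription = 1;
GO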

 
Conclusion
Although this bug should’ve been fixed ages ago, it looks like it (re)appeared after an engine upgrade. I’m not sure how this happened, but all I know is that it took me about 3 days to find, reproduce, reduce and work around the problem.

So hopefully this article will save you that time!

Running maintenance on Azure SQL databases

This article was recently published on dev.getroadmap.com:

 

To keep your data healthy and in top shape, we all know you need to run something called database maintenance on your databases. This prevents things like performance problems or unrecoverable data from happening, and that is sort of the core business of DBAs. And when we look at how this can be performed on a regular basis, an on-premise instance is a bit easier to maintain than an Azure SQL database. That’s why I want to share our experience with you, hoping that it can save you some time when you start implementing this.

 
Platform setup
In order to get a better understanding of why certain choices were made, I want to share a very basic version of the setup of our environment.

Our platform consists of 2 major parts: a private cloud part (that we see as “on-premise”), and an Azure part. Those 2 parts combined make up our platform, and this platform hosts all of our services, applications, databases, etc. The development approach we use is that everything should be designed cloud-first, but only if it’s the right tool for the job, and with the exclusion of some data.

The databases we use in Azure are all what Microsoft calls “Azure SQL databases”. This means that no virtual machines of any kind are running on the Azure part of our platform, also known as DBaaS (Database as a Service).

When I draw this platform, and only focus on the data part, it will look something like this:

 
One of the advantages of this setup is that we can leverage the power of both parts of the platform. As you’ll see later on in this blog, there are some things that we (need to) run from the on-premise instances and some things fully on Azure.

 
Big shoutout to Ola
Before I go into detail, I want to give full kudos to Ola Hallengren (Website | @olahallengren). He has spent a lot of his time building a SQL Server Maintenance Solution that is completely free for everyone to use. And he did such an excellent job that a lot of companies (including huge companies) use his solution to run maintenance tasks on their databases.

None of the scripts below were written by me; I’ve only made small changes to make things clearer when the solution is deployed to an environment. The original scripts can be downloaded via the download page on Ola’s website.

 
Backups & Integrity check
Taking backups of your database and making sure there is no corruption in the data files is an essential part of the maintenance solution written by Ola. But because Azure SQL databases have a built-in maintenance solution (for example backups: full backups weekly, differentials hourly, and transaction log backups every 5 minutes, source), we don’t need to worry about that ourselves.

 
Index maintenance & Update Statistics
Indexes and statistics are the core of your performance-based maintenance processes. These make sure your queries run fast, and should provide you with stable and predictable performance. This is especially needed on an Azure database, since you can’t monitor it like you would an on-premise database.

Because Azure SQL databases are run on shared hardware that you can’t monitor yourself, Microsoft provides us with a number of different performance counters that we can use to check the status/health of our databases. The most important counters are CPU usage, Data IO, Log IO and DTU usage (a combination of the previously mentioned counters). The DTU counter is the most abstract (to me at least), because it’s explained by Microsoft as:

 

The Database Transaction Unit (DTU) is the unit of measure in SQL Database that represents the relative power of databases based on a real-world measure: the database transaction. We took a set of operations that are typical for an online transaction processing (OLTP) request, and then measured how many transactions could be completed per second under fully loaded conditions.

For example, a Premium P11 database with 1750 DTUs provides 350x more DTU compute power than a Basic database with 5 DTUs.

 
And for me, who was used to monitoring physical hardware, that is a bit of a different approach when digging into performance-related issues. It’s not that index and statistics maintenance isn’t important when you work on an on-premise database, but on Azure it’s a slightly bigger challenge to monitor the direct effects of an index rebuild or statistics update.

But because every Azure SQL database is a contained database, you need to deploy the stored procedures from Ola’s solution to every single database. So to keep it clear for everyone which tables and stored procedures belong to the maintenance solution, I’ve changed Ola’s scripts slightly to create all objects in a specific schema named “dba”. So first of all, let’s create the schema:

CREATE SCHEMA dba
GO

 
And then create the table that is used to log all of the maintenance commands and their outcome:

CREATE TABLE dba.CommandLog
	(ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_CommandLog PRIMARY KEY CLUSTERED,
	 DatabaseName sysname NULL,
	 SchemaName sysname NULL,
	 ObjectName sysname NULL,
	 ObjectType CHAR(2) NULL,
	 IndexName sysname NULL,
	 IndexType TINYINT NULL,
	 StatisticsName sysname NULL,
	 PartitionNumber INT NULL,
	 ExtendedInfo XML NULL,
	 Command NVARCHAR(MAX) NOT NULL,
	 CommandType NVARCHAR(60) NOT NULL,
	 StartTime DATETIME NOT NULL,
	 EndTime DATETIME NULL,
	 ErrorNumber INT NULL,
	 ErrorMessage NVARCHAR(MAX) NULL)
GO

 
Now that these are created, you can create the stored procedures that execute the actual index and statistics maintenance. They are too long to post here as code snippets, but you can download the CommandExecute script here, and the IndexOptimize script here.

But because we want to schedule these procedures later on, I decided to create an additional stored procedure in every database, that is deployed alongside the maintenance objects:

CREATE PROCEDURE dba.ExecuteMaintenance
AS


EXECUTE dba.IndexOptimize
 @Databases = '<Insert database name>',
 @FragmentationLow = 'INDEX_REORGANIZE',
 @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
 @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
 @FragmentationLevel1 = 5,
 @FragmentationLevel2 = 30,
 @UpdateStatistics = 'ALL',
 @LogToTable = 'Y',
 @Execute = 'Y'

GO

 
This allows us to run maintenance with specific options on the different databases.

Scheduling
After this was all set up, I needed to come up with a way to run these procedures on a schedule. And as you might know, Azure SQL databases don’t have a SQL Server Agent, so that’s where our on-premise platform comes in handy. Just for this I created a new virtual machine in our private cloud, and installed SQL Server on that machine to utilize the SQL Server Agent. This server (the operations server) runs all of the scheduled operational jobs (including maintenance and some monitoring jobs) against our Azure environment.

But to run a process from this operations machine on one of our Azure databases, I needed to create a linked server first:

EXEC sp_addlinkedserver 
 @server=N'ServerName__DatabaseName', 
 @srvproduct=N'Azure SQL Db',
 @provider=N'SQLNCLI', 
 @datasrc=N'<SERVERNAME>,1433',
 @catalog=N'DatabaseName';
 GO

EXEC sp_addlinkedsrvlogin 
 @rmtsrvname = 'ServerName__DatabaseName', 
 @useself = 'FALSE', 
 @locallogin=NULL,
 @rmtuser = '<USERNAME>',
 @rmtpassword = '<PASSWORD>'
 GO

 
And all that’s left now is to create a SQL Server Agent job that executes the “ExecuteMaintenance” stored procedure on the Azure database:

EXEC [<LinkedServerName>].<DatabaseName>.dba.ExecuteMaintenance
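
For reference, here is a sketch of how such a job could be scripted instead of clicked together. The job, schedule and server names are placeholders; also note that the linked server needs the “rpc out” option enabled before the four-part EXEC above will work:

USE msdb
GO

-- Allow remote procedure calls over the linked server (needed for the four-part EXEC)
EXEC sp_serveroption
	@server = N'ServerName__DatabaseName',
	@optname = N'rpc out',
	@optvalue = N'true'
GO

EXEC dbo.sp_add_job
	@job_name = N'Maintenance - ServerName__DatabaseName'

EXEC dbo.sp_add_jobstep
	@job_name = N'Maintenance - ServerName__DatabaseName',
	@step_name = N'Run dba.ExecuteMaintenance',
	@subsystem = N'TSQL',
	@command = N'EXEC [ServerName__DatabaseName].[DatabaseName].dba.ExecuteMaintenance',
	@database_name = N'master'

EXEC dbo.sp_add_jobschedule
	@job_name = N'Maintenance - ServerName__DatabaseName',
	@name = N'Nightly',
	@freq_type = 4,				-- daily
	@freq_interval = 1,
	@active_start_time = 020000	-- 02:00

EXEC dbo.sp_add_jobserver
	@job_name = N'Maintenance - ServerName__DatabaseName',
	@server_name = N'(local)'
GO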

 
The reason I chose this approach is that I didn’t want to lose any time figuring out how Azure Scheduler or Azure Automation works at the moment I was implementing the maintenance solution.

But if you want to have a good resource on that, please check out Pieter Vanhove’s (Blog | @Pieter_Vanhove) blog, because he wrote a great post about exactly that.

 
Conclusion
When I look at how this solution is set up, I’m the first to admit that this isn’t a perfect or ideal solution. But in the end, it gives the rest of the team a clear way to start or restart the maintenance process: just log in to the operations server, start SQL Server Management Studio (SSMS), open the SQL Server Agent jobs, find the job associated with the database they want to run the maintenance on, and that’s it. But for future scalability and maintainability, we might need to implement another solution.

Searching through the SQL Server error logs

SQL Server has a number of error logs, where both informational messages and errors are logged. You can compare it to the event viewer in Windows, but then only for SQL Server. This error log contains a lot of potentially useful information when you’re investigating an issue.

The physical location of the log files is “C:\Program Files\Microsoft SQL Server\MSSQL12.[InstanceName]\MSSQL\Log”. This can be different if you changed the path in the SQL Server setup (for example, the path on my machine is: “C:\Program Files\Microsoft SQL Server 2014\MSSQL12.[InstanceName]\MSSQL\Log”). In that directory you’ll find a number of ERRORLOG.[Number] files. There is a file for every archive, and how many there are depends on your SQL Server configuration. You can open the files with Notepad, or any other text editor you like.

But you can also access these archives from SQL Server Management Studio (SSMS).

 
SQL Server Logs
If you connect to your instance in the object explorer in SSMS, you can navigate to Management -> SQL Server Logs. There you see the configured number of log files (the default is 7 log files: 6 archives + the current log file):

 
You can double-click a log file to open it. In the pop-up window you see the contents of the log, a number of checkboxes on the left to add more archives to the current view, and a button called “Filter…” that you can use to filter the current view:

 
But unfortunately the filter in the Log File Viewer isn’t always that easy to use. For example, you can’t (at least as far as I know) filter on 2 strings. So how are you going to look for any events that contain “Backup” or “Restore”? That’s not possible with this filter.

 
xp_readerrorlog
If you would rather use T-SQL to find things in the SQL Server error log, that’s also possible. There’s an extended stored procedure called xp_readerrorlog you can use for that, or you can use sp_readerrorlog (which is a stored procedure that uses xp_readerrorlog).

If you use sp_readerrorlog or xp_readerrorlog, you can also apply filters. You can use a number of parameters to filter the output, but you can only do so on 1 single log file:

EXEC xp_readerrorlog
	0,						--ArchiveID (First error log = 0)
	1,						--Log type (SQL Server = 1, SQL Agent = 2)
	N'Backup',				--Filter on text (first search string)
	N'Sandbox',				--Filter on text (second search string)
	'20150826 00:00:00',	--DateFrom
	'20150828 10:32:00',	--DateTill
	'ASC'					--Sort order

 
By altering the filter parameters (or leaving them empty), you can search through the SQL Server error log or the SQL Server Agent error log, and you can do so on a number of columns and variables (like datetimes). But you still have the same issue you had with the log viewer in SSMS: you can only apply a limited set of filters, and you can’t search through more than 1 archive at the same time. So how can we solve that?

 
Search through multiple archives
Searching through multiple log files might come in handy if you’re not sure if the database was taken offline today or last week. Or if you’re not sure if the problem occurred before or after a log file cycle.

One way of doing this (which I use on a regular basis, and blogged about earlier), is by inserting all the log information into a table variable and searching through those results:

DECLARE @SSEL TABLE
	(LogDate DATETIME,
	 ProcessInfo VARCHAR(100),
	 Text VARCHAR(MAX))

/* Insert current log */
INSERT INTO @SSEL
EXEC xp_readerrorlog 0

/* Insert previous log */
--INSERT INTO @SSEL
--EXEC xp_readerrorlog 1

SELECT
	SSEL.LogDate,
	SSEL.ProcessInfo,
	SSEL.Text
FROM @SSEL AS SSEL
WHERE SSEL.Text LIKE '%Backup%'
ORDER BY SSEL.LogDate ASC

 
The only issue with this is that you need to find out how many log files you need to insert into the table variable (if you want to search through all available logs). But you can also find the answer to that question with a query.

 
Number of configured logs
You can configure the number of error logs from SSMS. If you right-click on SQL Server Logs in the object explorer, you can click on Configure:

 
In the pop-up window, you can configure the number of archives:

 
If you script out that change, you’ll end up with this script:

USE master
GO

EXEC xp_instance_regwrite
	N'HKEY_LOCAL_MACHINE',
	N'Software\Microsoft\MSSQLServer\MSSQLServer',
	N'NumErrorLogs',
	REG_DWORD,
	10
GO

 
So you can use the same method (but in reverse of course) to find the number of configured logs. You just need to use the xp_instance_regread stored procedure (which I blogged about before):

DECLARE @NumErrorLogs int

EXEC master.dbo.xp_instance_regread
	'HKEY_LOCAL_MACHINE',
	'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer',
	N'NumErrorLogs',
	@NumErrorLogs OUTPUT

SELECT @NumErrorLogs AS [NumberOfLogFiles]

 
There are 2 possible results:

– NULL: This means the setting is not configured, and the default (6 archives) is used
– An integer between 6 and 99: This means the number of logs is configured and not set to the default. The number that is returned is the number of archives

 
This result can be used to configure the number of inserts in the script posted above. If NULL is returned, you know you need to use the default setting of 6. If an integer is returned, you can use that number to determine the number of archives you’d like to use.
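
Putting the two together, this is a sketch of how you could automatically load the current log and every configured archive into the table variable, without hard-coding the number of inserts. Note that xp_readerrorlog raises an error for an archive file that doesn’t exist yet, so this sketch simply stops reading at that point:

DECLARE @SSEL TABLE
	(LogDate DATETIME,
	 ProcessInfo VARCHAR(100),
	 Text VARCHAR(MAX))

DECLARE @NumErrorLogs INT

EXEC master.dbo.xp_instance_regread
	'HKEY_LOCAL_MACHINE',
	'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer',
	N'NumErrorLogs',
	@NumErrorLogs OUTPUT

/* NULL means the default of 6 archives is used */
SET @NumErrorLogs = ISNULL(@NumErrorLogs, 6)

DECLARE @ArchiveID INT = 0

/* Load the current log (0) and the archives (1..@NumErrorLogs) */
WHILE @ArchiveID <= @NumErrorLogs
BEGIN
	BEGIN TRY
		INSERT INTO @SSEL
		EXEC xp_readerrorlog @ArchiveID
	END TRY
	BEGIN CATCH
		BREAK	/* Archive file doesn't exist (yet), stop reading */
	END CATCH

	SET @ArchiveID = @ArchiveID + 1
END

SELECT
	SSEL.LogDate,
	SSEL.ProcessInfo,
	SSEL.Text
FROM @SSEL AS SSEL
WHERE SSEL.Text LIKE '%Backup%'
ORDER BY SSEL.LogDate ASC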

 
Conclusion
By retrieving the SQL Server error log with a T-SQL query, it’s easy to automate this process if needed. For example, you might want to store the error log data in another place, and not keep it stored in the log files on your instance. You can do that by modifying this script, or by writing your own solution. By using xp_instance_regread and xp_readerrorlog (or sp_readerrorlog), you can be sure your script doesn’t fail or miss any data.

Time traveling with SQL Server 2016: Temporal tables

Since the release of the first publicly available SQL Server 2016 CTP2, we have a whole list of new and cool features we can start using. One of those features is called temporal tables (or system-versioned tables). So what are temporal tables, and how can you use them?

 
What is a temporal table?
To understand the functionality of temporal tables, we first need to agree on the definition of a table in SQL Server. If you run a query on a table that contains data, that query can only return the current version of a record. This means you can only see the current “truth”, and there’s no way to travel back in time to see older versions of this record.

Temporal tables give you the opportunity to time travel in SQL Server. For every data change (Update, Delete and Merge) in your table, a historical copy is stored in a history table. The table on which you enabled this data tracking is changed to a “System-Versioned” table.

 
How does it work?
For every update or delete on a table where system_versioning is enabled, a historical copy of the record is stored:

The original image can be found in this blog post, written by Manoj Pandey (Blog | @manub22).

 
But how does that work? Let’s start by creating a table we will use to test system-versioned tables:

CREATE TABLE dbo.BankAccount
	(AccountNumber INT CONSTRAINT PK_BankAccount_TransactionID PRIMARY KEY,
	 AccountBalance FLOAT);

 
In order to make a table a system-versioned table, we need to add 2 datetime2 columns to our newly created table. SQL Server will use these columns to store a from- and to-date that will indicate the valid period of time of the record. By telling SQL Server to use these 2 columns with the keywords “PERIOD FOR SYSTEM_TIME”, SQL Server knows that it needs to use these columns when querying data from the history table:

ALTER TABLE dbo.BankAccount
ADD ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
	ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
	PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

 

In the above script, the “HIDDEN” keyword is used. This can only be used from CTP 2.1 and higher. What HIDDEN does is hide the columns in your original table, to make sure application and database code isn’t going to fail because you added these 2 new columns. You can still query the columns, but if you run a “SELECT *” query these columns won’t be returned.

 
When that’s done we can enable system-versioning on the table:

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory));

 
Data changes & history
Now we can take a look at how data changes are captured, and history is recorded. Before you execute any of the statements below, enable the execution plan in your SSMS. If we insert a new record, no history is created:

INSERT INTO dbo.BankAccount
	(AccountNumber, AccountBalance)
VALUES
	(2147483647, 10000)

 
We just see the expected Clustered Index Insert, nothing special:

 
Now, let’s update the bank account balance (the WAITFOR is added just to make sure we have a clear difference in change time for the next few steps):

UPDATE dbo.BankAccount
SET AccountBalance = AccountBalance - 2500

WAITFOR DELAY '00:01:00'

UPDATE dbo.BankAccount
SET AccountBalance = AccountBalance + 1000

 
If you look at your execution plan, you’ll see an extra DML command in the execution plan:

 
Besides the expected Clustered Index Update, there is an extra Clustered Index Insert. This is the insert statement that is generated by SQL Server, to insert a record in the history table. There’s no user interaction needed to store historical data. SQL Server will take care of that process for you.

 
Time travelling
Now that we store historical data, it’s time to start the actual time travelling. To paint a complete picture, this is a timeline of the data inserts and updates:

 
Now that we have a clear timeline, let’s start our time travel. Besides creating and maintaining the history table, you also get to use a bit of new syntax in your query. For system-versioned tables, you can use the “FOR SYSTEM_TIME” clause in your query. In order to show you the timeline in SQL Server, let’s use this new syntax to query our history per minute:

SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:42:00' --No data available
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:43:00' --First insert
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:44:00' --First update
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:45:00' --Second update
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:46:00' --No other changes

 
Remember, this is based on my data. So if you want to query your timeline, change the datetime string based on your data!

So basically, you can start querying your data by using a specific point in time:

DECLARE @PointInHistory DATETIME2
SET @PointInHistory = '2015-07-09 09:45:00'

SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF @PointInHistory

 
Besides the actual point in time, you can use 3 other ways to query your data:

SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount
FOR SYSTEM_TIME FROM '2015-07-09 09:43:00' TO '2015-07-09 09:45:00'


SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount
FOR SYSTEM_TIME BETWEEN '2015-07-09 09:43:00' AND '2015-07-09 09:45:00'


SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount
FOR SYSTEM_TIME CONTAINED IN ('2015-07-09 09:43:00', '2015-07-09 09:45:00')

 
Table changes
In practice you’ll not only see data changes happening on tables, but you’ll also have schema changes on your table once in a while. But how does a temporal table cope with schema changes? Let’s try to add a column to our table:

ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10)

This results in an error:

 
In order to add a column, we need to disable the system-versioning first, before we add the new column:

ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF)
GO

ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10)
GO

 
Now let’s enable system-versioning again:

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory))
GO

 
This results in another error:

 
So we first need to add the same column to the history table, before enabling system-versioning again:

ALTER TABLE dbo.MyBankAccountHistory
ADD NewColumn VARCHAR(10)
GO

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory))
GO

 

Disabling system-versioning means that for a short period of time, you could lose historical data. A possible way to get around this is by setting your database to single-user mode before you disable it. This prevents other users from changing data in that table. If you run this in your production environment, this might not be your best option though! Disabling system-versioning does not remove the historical data that’s already stored.
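
A rough sketch of that approach, combining the statements from above (the database name is a placeholder, and SINGLE_USER with ROLLBACK IMMEDIATE kicks out all other connections, so think twice before running this anywhere important):

/* Placeholder database name */
ALTER DATABASE SandboxDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO

ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF)
GO

ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10)
GO

ALTER TABLE dbo.MyBankAccountHistory ADD NewColumn VARCHAR(10)
GO

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory))
GO

ALTER DATABASE SandboxDB SET MULTI_USER
GO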

 
Indexing historical data
Performance isn’t only important when you query your “normal” tables, but also if you want to query your newly created temporal tables. If you want to write a business process that uses your historical data, you need to make sure this doesn’t slow down your application. But how do you achieve that?

Just like on a normal table, you can create indexes on the history table. This table is basically a normal table, so there’s no reason why you can’t create indexes on it:

CREATE NONCLUSTERED INDEX IDX_MyHistoricalData
ON dbo.MyBankAccountHistory (AccountNumber, AccountBalance)

 
Historical data cleanup
It’s really great to store historical versions of your records in a separate table, but this also provides you with new challenges. You need more storage for your data, and you might want to remove data after a certain period of time. At this point in time, system-versioned tables don’t have a cleanup process. This might change in the future, but right now you need to clean up these tables yourself. So if you want (or need) to clean up old data, you need to write a small process yourself. This can be as easy as writing a query to run through all system-versioned tables and delete old data, and scheduling that in a SQL Server Agent job. But again, this is something to keep in mind when you start working with temporal tables.
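
If you want to find all system-versioned tables (and their history tables) in a database, to loop over them in such a cleanup process, a query against sys.tables like this sketch could be a starting point:

SELECT
	SCHEMA_NAME(T.schema_id) AS SchemaName,
	T.name AS TableName,
	OBJECT_NAME(T.history_table_id) AS HistoryTableName
FROM sys.tables AS T
WHERE T.temporal_type = 2	-- 2 = system-versioned temporal table
ORDER BY SchemaName, TableName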

I had a really good conversation about this with Borko Novakovic (@borko_novakovic), who is the Program Manager for this part of SQL Server. The code below is an example of such a process, that Borko provided:

BEGIN TRAN 

    /*Drop versioning to allow deletion of historical data*/
       ALTER TABLE dbo.BankAccount
              SET (SYSTEM_VERSIONING = OFF)
 
       /*DELETE MyBankAccountHistory table*/
       DELETE
       FROM dbo.MyBankAccountHistory
       WHERE ValidTo < '2015.01.01'
       
       /*Re-establish versioning*/
       ALTER TABLE dbo.BankAccount
              SET (SYSTEM_VERSIONING = ON
                  (HISTORY_TABLE  = dbo.MyBankAccountHistory,
                   DATA_CONSISTENCY_CHECK = OFF));

COMMIT

 
Deleting data without disabling system-versioning doesn’t work, because you’ll get an error like this:

 
I do want to thank Borko again for his time. He helped me out a lot, and I’m thankful he took the time to do that. Thanks Borko!!!

 
Remove system-versioning
Removing system-versioned tables from your systems takes one extra step, because just executing a “DROP TABLE” statement doesn’t work. SQL Server prevents you from dropping a temporal table without first disabling system-versioning. So let’s start by disabling it:

ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF)
GO

 
This disables the system-versioning on your original table, and transforms the history table to a normal table. If you want to get rid of the tables, you can just drop them both like you would normally do:

DROP TABLE dbo.BankAccount
GO
DROP TABLE dbo.MyBankAccountHistory
GO

 
Resources
Besides this blog, there are 2 fantastic posts you need to read about this topic, written by Itzik Ben-Gan (Blog | @ItzikBenGan), that you can read at SQLMag.com (Part 1, Part 2).

One thing you definitely MUST SEE is a video with Borko and Scott Klein (Blog | @SQLScott) on Channel 9. In this video, Borko explains the idea behind temporal tables, and shares some insights on this new feature.

Other information on temporal tables can be found on MSDN:

Temporal Tables
sys.tables
SQL Server 2016 (MSDN root)

Find SQL Server Agent job history retention with T-SQL

In SQL Server, the SQL Server Agent has its own retention setting. This retention determines how much execution history of SQL Server Agent jobs is kept. But if you have a lot of instances across your domain, or if you set up a new server, how do you determine if this value is set correctly?

 
Doing it old-school: via the GUI
You can check the SQL Server Agent retention via the GUI, by right-clicking on the SQL Server Agent and clicking on Properties. In the window that pops up, click on History, and you’ll see something like this:

 
T-SQL / Registry
The way I wanted to check it is with T-SQL. Because I want to run the statement against 40+ instances, I use a multi-server query for that. The reason I do that is that this returns all the results in a single table, which makes reviewing and comparing the results a lot easier.

But there doesn’t seem to be a way to directly query the configured values. With a little bit of searching though, I found out that you can use the stored procedure below to change one of the values:

EXEC msdb.dbo.sp_set_sqlagent_properties @jobhistory_max_rows = 999

 
So by dissecting this procedure, I found out SQL Server just reads the value from a registry key. I took out the piece that I needed to read the retention values:

DECLARE @jobhistory_max_rows INT = NULL,
        @jobhistory_max_rows_per_job INT = NULL


EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRows',
                                        @jobhistory_max_rows OUTPUT,
                                        N'no_output'
SELECT @jobhistory_max_rows = ISNULL(@jobhistory_max_rows, -1)


EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRowsPerJob',
                                        @jobhistory_max_rows_per_job OUTPUT,
                                        N'no_output'


SELECT @jobhistory_max_rows, @jobhistory_max_rows_per_job

 
But when I wanted to look up the values in the registry, to see if it returned the correct results, I couldn’t find the actual keys. I thought I was losing my mind, but there’s a catch…

 
xp_instance_regread vs xp_regread
There are 2 stored procedures you can use to read registry keys from your SQL Server:

DECLARE @jobhistory_max_rows INT = NULL

EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRows',
                                        @jobhistory_max_rows OUTPUT,
                                        N'no_output'

SELECT @jobhistory_max_rows

 
or

DECLARE @jobhistory_max_rows INT = NULL

EXECUTE master.dbo.xp_regread N'HKEY_LOCAL_MACHINE',
                                        N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
                                        N'JobHistoryMaxRows',
                                        @jobhistory_max_rows OUTPUT,
                                        N'no_output'

SELECT @jobhistory_max_rows

 
The difference between the two is perfectly explained in this Stack Overflow thread:

xp_regread reads the literal registry path that you specify. xp_instance_regread “converts” the path you specify so that it matches the instance of SQL Server that you’re currently using.

 
So if you run multiple instances on your machine, you want to use the instance version of the stored procedure. If you’re just running 1 (default) instance, you can use both versions.
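
To illustrate the difference: the literal path that xp_regread needs includes the version and instance name. A sketch for a SQL Server 2014 default instance (the “MSSQL12.MSSQLSERVER” part is just an example, so adjust it to your own version and instance):

DECLARE @jobhistory_max_rows INT = NULL

EXECUTE master.dbo.xp_regread N'HKEY_LOCAL_MACHINE',
                              N'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL12.MSSQLSERVER\SQLServerAgent',
                              N'JobHistoryMaxRows',
                              @jobhistory_max_rows OUTPUT,
                              N'no_output'

SELECT @jobhistory_max_rows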

So eventually, I searched for the instance folder, and I did find the registry keys:

 
Conclusion
Especially for very large environments, it’s a good idea to write system and/or server checks in T-SQL. That way, you can script them in SCOM (or another monitoring tool), use them in SQL Server Policy Based Management (and I would really like to recommend this Pluralsight course from a very good friend of mine: Boris Hristov (Blog | @BorisHristov)!), or just run them as a multi-server query.

Avoid a big SSISDB by logging less…?

Last week I blogged about how logging can grind your SSIS to a halt. After posting the blog, I got a really interesting reaction from Koen Verbeeck (Blog | @Ko_Ver):

I knew there were different logging levels in SSIS, but I couldn’t recall the difference between the levels. So I discussed the difference with Koen. At first glance we thought that the biggest difference is the fact that performance logging only logs warnings and errors. So basically, you decrease the number of messages logged, which should result in slightly better performance. But is that really the case?

 
Different logging levels
In SSIS there are 4 different logging levels. So what are the options, and what’s the difference between them? You can read all about it on MSDN, but the short version is:

None: Logging is turned off
Basic (default value): All events are logged, except custom and diagnostic events
Performance: Only performance statistics, OnError and OnWarning events are logged
Verbose: All events are logged

 
The next question is: where do you change these options? For example, how do I change my logging level to Performance?

 
SSISDB / Catalog
When you want to change the logging level for all your deployed SSIS packages, you can change the setting on your catalog. You can do that by opening “Integration Services Catalogs” on your instance, right-clicking on your catalog, and choosing an option for “Server-wide Default Logging Level”:

 
If you change this, the logging level for all packages in this catalog will change. But maybe you don’t want that, and you only want it for a specific package.
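
Before we get to that: if you do want the server-wide change, but would rather script it than use the GUI, a sketch using the catalog’s configure_catalog procedure could look like this:

EXEC SSISDB.catalog.configure_catalog
	@property_name = N'SERVER_LOGGING_LEVEL',
	@property_value = 2		-- 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose
GO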

 
SQL Server Agent Job
Another option is to configure this in your SQL Server Agent job:

 
If you use this option, the logging level will change for the package that is executed in the job step. So no permanent change, but only for the duration of the job.

 
At package execution
Another option is to do it at package runtime:

 
Personally I don’t execute packages that way, but you might. The same applies as for the change on the SQL Server Agent job: this will only change the option for the duration of the execution, so there’s no permanent change.
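
For completeness, this is a sketch of how that execution-time option could be scripted with the SSISDB catalog procedures instead of the SSMS dialog (the folder, project and package names are placeholders):

DECLARE @execution_id BIGINT

-- Placeholder folder/project/package names
EXEC SSISDB.catalog.create_execution
	@folder_name = N'MyFolder',
	@project_name = N'MyProject',
	@package_name = N'MyPackage.dtsx',
	@use32bitruntime = 0,
	@reference_id = NULL,
	@execution_id = @execution_id OUTPUT

-- Set the logging level for this execution only (0 = None, 1 = Basic, 2 = Performance, 3 = Verbose)
EXEC SSISDB.catalog.set_execution_parameter_value
	@execution_id = @execution_id,
	@object_type = 50,			-- system parameter
	@parameter_name = N'LOGGING_LEVEL',
	@parameter_value = 2

EXEC SSISDB.catalog.start_execution @execution_id
GO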

 
How to determine your best choice
Making a good decision is difficult in this case. When something goes wrong, you want to log everything. But when things run smoothly, it’s a waste of time to log every event that passes by in SSIS. And how do you determine the level of logging you need, without knowing what data is in your SSISDB? If you want to analyze that, you could use a query like this:

SELECT
  EventCounts.EventName,
  EventCounts.NumberOfEvents,
  CONVERT(FLOAT,(CONVERT(FLOAT,[NumberOfEvents]) / SUM(NumberOfEvents) OVER ()) * 100) AS 'Percentage'
FROM
  (
    SELECT
      EM.event_name AS 'EventName',
      COUNT(*) AS 'NumberOfEvents'
    FROM SSISDB.catalog.event_messages AS EM
    WHERE EM.event_name IS NOT NULL
    GROUP BY EM.event_name
  ) EventCounts
ORDER BY EventCounts.EventName ASC

 
This shows you the type of events stored in your SSISDB, the amount of events, and a percentage over the whole dataset. This can help you determine the logging level you need in your specific case.

 
But here’s the catch…
Performance logging doesn’t actually make your packages run faster… Koen sent me this blog post from Matt Masson (Blog | @mattmasson). In his blog post, he explains what events are logged at the specific levels. And this is what he says about performance level:

The Performance log level should be used when you are doing benchmarking and performance tuning for your packages. While it actually logs less messages to the [catalog].[operation_messages] view than Basic, it captures a lot more events internally to analyze the performance of the data flow components. As a result, there is a bit more overhead during execution – packages run with Basic will actually run a little faster than Performance (in this case Performance means “give me all of the performance details”, not “run my packages as fast as you can”).

 
Conclusion
Even though SSIS is easy to use, there are some pros and cons. The longer I work with SSIS (and that’s not on a daily basis), the more of these pitfalls I discover. And even though they can be fixed pretty fast, it normally takes time to figure out the problem, because I’m not that familiar with the inner workings of SSIS. And even when you think you have found a solution for your problem, SSIS just works slightly differently than expected. I’m just glad I have friends like Koen to help me out when needed! Thanks again Koen!

SSISDB: Why bigger isn’t always better

Two weeks ago I encountered some strange issues with the SSISDB. Packages and processes started failing, and all we could find was this error message:

 
The image above is an extract of the standard SSMS “All Executions” report. You can find this by right-clicking your SSISDB, click on Reports, then Standard Reports, and choose the “All Executions” report.

While the packages started failing, we couldn’t find a cause. Deploying new packages wasn’t possible either. There were no events logged in the report, in the Windows event viewer, or in the SQL Server error log. So there was no starting point to further investigate this issue.

 
Finding the cause
Checking the size of the SSISDB showed a possible cause of the issue: it was 72GB! This seemed odd, because we don’t store a lot of packages in the SSISDB. I knew the SSISDB contained some logging information, but I didn’t know how much. Checking the “Disk usage per table” report (another standard report in SSMS), the problem became a bit more obvious:

 

 
So basically there were over 67 million records in the database! This shouldn’t have happened, because the log retention should be set to 14 days. But we quickly found the problem:

 

 
The retention was set to 365 days. This could only mean that the SSISDB was re-created, and this setting wasn’t changed after the deployment. This resulted in a lot of extra logging data in the database. Because of the amount of records, the standard “SSISDB Maintenance job” ran for 40+ minutes, instead of a maximum of 8 minutes (which it normally needs to clean up logging). But just setting the retention to 14 days and running the maintenance job would result in an endlessly running job. So how could we solve that?
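
For reference, you can check and change the retention window with T-SQL as well; a quick sketch (this is the same setting you see in the catalog properties dialog):

-- Check the current retention window (in days)
SELECT property_name, property_value
FROM SSISDB.catalog.catalog_properties
WHERE property_name = N'RETENTION_WINDOW'

-- Change it to 14 days
EXEC SSISDB.catalog.configure_catalog
	@property_name = N'RETENTION_WINDOW',
	@property_value = 14
GO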

 
Grabbing the bigger hammer
In most cases, grabbing a bigger hammer to solve the problem isn’t your best option. In this case, I didn’t see another way. After some searching, I found this blog post about truncating the log tables in SSISDB. Below you find a copy of the script. The credits for this script go to Ibrahim Naji (Blog | @thinknook).


CREATE PROCEDURE [internal].[cleanup_server_retention_window_truncateall]
AS  
     
SET NOCOUNT ON
     
DECLARE @enable_clean_operation bit
DECLARE @retention_window_length INT
     
DECLARE @caller_name nvarchar(256)
DECLARE @caller_sid  varbinary(85)
DECLARE @operation_id BIGINT
     
EXECUTE AS CALLER
    SET @caller_name =  SUSER_NAME()
    SET @caller_sid =   SUSER_SID()
REVERT
          
     
BEGIN TRY
    SELECT @enable_clean_operation = CONVERT(bit, property_value) 
        FROM [catalog].[catalog_properties]
        WHERE property_name = 'OPERATION_CLEANUP_ENABLED'
         
    IF @enable_clean_operation = 1
    BEGIN
        SELECT @retention_window_length = CONVERT(INT,property_value)  
            FROM [catalog].[catalog_properties]
            WHERE property_name = 'RETENTION_WINDOW'
                 
        IF @retention_window_length <= 0 
        BEGIN
            RAISERROR(27163    ,16,1,'RETENTION_WINDOW')
        END
             
        INSERT INTO [internal].[operations] (
            [operation_type],  
            [created_time], 
            [object_type],
            [object_id],
            [object_name],
            [STATUS], 
            [start_time],
            [caller_sid], 
            [caller_name]
            )
        VALUES (
            2,
            SYSDATETIMEOFFSET(),
            NULL,                     
            NULL,                     
            NULL,                     
            1,      
            SYSDATETIMEOFFSET(),
            @caller_sid,            
            @caller_name            
            ) 
        SET @operation_id = SCOPE_IDENTITY() 


        -- Remove all [internal].[executions] dependancies
        TRUNCATE TABLE [internal].[executable_statistics]
        TRUNCATE TABLE [internal].[execution_component_phases]
        TRUNCATE TABLE [internal].[execution_data_statistics]
        TRUNCATE TABLE [internal].[execution_data_taps]
        TRUNCATE TABLE [internal].[execution_parameter_values]
        TRUNCATE TABLE [internal].[execution_property_override_values]


        -- Remove all [internal].[event_message_context] dependancies
        TRUNCATE TABLE [internal].[event_message_context]

        -- Remove all non-dependant tables
        TRUNCATE TABLE [internal].[operation_os_sys_info]
        TRUNCATE TABLE [internal].[operation_permissions]
        TRUNCATE TABLE [internal].[validations]
        TRUNCATE TABLE [internal].[extended_operation_info]

        -- Deal with [internal].[event_messages] and [internal].[operation_messages]
        ALTER TABLE [internal].[event_message_context] DROP CONSTRAINT [FK_EventMessageContext_EventMessageId_EventMessages]
         
        TRUNCATE TABLE internal.event_messages
         
        ALTER TABLE [internal].[event_message_context]  WITH CHECK ADD  CONSTRAINT [FK_EventMessageContext_EventMessageId_EventMessages] FOREIGN KEY([event_message_id])
        REFERENCES [internal].[event_messages] ([event_message_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[event_messages] DROP CONSTRAINT [FK_EventMessages_OperationMessageId_OperationMessage]
         
        TRUNCATE TABLE [internal].[operation_messages]

        ALTER TABLE [internal].[event_messages]  WITH CHECK ADD  CONSTRAINT [FK_EventMessages_OperationMessageId_OperationMessage] FOREIGN KEY([event_message_id])
        REFERENCES [internal].[operation_messages] ([operation_message_id])
        ON DELETE CASCADE

        -- Deal with [internal].[executions]

        ALTER TABLE [internal].[executable_statistics] DROP CONSTRAINT [FK_ExecutableStatistics_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_component_phases] DROP CONSTRAINT [FK_ExecCompPhases_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_data_statistics] DROP CONSTRAINT [FK_ExecDataStat_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_data_taps] DROP CONSTRAINT [FK_ExecDataTaps_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_parameter_values] DROP CONSTRAINT [FK_ExecutionParameterValue_ExecutionId_Executions]
        ALTER TABLE [internal].[execution_property_override_values] DROP CONSTRAINT [FK_ExecutionPropertyOverrideValue_ExecutionId_Executions]

        TRUNCATE TABLE [internal].[executions]

        ALTER TABLE [internal].[execution_property_override_values]  WITH CHECK ADD  CONSTRAINT [FK_ExecutionPropertyOverrideValue_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[execution_parameter_values]  WITH CHECK ADD  CONSTRAINT [FK_ExecutionParameterValue_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[execution_data_taps]  WITH CHECK ADD  CONSTRAINT [FK_ExecDataTaps_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE

        ALTER TABLE [internal].[execution_data_statistics]  WITH CHECK ADD  CONSTRAINT [FK_ExecDataStat_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE
         
        ALTER TABLE [internal].[execution_component_phases]  WITH CHECK ADD  CONSTRAINT [FK_ExecCompPhases_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE
         
        ALTER TABLE [internal].[executable_statistics]  WITH CHECK ADD  CONSTRAINT [FK_ExecutableStatistics_ExecutionId_Executions] FOREIGN KEY([execution_id])
        REFERENCES [internal].[executions] ([execution_id])
        ON DELETE CASCADE
         

        -- Deal with [internal].[operations]
        DECLARE @deleted_ops TABLE(operation_id BIGINT, operation_type SMALLINT)

        DELETE --TOP (@delete_batch_size)
        FROM [internal].[operations] 
        OUTPUT DELETED.operation_id, DELETED.operation_type INTO @deleted_ops
        WHERE operation_id != @operation_id

             
             
        DECLARE @execution_id BIGINT
        DECLARE @sqlString              nvarchar(1024)
        DECLARE @key_name               [internal].[adt_name]
        DECLARE @certificate_name       [internal].[adt_name]
             
             
        DECLARE execution_cursor CURSOR LOCAL FOR 
            SELECT operation_id FROM @deleted_ops 
            WHERE operation_type = 200
             
        OPEN execution_cursor
        FETCH NEXT FROM execution_cursor INTO @execution_id
             
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @key_name = 'MS_Enckey_Exec_'+CONVERT(VARCHAR,@execution_id)
            SET @certificate_name = 'MS_Cert_Exec_'+CONVERT(VARCHAR,@execution_id)
            SET @sqlString = 'IF EXISTS (SELECT name FROM sys.symmetric_keys WHERE name = ''' + @key_name +''') '
                +'DROP SYMMETRIC KEY '+ @key_name
                EXECUTE sp_executesql @sqlString
            SET @sqlString = 'IF EXISTS (select name from sys.certificates WHERE name = ''' + @certificate_name +''') '
                +'DROP CERTIFICATE '+ @certificate_name
                EXECUTE sp_executesql @sqlString
            FETCH NEXT FROM execution_cursor INTO @execution_id
        END
        CLOSE execution_cursor
        DEALLOCATE execution_cursor

        END
END TRY
BEGIN CATCH
         
         
    IF (CURSOR_STATUS('local', 'execution_cursor') = 1 
        OR CURSOR_STATUS('local', 'execution_cursor') = 0)
    BEGIN
        CLOSE execution_cursor
        DEALLOCATE execution_cursor            
    END
         
    UPDATE [internal].[operations]
        SET [STATUS] = 4,
        [end_time] = SYSDATETIMEOFFSET()
        WHERE [operation_id] = @operation_id;       
    THROW
END CATCH
     
RETURN 0

 
After running this on the test environment, I found out it worked as I expected. I started the script on production (where the problem was found in the first place), and when it finished I changed the retention from 365 to 14 days:

 

 
After doing that, I ran the “SSISDB Maintenance job” (which completed without issues), and an SSIS package as a test (which also worked fine). And looking at the record counts, the cleanup clearly did its job:

 

 
I’m not saying you should run this on your system, but I’m just saying this works fine on my system, and in my specific case. If you find any code on the internet, test it on a non-critical system first, before you run it in production!!!

 
Conclusion
Having error logging is a good thing. It can help you determine problems, and you can see if and how processes grow over time. But too much SSIS logging causes performance issues, prevents you from deploying new packages, and can even cause unexpected termination of SSIS packages. One way to prevent that is by changing the history retention of your SSISDB.