Replication: Snapshot Agent fails on date conversion

This article was recently published on dev.getroadmap.com:

 
 

In the previous post I wrote, I explained how to setup replication from an on-premise SQL Server instance to an Azure SQL database. While doing this, I came across a very strange issue (or maybe even bug) when setting up replication.

 
The problem child
After working on reproducing the issue for a day, and trying to reduce the issue to a small-scale problem, I came to the conclusion that the problem was (probably) caused by a single primary key on a table in the database:

CREATE TABLE dbo.BuggedTable
	([Day] DATE NOT NULL,
	 SomeId VARCHAR(50) NOT NULL,
	 Amount INT NOT NULL,
	 CONSTRAINT PK_BuggedTable PRIMARY KEY CLUSTERED 
		([Day] ASC,
		SomeId ASC
		)
	)
GO

 
Creating publication & subscriber
The setup of the publication and subscriber wasn’t that difficult. As I said before, there are a few things you need to configure differently then you would do for SQL Server to SQL Server replication.

So I won’t talk you through the whole process again, but refer you to the articles instead.

 
Generate initial snapshot
Once the publication and subscription are in place, it’s time to generate the initial snapshot. The snapshot agent prepares the snapshot that contains the schema and data, needed to initialize the subscriber(s):

 
In some cases it takes a while, but in the end, I found this “warning” on my screen:

 
When you dig into this by opening the agents tab in the replication monitor, you see the actual error:

 
The complete error states:

Error messages:
Message: Query for data failed
Stack: at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.ThrowNativeBcpOutException(CConnection* pNativeConnectionWrapper)
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.BcpOut(String strBcpObjectName, String strBcpObjectOwner, String strBaseBcpObjectName, Boolean fUnicodeConversion, String strDataFile, String strLoadOrderingHint, String strWhereClause, Boolean useTableLockHint, Int32 bcpFileFormatVersion)
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.BcpOutThreadProvider.DoWork(WorkItem workItem)
at Microsoft.SqlServer.Replication.WorkerThread.NonExceptionBasedAgentThreadProc()
at Microsoft.SqlServer.Replication.AgentCore.BaseAgentThread.AgentThreadProcWrapper() (Source: MSSQLServer, Error number: 0)
Get help: http://help/0
Message: Conversion failed when converting date and/or time from character string.
Stack: (Source: MSSQLServer, Error number: 241)
Get help: http://help/241

 
First I thought that the DATE column being part of the primary key was the problem. But then again, another table in a different database had the combination of a DATE and a UNIQUEIDENTIFIER as a primary key. So that couldn’t be it.

 
Digging deeper
During my investigation to this issue, I found the following:

– There was enough space on the disk for the snapshot
– Other database with DATE in primary key (PK) worked (even though combination was DATE + UNIQUEIDENTIFIER, instead of DATE + VARCHAR)
– Adding “-UseInprocLoader” to Snapshot Agent job doesn’t work

So after digging a bit deeper, and trying to reduce the issue to a small-scale problem, I found:

– Reproduced in an empty (new) database, with just the failing table as single object in the database –> STILL FAILS
– Changed object in test DB to use newly created IDENTITY(1,1) as PK, and added Unique Constraint to old PK columns –> WORKS
– Make DATE column part of different PK (together with IDENTITY), and no Unique Constraint –> WORKS
– Changing the PK to just the VARCHAR column –> WORKS
– Changing the PK to just the DATE column –> WORKS
– Setting database on local instance (SQL 2016 Dev) to comp. 110, just like on test –> WORKS

After a lot of different variables in the test-setup, I found out that it’s probably an old bug that wasn’t properly patched when upgrading the SQL Server engine to a newer version. Let me elaborate on that:

– The bug is reproducible on the test server, which is an upgraded engine from SQL 2012 or 2014 to SQL 2016 RTM
– The bug is reproducible on the production server, which is an upgraded engine from SQL 2014 to SQL 2016 RTM
– The bug is not reproducible on a clean install of SQL 2014
– The bug is not reproducible on a clean install of SQL 2016 RTM
– The bug is not reproducible on a clean install of SQL vNext CTP

 
Finding a work-around
Because I couldn’t find a work-around for this issue, I requested the help from Justing Langford (Blog | @JustinLangford) from Coeo. He pointed me to an article that describes a few possible work-arounds.

For me the row filter did the trick:

 
As described in the article mentioned above, adding the row filter disables the BCP partioning for this article (table), and the snapshot agent completed without any problem:

 
Conclusion
Although this bug should’ve been fixed ages ago, it looks like it (re)appeared again after an engine upgrade. I’m not sure how this happened, but all I know is that it took me about 3 days to find, reproduce, reduce and work around the problem.

So hopefully this article will save you that time!

Setting up replication from on-premise SQL Server to Azure SQL DB

This article was recently published on dev.getroadmap.com:

 

For performance reasons we are looking for a way to split our write-operations from our read operations. A good way to do that is by duplicating the database onto another server (on-premise) or platform (Azure SQL DB). When you look at the options (Availability Groups (AG), database mirroring, clustering, replication, Azure sync groups, etc.) the easiest way to quickly duplicate a database and keep the data up-to-date at the same time is SQL Server replication.

Replication to another on-premise instance is easy. You just follow the steps in the wizard, it works out-of-the-box, and the chances of this process failing are small. With replicating data to an Azure SQL database it’s a bit more of a struggle. Just one single word took me a few HOURS of investigation and a lot of swearing…

 
Setting up replication and databases
Before you start with setting the whole thing up, make sure that the replication is installed on your instance. If this is not done yet, install the needed components via the installer. This component can be installed without rebooting the server, or the need to restart the instance.

If you needed to install replication, make sure you restart the SQL Server Agent. For some reason, the configuration change works on the instance without any problems, but the Agent can’t run the needed jobs until you restart it.

The next step is creating 2 databases: a publisher (on the on-premise instance) and a subscriber (an Azure SQL database):

 
And create a table that can be replicated:

CREATE TABLE dbo.SomeObject
	(ID INT IDENTITY(1,1),
	 SomeValue VARCHAR(100),
	 PRIMARY KEY (ID))

 
Setup the publisher
Now that you have 2 databases, it’s time to setup the publisher. I’m not guiding you through the whole replication setup (there are plenty of tutorials out there that are WAY better than my explanation), but I’ll show you the highlights.

First, start by clicking “New Publication” on your on-premise instance:

 
I chose to run the distributor on the same server.

If you follow the wizard, it will prompt you with the question on where to store the snapshots:

 
Copy this link into for example notepad, because it comes in handy later in the process.

As publication type, choose “Transactional publication”:

 
Again, follow the wizard and choose the objects you want to replicate (in my case just that single table):

 
For the security settings, I chose to use the SQL Server Agent

After finishing the steps in the wizard, give your publication a name, and finish this setup.

 
Create a subscriber
On the on-premise instance, right-click the newly created publication, and add a new subscriber:

 
The distribution agent will remain on the distributor (push subscription).

When you’re prompted to add a subscriber, choose “Add SQL Server Subscriber”:

 
Here comes the part that took me a while to figure out. When you try to replicate changes to an Azure SQL database, and you use the normal connection, it will fail to connect in the end. But if you use a secure connection, replication works like a charm. So make sure you use the following server name ([Server Name].database.secure.windows.net):

 
Adding that single word took me an hour or so to figure out. Hopefully this will save you a few hours of investigation, looking for dodgy error messages, ending up in finding an “errorcode 40852”, and leading you to the solution.

When this is set up, select the target database (subscriber):

 
When you’re prompted for the distribution agent security information, make sure you use a SQL Server login that has the correct permissions on the Azure SQL database:

 
Start the replication
Now that everything is in place and configured, it’s time to create the initial snapshot. This snapshot will be used to create all the objects and data that are supposed to be replicated on the Azure SQL database. Open the snapshot agent by clicking “View Snapshot Agent Status” on the publication:

 
The snapshot agent hasn’t run yet, so press “Start” to start the process:

 
When this process has completed, you should see something similar like this:

 
Now it’s time for a final test. Insert a record on the source database (publisher):

INSERT INTO dbo.SomeObject
	(SomeValue)
VALUES
	('Test value')

 
You should see this record pop-up on the subscriber after a few seconds (default latency is 5 seconds):

SELECT *
FROM dbo.SomeObject

 
You can also check the “Synchronization Status”, and you should see a command being replicated:

 
Conclusion
Out of all the available options, I still think replication is the easiest way to replicate data into Azure. The only downside is that there doesn’t seem to be a lot of information about this, especially when you look at the “database.SECURE.windows.net” part.

If you need more than just the one-way traffic, or you want a bit more functionality, maybe replication isn’t your weapon of choice. But for me, it worked like a charm… Eventually…

Running maintenance on Azure SQL databases

This article was recently published on dev.getroadmap.com:

 

To keep your data healthy and topfit, we all know you need to run something called database maintenance on your databases. This prevents things like performance problems or unrecoverable data from happening, and that is sort of the core business of DBA’s. And when we look at how this can be performed on a regular basis, an on-premise instance is a bit easier to maintain than an Azure SQL database. That’s why I want to share our experience with you on that, hoping that it can save you some time when you start implementing this.

 
Platform setup
In order to get a better understanding of why certain choices were made, I want to share a very basic version of the setup of our environment.

Our platform consists of 2 mayor parts: a private cloud part (that we see as “on-premise”), and an Azure part. Those 2 parts combined make our platform, and this platform hosts all of our services, applications, databases, etc. The development approach we use is everything should be designed with cloud-first in mind, but only if it’s the right tool for the job, and with the exclusion of some data.

The databases we use in Azure are all what Microsoft calls “Azure SQL databases”. This means that there are no virtual machines of any kind are running on our Azure-part of the platform, also known as DBaas (Database as a Service).

When I draw this platform, and only focus on the data part, it will look something like this:

 
One of the advantages of this setup is that we can leverage the power of both parts of the platform. As you’ll see later on in this blog, there are some things that we (need to) run from the on-premise instances and some things fully on Azure.

 
Big shoutout to Ola
Before I’m going into detail, I want to give full kudos to Ola Hallengren (Website | @olahallengren). He has spend a lot of his time to build a SQL Server Maintenance Solution that is completely free for everyone to use. And he did such an excellent job a lot companies (also huge companies) use his solution to run maintenance tasks on their databases.

None of the scripts below are written by me, but only small changes are made in order to make things more clear when the solution is deployed to an environment. The original scripts can be downloaded via the download page on Ola’s website.

 
Backups & Integrity check
Taking backups of your database and making sure there is no corruption in the datafiles is an essential part of the maintenance solution written by Ola. But because Azure SQL databases have a build-in maintenance solution (for example backups: full backups weekly, differentials hourly, and transaction log every 5 minutes, source), we don’t need to worry about that ourselves.

 
Index maintenance & Update Statistics
Indexes and statistics are the core of your performance-based maintenance processes. These make sure your queries run fast, and should provide you with a stable and predictable performance. This is especially needed on an Azure database, since you can’t monitor it like you would with an on-premise database.

Because Azure SQL databases are run on shared hardware that you can’t monitor yourself, Microsoft provides us with a number of different performance counters that we can use to check the status/health of our databases. The most important counters are CPU usage, Data IO, Log IO and DTU usage (a combination of the previously mentioned counters). The DTU counter is the most abstract (to me at least), because it’s explained by Microsoft as:

 

The Database Transaction Unit (DTU) is the unit of measure in SQL Database that represents the relative power of databases based on a real-world measure: the database transaction. We took a set of operations that are typical for an online transaction processing (OLTP) request, and then measured how many transactions could be completed per second under fully loaded conditions.

For example, a Premium P11 database with 1750 DTUs provides 350x more DTU compute power than a Basic database with 5 DTUs.

 
And for me, who is was used to monitoring physical hardware, that is a bit of a different approach when digging into performance-related issues. And it’s not that index and statistics maintenance isn’t important when you work on a on-premise database, but it’s a slightly bigger challenge to monitor the direct effects of a index rebuild or statistics update.

But because every Azure SQL database is a contained database, you need to deploy the stored procedures from Ola’s solution to every single database. So to keep it clear for everyone which table and stored procedures belong to the maintenance solution, I’ve changed Ola’s scripts slightly to create all objects in a specific schema named “dba”. So first of all, let’s create the schema:

CREATE SCHEMA dba
GO

 
And then create the used to log all of the maintenance commands and their outcome:

CREATE TABLE dba.CommandLog
	(ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_CommandLog PRIMARY KEY CLUSTERED,
	 DatabaseName sysname NULL,
	 SchemaName sysname NULL,
	 ObjectName sysname NULL,
	 ObjectType CHAR(2) NULL,
	 IndexName sysname NULL,
	 IndexType TINYINT NULL,
	 StatisticsName sysname NULL,
	 PartitionNumber INT NULL,
	 ExtendedInfo XML NULL,
	 Command NVARCHAR(MAX) NOT NULL,
	 CommandType NVARCHAR(60) NOT NULL,
	 StartTime DATETIME NOT NULL,
	 EndTime DATETIME NULL,
	 ErrorNumber INT NULL,
	 ErrorMessage NVARCHAR(MAX) NULL)
GO

 
Now that these are created, you can create the stored procedure that execute the actual index and statistics maintenance. They are too long to post here as code snippet, but you can download the CommandExecute script here, and the IndexOptimze script here.

But because we want to schedule these procedures later on, I decided to create an additional stored procedure in every database, that is deployed alongside the maintenance objects:

CREATE PROCEDURE dba.ExecuteMaintenance
AS


EXECUTE dba.IndexOptimize
 @Databases = '<Insert database name>',
 @FragmentationLow = 'INDEX_REORGANIZE',
 @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
 @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
 @FragmentationLevel1 = 5,
 @FragmentationLevel2 = 30,
 @UpdateStatistics = 'ALL',
 @LogToTable = 'Y',
 @Execute = 'Y'

GO

 
This allows us to run maintenance with specific options on the different databases.

Scheduling
After this was all set up, I needed to come up with a way to run these procedures on a schedule. And as you might know, Azure SQL databases don’t have a SQL Server Agent so that’s were our on-premise platform comes in handy. Just for this I created a new virtual machine in our private cloud, and installed SQL Server on that machine to utilize the SQL Server Agent. This server (operations server) runs all of the scheduled operational jobs (including maintenance and some monitoring jobs) on our Azure environment.

But to run a proces from this operations machine on one of our Azure databases I needed to create a linked server first:

EXEC sp_addlinkedserver 
 @server=N'ServerName__DatabaseName', 
 @srvproduct=N'Azure SQL Db',
 @provider=N'SQLNCLI', 
 @datasrc=N'<SERVERNAME>,1433',
 @catalog='DatbaseName';
 GO

EXEC sp_addlinkedsrvlogin 
 @rmtsrvname = 'ServerName__DatabaseName', 
 @useself = 'FALSE', 
 @locallogin=NULL,
 @rmtuser = '<USERNAME>',
 @rmtpassword = '<PASSWORD>'
 GO

 
And all that’s left now is to create a SQL Server Agent job, that executes the “ExecuteMaintenance” stored procedure on the Azure database:

EXEC [<LinkedServerName>].<DatabaseName>.dba.ExecuteMaintenance

 
The reason I choose this approach is that I didn’t want to lose any time on figuring out how the Azure Scheduler or Automation works, at the moment I was implementing the maintenance solution.

But if you want to have a good resource on that, please check out Pieter Vanhove (Blog | @Pieter_Vanhove) blog, because he wrote a great blog post about that.

 
Conclusion
When I look at how this solution is set up, I’m the first one who admits that this isn’t a perfect or ideal solution. But in the end, this gives the rest of the team a clear solution when they need to start or restart the maintenance process: Just log in to the operations server, start the SQL Server Management Studio (SSMS), open the SQL Server Agent jobs, find the job associated with the database they want to run the maintenance on, and that’s it. But for future scalability and maintainability, we might need to implement another solution to do this.