January 17, 2017 3 Comments
In the previous post I wrote, I explained how to setup replication from an on-premise SQL Server instance to an Azure SQL database. While doing this, I came across a very strange issue (or maybe even bug) when setting up replication.
The problem child
After working on reproducing the issue for a day, and trying to reduce the issue to a small-scale problem, I came to the conclusion that the problem was (probably) caused by a single primary key on a table in the database:
CREATE TABLE dbo.BuggedTable ([Day] DATE NOT NULL, SomeId VARCHAR(50) NOT NULL, Amount INT NOT NULL, CONSTRAINT PK_BuggedTable PRIMARY KEY CLUSTERED ([Day] ASC, SomeId ASC ) ) GO
Creating publication & subscriber
The setup of the publication and subscriber wasn’t that difficult. As I said before, there are a few things you need to configure differently then you would do for SQL Server to SQL Server replication.
So I won’t talk you through the whole process again, but refer you to the articles instead.
Generate initial snapshot
Once the publication and subscription are in place, it’s time to generate the initial snapshot. The snapshot agent prepares the snapshot that contains the schema and data, needed to initialize the subscriber(s):
In some cases it takes a while, but in the end, I found this “warning” on my screen:
When you dig into this by opening the agents tab in the replication monitor, you see the actual error:
The complete error states:
Message: Query for data failed
Stack: at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.ThrowNativeBcpOutException(CConnection* pNativeConnectionWrapper)
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.BcpOut(String strBcpObjectName, String strBcpObjectOwner, String strBaseBcpObjectName, Boolean fUnicodeConversion, String strDataFile, String strLoadOrderingHint, String strWhereClause, Boolean useTableLockHint, Int32 bcpFileFormatVersion)
at Microsoft.SqlServer.Replication.Snapshot.SqlServer.BcpOutThreadProvider.DoWork(WorkItem workItem)
at Microsoft.SqlServer.Replication.AgentCore.BaseAgentThread.AgentThreadProcWrapper() (Source: MSSQLServer, Error number: 0)
Get help: http://help/0
Message: Conversion failed when converting date and/or time from character string.
Stack: (Source: MSSQLServer, Error number: 241)
Get help: http://help/241
First I thought that the DATE column being part of the primary key was the problem. But then again, another table in a different database had the combination of a DATE and a UNIQUEIDENTIFIER as a primary key. So that couldn’t be it.
During my investigation to this issue, I found the following:
– There was enough space on the disk for the snapshot
– Other database with DATE in primary key (PK) worked (even though combination was DATE + UNIQUEIDENTIFIER, instead of DATE + VARCHAR)
– Adding “-UseInprocLoader” to Snapshot Agent job doesn’t work
So after digging a bit deeper, and trying to reduce the issue to a small-scale problem, I found:
– Reproduced in an empty (new) database, with just the failing table as single object in the database –> STILL FAILS
– Changed object in test DB to use newly created IDENTITY(1,1) as PK, and added Unique Constraint to old PK columns –> WORKS
– Make DATE column part of different PK (together with IDENTITY), and no Unique Constraint –> WORKS
– Changing the PK to just the VARCHAR column –> WORKS
– Changing the PK to just the DATE column –> WORKS
– Setting database on local instance (SQL 2016 Dev) to comp. 110, just like on test –> WORKS
After a lot of different variables in the test-setup, I found out that it’s probably an old bug that wasn’t properly patched when upgrading the SQL Server engine to a newer version. Let me elaborate on that:
– The bug is reproducible on the test server, which is an upgraded engine from SQL 2012 or 2014 to SQL 2016 RTM
– The bug is reproducible on the production server, which is an upgraded engine from SQL 2014 to SQL 2016 RTM
– The bug is not reproducible on a clean install of SQL 2014
– The bug is not reproducible on a clean install of SQL 2016 RTM
– The bug is not reproducible on a clean install of SQL vNext CTP
Finding a work-around
Because I couldn’t find a work-around for this issue, I requested the help from Justing Langford (Blog | @JustinLangford) from Coeo. He pointed me to an article that describes a few possible work-arounds.
For me the row filter did the trick:
As described in the article mentioned above, adding the row filter disables the BCP partioning for this article (table), and the snapshot agent completed without any problem:
Although this bug should’ve been fixed ages ago, it looks like it (re)appeared again after an engine upgrade. I’m not sure how this happened, but all I know is that it took me about 3 days to find, reproduce, reduce and work around the problem.
So hopefully this article will save you that time!