Cleanup sysobjects after disabling replication

In my previous blog post I explained how we are going to leverage the power of transactional replication to move our on-premises databases to Azure SQL Database (PaaS). This works like a charm, but you do end up with a bunch of system-generated objects in your database. So how do you get rid of these?

What “mess” is left behind after disabling replication
After you’ve completed your migration to Azure SQL DB and cut the cord on your transactional replication, you end up with a bunch of system-generated objects. SQL Server uses these to “replay” on the subscriber the changes (inserts, updates, and deletes) that were made on the publisher. This is just an example of one of our smaller databases:

[Screenshot: the system-generated replication objects in one of our smaller databases]
As you can see, there are 3 stored procedures per table (ins, upd, del), and some other objects used by replication.
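You can spot them easily by name: the replication procedures follow the sp_MSins_, sp_MSupd_ and sp_MSdel_ naming convention, so a quick query like this lists them:

SELECT name
FROM sys.procedures
WHERE name LIKE 'sp[_]MS%'	-- [_] escapes the underscore in the LIKE pattern
ORDER BY name;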

Maybe it’s my IT-OCD talking, but it feels good to clean this up, doesn’t it?!

Cleanup
Fortunately, the SQL Server team gave us a way to easily clean up these objects: sp_subscription_cleanup (which, even though MS Docs says it won’t, WILL work on Azure SQL DB).

This stored procedure accepts the following input:

@publisher: The name of the SQL Server instance of the publisher
@publisher_db: The name of the publishing database
@publication: The name of the publication (this was configured in the setup/configuration of replication)

Running this stored procedure on your subscriber cleans up all remaining transactional replication components.
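A call looks like this (a sketch; the server, database, and publication names below are placeholders for illustration):

EXEC sp_subscription_cleanup
	@publisher = 'OnPremServer',
	@publisher_db = 'SourceDB',
	@publication = 'SourceDB_Publication';

But where can you find the input for this procedure?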

At first I looked at MSreplication_objects, but this doesn’t work on Azure SQL DB (MS Docs says so, and I’ve tested it). So my guess was that the other system objects mentioned in related articles wouldn’t work either.

So what do you do when you can’t find the answer yourself after extensive Googling/Bing-ing? Yes, you call in reinforcements (in this case William Durkin (Website | @sql_williamd), who knows a lot about replication):

[Screenshot: tweet to William Durkin asking for help]
But as you can tell from my reply to William’s response, I quickly found the answer by just trying something out: MSreplication_subscriptions. You can query this table on your subscriber to retrieve the publisher and publication properties. Again: even though MS Docs says it won’t work, it does work on Azure SQL DB.

So thanks to some “Rubber Ducking” I found the solution:

[Screenshot: the Twitter conversation that led to the answer]
Thanks again for your mental support, William!

So by using the output of MSreplication_subscriptions on your subscriber, you can quickly find the information needed to run sp_subscription_cleanup.
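On the subscriber itself, that lookup is a simple query:

SELECT
	Publisher,
	Publisher_db,
	Publication,
	Distribution_agent
FROM MSreplication_subscriptions;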

Automating the script with C#
Running the script on one or two databases can be done manually. But because I migrated an entire data platform with replication, we’re talking about A LOT of databases. So instead of doing this all manually (you need to connect to every single database to check this), I decided to create a small tool to help me with this.

In all honesty, my C# skills aren’t that great, but my response to that is: “If it looks stupid but works, it ain’t stupid”.

The tool connects to the instance you provide, retrieves a list of database names, and uses this list to connect to every database individually to check for replication components:

[Screenshot: the tool showing the replication components found per database]
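To build that list of database names yourself, you can query sys.databases (a minimal sketch; on Azure SQL DB this assumes your login can connect to the logical master database):

SELECT [name]
FROM sys.databases
WHERE [name] <> 'master';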
The tool can be used to connect to any Azure SQL DB instance, as long as you have one login/user that can connect to all databases on that server. If your server supports that (it depends on the configuration you use, of course), you’ll see a list that provides you with the following information:

– Databasename
– Publisher
– Publisher_db
– Publication
– Distribution_agent
– CleanupScript

The last column is the result of the following statement:

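-- Run in each subscriber database: it returns the subscription
-- properties and generates the matching cleanup command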
SELECT
	DB_NAME() AS Databasename,
	Publisher,
	Publisher_db,
	Publication,
	Distribution_agent,
	'EXEC sp_subscription_cleanup @publisher = ''' + Publisher + ''', @publisher_db = ''' + Publisher_db + ''', @publication = ''' + Publication + ''';' AS CleanupScript
FROM MSreplication_subscriptions

You can also use this script without downloading the tool.

But be careful: these components could still be in use, so check that before you start cleaning up!

You can download the tool here (be aware, this is a .exe file!), or download the Source code here, and contact me if you would like me to share it via GitHub or something like that.


Enriching your dataset – The SSIS way…

In my last post I talked about the technical options you have to enrich your dataset with spatial data. After writing that post, I decided to try to build the option mentioned by Donabel Santos (Blog | @sqlbelle). Once I got that working, I tried to extend it a little, so it would fetch and store the data that I wanted. And like I mentioned to Koen Verbeeck (Blog | @Ko_Ver), it’s also easy to build for someone with rusty SSIS skills (like me).

Creating the resources
In order to start with the SSIS solution, you’ll first need to create a source and a destination table for your data. I’ve created a GIS_Test and a GIS_Results table, with a few sample records for you. You can download the file here.
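If you’d rather create the tables yourself, they could look something like this. This is just a sketch: the address columns are my assumptions for illustration, and the extra columns in GIS_Results match what the package stores later on (the API URL, a geography object, and the API status code):

CREATE TABLE GIS_Test
(
	ID int IDENTITY(1,1) PRIMARY KEY,
	Street varchar(100),	-- assumed address columns; the actual
	City varchar(100),	-- download may use different names
	Country varchar(100)
);

CREATE TABLE GIS_Results
(
	ID int IDENTITY(1,1) PRIMARY KEY,
	Street varchar(100),
	City varchar(100),
	Country varchar(100),
	URL varchar(500),	-- the Geocoding API URL that was used
	Location geography,	-- point built from the returned coordinates
	StatusCode varchar(50)	-- status code returned by the API
);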

Creating a new SSIS project
Once you’ve created the database resources, start Visual Studio and create a new SSIS project. You can find this under “Business Intelligence” -> “Integration Services”, as shown below:

Once you’ve created your project, add a new “OLE DB Connection” to your SSIS package by right-clicking in the “Connection Managers” pane at the bottom of your screen:

In the window that shows up, choose a connection you used earlier, or click “New…” to make a new one. Fill in your SQL Server information, and click “OK”.

Adding components
In order to actually do something with data, you need to add a “Data Flow” to your SSIS package. You can do this by manually dragging one into your “Control Flow”, or you can just open the “Data Flow” tab and click the message “No Data Flow tasks have been added to this package. Click here to add a new Data Flow task.”, and SSIS will add one for you:

Now drag a “Source Assistant” into your “Data Flow”. When the screen below shows up, click the OLE DB Connection you made earlier, and click “OK”:

Now double-click the “OLE DB Source”, and select the source table you’ve created earlier:

Drag in a “Script Component” and click “Transformation” on the next screen:

Now connect both components by clicking on the “OLE DB Source” component, and dragging the green arrow onto the “Script Component”:

Double-click on the “Script Component” and choose “Input Columns”. Now select all columns (except the ID column), set the “Usage Type” to “ReadWrite”, and click “OK”:

Drag in a “Destination Assistant”, and choose the OLE DB Connection you’ve created earlier. Now connect the “Script Component” to the “Destination Assistant” by dragging the green arrow like before.

Open the “OLE DB Destination” and choose the GIS_Results table as destination:

Now click on “Mapping” to auto-map your columns. Check whether the auto-mapping actually worked!

Start the coding!
With a completed package, now let’s start coding! Double-click the “Script Component” and click “Edit Script…”. A new Visual Studio window will open, where you can edit the script that will fetch your data from the Google Geocoding API.

First of all, add a reference to the System.Web namespace:

Add a new class named “Geocoding.cs”, and overwrite its content with the code from this Geocoding.cs.

Go to the main.cs file, and copy the method called “CreateURL” from this stripped version of main.cs.

Add the code from the downloaded “Input0_ProcessInputRow” to your own “Input0_ProcessInputRow” method.

Go back to your original Visual Studio instance with the SSIS package, and press “OK” in the open “Script Component” window. This will close the Visual Studio instance with the actual script, and store it in your SSIS package.

Can I finally run it now?!
Yes! Yes, you can! And if you run the SSIS package, you’ll see that it saves the records into the GIS_Results table. It adds the complete URL that was used to fetch data from the Geocoding API, so you can debug it in case of errors. Also, I’ve added a Geography object, so it’s easy for you to visualize the data retrieved from the API. And the last column I’ve added is the status code returned by the API, so it’s easy to spot weird results.
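To check what ended up in the table (and to let SSMS plot the points on its “Spatial results” tab), a simple query on the results table will do. A sketch, assuming the GIS_Results layout from the beginning of this post:

SELECT
	URL,	-- the exact API call that was used, handy for debugging
	StatusCode,	-- spot weird results quickly
	Location,	-- plotted on the "Spatial results" tab in SSMS
	Location.ToString() AS WKT	-- well-known text, e.g. POINT (4.89 52.37)
FROM GIS_Results;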

A word of thanks…
And as a final remark, I’d like to thank Donabel Santos for letting me use her post as a guideline, and her code as a basis for my version.

Downloads
I’ve also made the project available for download. You can download the full solution here.