SSIS: Zipping files with WinRAR

One of the strengths of SSIS (SQL Server Integration Services), is that’s it’s really powerful with the components available. But it’s easy to use the “Execute Process Task” to execute external tools. One of these tools can be WinRAR.

 
Zip file task
By default, SSIS doesn’t have a zip file task. There are some 3rd party components, like the one Joost van Rossum (Blog | @ssisjoost) built. You can find his custom task here. But what if you don’t to use 3rd party components? You can create your own zip task, by using an “Execute Process Task”.

 
Variables and Parameters
Because I want to keep this as easy as possible, and I might want to reuse this later in another package, I started by creating 2 parameters and 1 local variable:

 
Variable
Function:
To generating a unique zipfile name per day, I decided to add the date to the zipfile name

Name:
ZipName

Value:
“ZIPPED_” +
(DT_WSTR, 50) YEAR((DT_DBTIMESTAMP) @[System::StartTime]) + “_” +
RIGHT(“00″ + (DT_WSTR, 50) MONTH((DT_DBTIMESTAMP) @[System::StartTime]), 2) + “_” +
RIGHT(“00″ + (DT_WSTR, 50) DAY((DT_DBTIMESTAMP) @[System::StartTime]), 2) +
“.zip”

 
This expression will evaluate to:

 
Parameter 1
Function:
Point the “Execute Process Task” to the folder of the files that need to be zipped

Name:
FilePath

Value:
C:\Temp

 
Parameter 2
Function:
Tell WinRAR which files to zip in which directory

Name:
FilesToZip

Value:
C:\\Temp\\*.txt

 
When you added these Variable and Parameters, you should have this Variable:

 
And this Parameters:

 
Once that is done, you can add an “Execute Process Task” to the Control Flow. When you open the component you just added, go to the executable textbox, and point it to the WinRAR executable on your disk”:

 
Now you need to add the arguments that tell WinRAR what files to zip, where to zip them, and which files need to be zipped. In the “Execute Process Task”, click on Expressions, and on the dotted button:

 
In the next window, select “Arguments” under Property, and press the dotted button again:

 
Now paste the string below in this window:

 
“a -df ” + @[$Package::FilePath] + @[User::ZipName] + ” ” + @[$Package::FilesToZip]

 
If you press the evaluate button, you’ll see this result:

 

 
The command line switches we use are:

a: Archive files (or x for Extract files)
-df: Delete files after archiving

 
So once you know how it’s done, this is pretty easy, and does what it has to: zip your files without any issues. One thing to remember, is that SSIS can’t throw the exceptions WinRAR throws. So I tested this before, and didn’t notice the directory didn’t exist. My SSIS package kept on failing, and I didn’t know why. But after trying the expression we put into the expression textbox of the “Execute Process Task” in a command prompt, I did see the actual error.

So even though it’s easy to use, it’s not always easy to debug when it fails. But as long as you can execute it via command line yourself, you’re just a few steps away from seeing the actual error.

 
If you want to read more about SSIS, don’t forget to check out these blog posts:

- Julie Koesmarno: On sabbatical
Mickey Stuewe: SQL Server data transferred to a sqlite database using ssis
Chris Yates: ISPACs – Got To Love Em

Moving files with SSIS

In the last few weeks, I’ve seen the ugly side of being a DBA: Server maintenance. Most of the time this is only needed when things go wrong, checking the backups, etc. But last week I had something that makes every DBA on earth shiver: diskspace shortage.

One of the ETL processes that runs on our server, downloads a bunch of XML files from an external source, and stores those on local disk. After that, the XML files are imported into a database, so our business analysts can use the content for their analysis. After the import, the XML files are zipped, and stored in a folder called “Backup”. But because the files are never deleted after the imported (they might want to audit the data and files), the disk they were stored on was running low on diskspace.

 
Thinking about the options
As a DBA, it’s your job make your own life as easy as possible. So moving the files by hand is okay if you only need to do it once. But I’d rather solve this once and for all, because maybe next time the disk will run out of space in the middle of the night, instead of during the day like this time.

So to prevent this issue in the future, I’ve created an SSIS solution to run directly after the import. This SSIS package moves the files from local disk to SAN storage. This way, we’re getting rid of the local files, but the files are still available when needed.

 
Creating your test environment
The first thing I did was create a copy of the directory structure that exists on the server, on my local machine. I’ve just copied the structure, and put a single .txt file in every directory. You can download the “test environment” setup here. Unzip the contents of the folder to “C:\Temp”, so you can use the SSIS package like described below, without making changes to it.

 
Creating the package
I started by creating a new package, and added a “Foreach Loop Container” for every folder that contains XML- and text-files. In every “Foreach Loop Container” I’ve placed a “File System Task”. After that, I’ve created the variables needed for this package:

Some of the variables are created just for the scope of the corresponding “Foreach Loop Container”. So if you want to see all variables in one overview, click the “Show All Variables” button that I’ve marked in the image above.

 
Configuring the components
Now you can start configuring the components. The “Foreach Loop Container” needs to be configured like this:

And:

And the “File System Task” needs to be configured like this:

You can also download the SSIS package as shown above, directly here (save the XML as dtsx file).

 
Test run
Now you’re ready for a test run! If you didn’t download the directories and files (the “test environment”) above, make sure you create all the destination directories. If you don’t, the package will fail. I didn’t build it into this version, but you might want to add a feature like that to the package.

Enriching your dataset – The SSIS way…

In my last post I talked about the technical options you have to enrich your dataset with spatial data. After writing that post, I decided to try and build the option like mentioned by Donabel Santos (Blog | @sqlbelle). Once I got that working, I tried to extend it a little, so it would fetch and store the data that I wanted. And like I mentioned to Koen Verbeeck (Blog | @Ko_Ver), it’s also easy to build for someone with rusty SSIS-skills (like me).

Creating the resources
In order to start with the SSIS solution, you’ll first need to create a source and a destination table for your data. I’ve created a GIS_Test and a GIS_Results table, with a few samples for you. You can download the file here.

Creating a new SSIS project
Once you’ve created the database resources, start Visual Studio, and Create a new SSIS project. You can find this under “Business Intelligence” -> “Integration Services”, like shown below:

Once you’ve created your project, add a new “OLE DB Connection” to your SSIS Package, by right-clicking in the “Connection Managers” at the bottom of your screen:

In the window that shows up, choose a connection you used earlier, of click “New…” to make a new one. Fill in your SQL Server information, and click “OK”.

Adding components
In order to actually do something with data, you need to add a “Data Flow” into your SSIS package. You can do this by manually dragging one into your “Control Flow”, or you can just the “Data Flow” tab, and click the message “No Data Flow tasks have been added to this package. Click here to add a new Data Flow task.”, and SSIS will add one for you:

Now drag in a “Source Assistant” into your “Data Flow”. If the screen below shows, click the OLE DB Connection you made earlier, and click “OK”:

Now double-click the “OLE DB Source”, and select the source table you’ve created earlier:

Drag in a “Script Component” and click “Transformation” on the next screen:

Now connect both components, by clicking on the “OLE DB Source” component, and drag the green arrow on the “Script Component”:

Double-click on the “Script Component” and choose “Input Columns”. Now select all columns (except the ID column), and set the “Usage Type” to “ReadWrite”, and click “OK”:

Drag in a “Destination Assistent”, and choose the OLE DB Connection you’ve created earlier. Now connect the “Script Component” to the “Destination Assistent” by dragging the green arrow like before.

Open the “OLE DB Destination” and choose the GIS_Results table as destination:

Now click on “Mapping” to auto-map your columns. Check them if the auto-mapping actually worked!

Start the coding!
With a completed package, now let’s start coding! Double-click the “Script Component” and click “Edit Script…”. A new Visual Studio window will open, where you can edit the script that will fetch your data from the Google Geocoding API.

First of all, add a reference to the System.Web namespace:

Add a new class named “Geocoding.cs”, and overwrite the content with the code from this Geocoding.cs

Go to the main.cs file, and copy the method called “CreateURL” from this stripped version of main.cs.

Add the code from the downloaded “Input0_ProcessInputRow” to your own “Input0_ProcessInputRow” method.

Go back to your original Visual Studio with the SSIS package, and press “OK” on the open “Script Component” window. This will close the Visual Studio with the actual script, and stores it in your SSIS package.

Can I finally run it now?!
Yes! Yes, you can! And if you run the SSIS package, you’ll see that it saves the records into the GIS_Results table. It adds the complete URL that was used to fetch data from the Geocoding API, so you can debug it in case of errors. Also, I’ve added a Geography object, so it’s easy for you to visualize the data retrieved from the API. And the last column I’ve added is the status code returned by the API, so it’s easy to spot weird results.

A word of thanks…
And as a final remark, I’d like to thank Donabel Santos for letting me use her post as a guideline, and her code as a basis for my version.

Downloads
I’ve also made the project available for download. You can download the full solution here

Enriching your dataset – What are your technical options?

In my previous post I tried to explain how to enrich your dataset, so you can start using spatial data. You’ve seen that there are free options, or you can buy a bunch of files with the information you need.

This time I’ll try to explain the technical option you have. The options all contain the Google Geocoding API, because this is free, and it’s fairly accurate. But this will also work for the files you buy from (for example) GfK GeoMarketing, which I did business with once.

Write your own application
One of the easiest an quickest options for me was to write an application to fetch my data from the Google API. I’m not a die-hard programmer, but I know my way around C# as far as I need to. So with a little help from my colleagues, I managed to write a Windows Forms application that calls the Geocoding API. The results of the API call are either just shown on screen (in a GridView), or exported to Excel (using LinqToExcel and ExcelExporter).

Another option I intend to build in, is the export from and to a SQL Server database. But because this is a project I work on in my own time, this could take a couple of weeks.

CLR
One of the other options I’ve found, is a CLR that calls the API. The CLR you write, is basically a .NET application that you load into SQL Server. It’s more complicated than that, but to keep this story moving on, I’ll leave it at that.

One of the many downsides of using a CLR, is your local DBA. Every DBA I’ve encountered in my life told me: “Don’t use a CLR, EVER!!! Because that’s the biggest security hole you can open up on a SQL Server”. To be completely honest, I’m not that familiar with CLR’s, but I guess it isn’t as easy as that. There might be some pros and cons regarding that…

SSIS Package
In my quest to find all possible options, I found this great article by Donabel Santos (Blog | @sqlbelle). In her article she describes how you can create an SSIS package that fetches a Lat/Long for every record in the dataset you use as input.

Talking about this with Koen Verbeeck (Blog | @Ko_Ver) on Twitter made me realize that for normal businesses, this might be the best option. In every company I know, there’s always a server running SQL Server Integration Services that you can use to execute such a package.

So, what’s your choice?
Looking at the options above, I’m guessing the option you choose depends on the company you work for. I guess that smaller companies would choose the SSIS package over building a tool. But if your company has a development department with a bunch of software developers, writing your own tool might be a better option. In that case writing your own tool gives you more flexibility, because not only your DBA can debug an error in an SSIS package, but every developer can debug the tool they’ve written.

If you’ve got some .NET/C# knowledge, and you have the time to dive into CLR’s, that might be your best option. CLR’s are loaded into SQL Server, and are (as far as I can tell) blazing fast. So if you’re looking for performance, and don’t want external tools or packages to fetch your data, go for it!

Follow

Get every new post delivered to your Inbox.

Join 50 other followers