Enriching your dataset – What are your technical options?
January 31, 2013 1 Comment
In my previous post I tried to explain how to enrich your dataset, so you can start using spatial data. You’ve seen that there are free options, or you can buy a bunch of files with the information you need.
This time I’ll try to explain the technical option you have. The options all contain the Google Geocoding API, because this is free, and it’s fairly accurate. But this will also work for the files you buy from (for example) GfK GeoMarketing, which I did business with once.
Write your own application
One of the easiest an quickest options for me was to write an application to fetch my data from the Google API. I’m not a die-hard programmer, but I know my way around C# as far as I need to. So with a little help from my colleagues, I managed to write a Windows Forms application that calls the Geocoding API. The results of the API call are either just shown on screen (in a GridView), or exported to Excel (using LinqToExcel and ExcelExporter).
Another option I intend to build in, is the export from and to a SQL Server database. But because this is a project I work on in my own time, this could take a couple of weeks.
One of the other options I’ve found, is a CLR that calls the API. The CLR you write, is basically a .NET application that you load into SQL Server. It’s more complicated than that, but to keep this story moving on, I’ll leave it at that.
One of the many downsides of using a CLR, is your local DBA. Every DBA I’ve encountered in my life told me: “Don’t use a CLR, EVER!!! Because that’s the biggest security hole you can open up on a SQL Server”. To be completely honest, I’m not that familiar with CLR’s, but I guess it isn’t as easy as that. There might be some pros and cons regarding that…
In my quest to find all possible options, I found this great article by Donabel Santos (Blog | @sqlbelle). In her article she describes how you can create an SSIS package that fetches a Lat/Long for every record in the dataset you use as input.
Talking about this with Koen Verbeeck (Blog | @Ko_Ver) on Twitter made me realize that for normal businesses, this might be the best option. In every company I know, there’s always a server running SQL Server Integration Services that you can use to execute such a package.
So, what’s your choice?
Looking at the options above, I’m guessing the option you choose depends on the company you work for. I guess that smaller companies would choose the SSIS package over building a tool. But if your company has a development department with a bunch of software developers, writing your own tool might be a better option. In that case writing your own tool gives you more flexibility, because not only your DBA can debug an error in an SSIS package, but every developer can debug the tool they’ve written.
If you’ve got some .NET/C# knowledge, and you have the time to dive into CLR’s, that might be your best option. CLR’s are loaded into SQL Server, and are (as far as I can tell) blazing fast. So if you’re looking for performance, and don’t want external tools or packages to fetch your data, go for it!