Starting with Spatial Data – Enriching your dataset
January 22, 2013
The biggest issue when you start working with spatial data is answering the question: “What do I want to visualize?” Do you want to visualize the areas your clients come from? Or do you want to calculate in which areas your potential new clients live? The next hurdle is working out how to enrich the data you already have.
The easy way out
Almost every company has an accessible database with client information. The easy way out is to download a free spatial data set (as described in an earlier post), and visualize your data by joining your data set to the downloaded one, based on country, province, or city name. This way, you don’t alter your data, but you’re still able to visualize it.
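To make the join a bit more concrete, here is a minimal sketch in Python. The client rows, the shape names and the `clients_per_shape` helper are all made up for illustration; a real downloaded data set will have its own column names and spellings.

```python
from collections import Counter

# Hypothetical client rows, as they might come out of a client database.
clients = [
    {"id": 1, "country": "Netherlands"},
    {"id": 2, "country": "netherlands "},
    {"id": 3, "country": "Belgium"},
    {"id": 4, "country": "Atlantis"},  # no match in the spatial data set
]

# Hypothetical downloaded spatial data set: one shape reference per country.
shapes = {
    "netherlands": "shape_nl",
    "belgium": "shape_be",
}


def normalize(name):
    """Trim and lowercase a name so small spelling differences still join."""
    return name.strip().lower()


def clients_per_shape(clients, shapes):
    """Count clients per shape and collect the names that did not join."""
    counts = Counter()
    unmatched = []
    for row in clients:
        key = normalize(row["country"])
        if key in shapes:
            counts[shapes[key]] += 1
        else:
            unmatched.append(row["country"])
    return dict(counts), unmatched


counts, unmatched = clients_per_shape(clients, shapes)
print(counts)
print(unmatched)
```

The list of unmatched names is the weak spot of joining on names: every spelling variation that the normalization doesn’t catch silently drops out of your map.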
Enriching your data
The way to go, in my opinion, is to permanently enrich your data with coordinates. That makes you more flexible in visualizing your data. Another advantage is that you can still group your data in the same way as mentioned above: by country, province or city. But remember: grouping is always possible, while mining your data is difficult without accurate coordinates. And there is a free option for enriching your data, if you don’t want to spend a lot of money and time on it.
Last year, I spent a lot of time working with spatial data. One of the biggest brainteasers was how to enrich our data set without disrupting the normal running processes. You don’t want to change your underlying data set, and then find out you need to rewrite all your software.
No seriously, it’s free…
One of the options I thought of was converting a postal code (or zip code for my American friends. Seriously USA?! Do you have a different word for EVERYTHING?! :)) into coordinates. The conversion can’t be done by number crunching; every code has to be looked up. You could look up every postal code by hand, but if you have a few million addresses in your database, that’s not really an option.
One of the options I came across was the free Google Geocoding API. This allows you to translate a postal code into a Latitude and Longitude. And these coordinates can be plotted on a map.
And if you have a Latitude and Longitude, you can use Reverse Geocoding to translate your coordinates into a postal code.
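As a rough sketch of both directions in Python: the code below builds the request URLs and reads the coordinates or postal code out of a JSON response. Note that it targets the JSON web service that succeeded the 2.0 API, not the 2.0 API itself. The helper names are my own, the sample responses are trimmed-down, made-up data in the shape that service returns, and `YOUR_KEY` is a placeholder (current versions of the service expect an API key).

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(postal_code, country, api_key):
    """Build the request URL for postal code -> coordinates."""
    params = {"address": f"{postal_code}, {country}", "key": api_key}
    return GEOCODE_URL + "?" + urlencode(params)


def reverse_geocode_url(lat, lng, api_key):
    """Build the request URL for coordinates -> address details."""
    params = {"latlng": f"{lat},{lng}", "key": api_key}
    return GEOCODE_URL + "?" + urlencode(params)


def extract_lat_lng(response):
    """Return (lat, lng) from a parsed response, or None if nothing was found."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def extract_postal_code(response):
    """Return the first postal code in a parsed response, or None."""
    if response.get("status") != "OK":
        return None
    for result in response.get("results", []):
        for component in result.get("address_components", []):
            if "postal_code" in component.get("types", []):
                return component["long_name"]
    return None


# Made-up sample responses, only to show the parsing (not real API output).
forward_sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 52.37, "lng": 4.89}}}],
}
reverse_sample = {
    "status": "OK",
    "results": [
        {"address_components": [
            {"long_name": "Amsterdam", "types": ["locality", "political"]},
            {"long_name": "1012 JS", "types": ["postal_code"]},
        ]}
    ],
}
not_found_sample = {"status": "ZERO_RESULTS", "results": []}

print(geocode_url("1012 JS", "Netherlands", "YOUR_KEY"))
print(extract_lat_lng(forward_sample))
print(extract_postal_code(reverse_sample))
print(extract_lat_lng(not_found_sample))
```

Fetching the URL and parsing the JSON body is left out on purpose, so the sketch runs without a network connection or a key.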
A quick example of this is a tool I’ve written. It translates a postal code into Lat/Long:
Or it translates a Lat/Long into a postal code, using Reverse Geocoding:
I know this isn’t a pretty tool yet, but I’m still working on it. Until now, it was only a trial to see if the Google API returns the data I need. This tool is also still based on version 2.0 of the API, which has been deprecated since March 8, 2010. It will keep working until March 8, 2013, so I need to rebuild it soon.
The next step, of course, is to expand the tool so the user can translate a whole list of postal codes at once. You don’t want to copy-paste millions of postal codes into the tool by hand, and then copy-paste the results back into an Excel workbook or SQL Server table.
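A batch version could look roughly like the Python sketch below. It is not the tool itself, just an outline of the idea: look up each distinct postal code only once, keep track of the codes that can’t be resolved, and stop calling once a daily limit is reached. The `geocode_batch` function, the `lookup` callback and the fake lookup table are all hypothetical; the default limit is the 15,000 calls per day mentioned in this post.

```python
def geocode_batch(postal_codes, lookup, daily_limit=15000):
    """Geocode a list of postal codes with one lookup per distinct code.

    lookup is any function that takes a postal code and returns
    (lat, lng), or None when the code cannot be resolved.
    """
    found = {}       # postal code -> (lat, lng)
    unknown = []     # codes the lookup could not resolve
    remaining = []   # codes left over once the daily limit is reached
    seen = set()
    calls = 0
    for raw in postal_codes:
        code = raw.strip().upper()
        if code in seen:
            continue  # already handled, don't waste a call
        seen.add(code)
        if calls >= daily_limit:
            remaining.append(code)
            continue
        calls += 1
        coords = lookup(code)
        if coords is None:
            unknown.append(code)
        else:
            found[code] = coords
    return found, unknown, remaining


# Stand-in for the real API call, with made-up coordinates.
fake_table = {"1012 JS": (52.37, 4.89), "3511 AX": (52.09, 5.12)}

codes = ["1012 js", "1012 JS ", "3511 AX", "0000 ZZ", "9999 XX"]
found, unknown, remaining = geocode_batch(codes, fake_table.get, daily_limit=3)
print(found)
print(unknown)
print(remaining)
```

Passing the lookup in as a function keeps the batch logic separate from the API call, so the same loop can be tried out against a fake table first and pointed at the real service later. The `remaining` list is what you would feed into the next day’s run.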
But since I’m not a .NET developer, this could take a while. If you’re a developer, and interested in expanding and developing the postal code finder, contact me. Then we can make it a publicly available tool.
Looking at all of this, you’ll see that it’s possible to enrich your data for free, and without a lot of work. Okay, you still need to figure out whether your postal code information is correct. Or you can let the API tell you: it responds with the message “G_GEO_UNKNOWN_ADDRESS” when an address might be incorrect. Another limitation is the number of API calls: 15,000 calls per day. But then again, it’s free, and you can’t have it all…