Enriching your dataset – What are your technical options?

In my previous post I tried to explain how to enrich your dataset, so you can start using spatial data. You’ve seen that there are free options, or you can buy a bunch of files with the information you need.

This time I’ll try to explain the technical options you have. All the options use the Google Geocoding API, because it’s free and fairly accurate. But they will also work for the files you buy from (for example) GfK GeoMarketing, which I did business with once.

Write your own application
One of the easiest and quickest options for me was to write an application that fetches my data from the Google API. I’m not a die-hard programmer, but I know my way around C# as far as I need to. So with a little help from my colleagues, I managed to write a Windows Forms application that calls the Geocoding API. The results of the API call are either just shown on screen (in a GridView), or exported to Excel (using LinqToExcel and ExcelExporter).
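To give you an idea of what such a call looks like, here is a minimal sketch in Python (not the C# code of my own tool). It assumes the JSON endpoint of the Geocoding API and the documented response layout (`status`, `results[0].geometry.location`). The function names, the sample response and the coordinates in it are made up for illustration, and keep in mind that current versions of the API expect an API key.

```python
import json
from urllib.parse import urlencode

# JSON endpoint of the Google Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for a single address lookup."""
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def parse_lat_lng(response_text):
    """Return (lat, lng) of the first result, or None if nothing usable came back."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written sample shaped like an API response (placeholder values, not real output).
sample_response = json.dumps({
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 52.0, "lng": 5.0}}}],
})

print(build_geocode_url("Dam 1, Amsterdam", "YOUR_API_KEY"))
print(parse_lat_lng(sample_response))
```

In a real tool you would fetch the URL (for example with `urllib.request.urlopen`) and pass the response body to `parse_lat_lng`. Keeping the URL building and the parsing separate makes both easy to test without calling the API.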

Another option I intend to build in is importing from and exporting to a SQL Server database. But because this is a project I work on in my own time, this could take a couple of weeks.

CLR
One of the other options I’ve found is a CLR that calls the API. The CLR you write is basically a .NET assembly that you load into SQL Server. It’s more complicated than that, but to keep this story moving, I’ll leave it at that.

One of the many downsides of using a CLR is your local DBA. Every DBA I’ve encountered in my life told me: “Don’t use a CLR, EVER!!! Because that’s the biggest security hole you can open up on a SQL Server”. To be completely honest, I’m not that familiar with CLRs, but I guess it isn’t as simple as that. There might be some pros and cons regarding that…

SSIS Package
In my quest to find all possible options, I found this great article by Donabel Santos (Blog | @sqlbelle). In her article she describes how you can create an SSIS package that fetches a Lat/Long for every record in the dataset you use as input.

Talking about this with Koen Verbeeck (Blog | @Ko_Ver) on Twitter made me realize that for normal businesses, this might be the best option. In every company I know, there’s always a server running SQL Server Integration Services that you can use to execute such a package.
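The idea behind such a package, fetching a Lat/Long for every record in your input, can be sketched in a few lines of Python. This is not the SSIS package from the article, just an illustration of the per-record loop. The names `enrich_with_coordinates` and `fake_lookup`, the `address` column and the coordinates are all hypothetical. The lookup is passed in as a function, so you can plug in a real API call later.

```python
def enrich_with_coordinates(records, lookup):
    """Add 'lat' and 'lng' to every record, calling lookup once per distinct address."""
    cache = {}
    enriched = []
    for record in records:
        address = record["address"]
        if address not in cache:
            # Only call the (possibly slow or rate-limited) lookup for new addresses.
            cache[address] = lookup(address)
        coordinates = cache[address]
        lat, lng = coordinates if coordinates else (None, None)
        enriched.append({**record, "lat": lat, "lng": lng})
    return enriched


def fake_lookup(address):
    """Stand-in for the real API call, so the sketch runs without network access."""
    known = {"Dam 1, Amsterdam": (52.0, 5.0)}  # placeholder coordinates
    return known.get(address)


rows = [
    {"id": 1, "address": "Dam 1, Amsterdam"},
    {"id": 2, "address": "Unknown street 99"},
]
for row in enrich_with_coordinates(rows, fake_lookup):
    print(row)
```

Records that can’t be geocoded keep empty coordinates instead of stopping the whole run, so you can look at those afterwards.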

So, what’s your choice?
Looking at the options above, I’m guessing the option you choose depends on the company you work for. I guess that smaller companies would choose the SSIS package over building a tool. But if your company has a development department with a bunch of software developers, writing your own tool might be a better option. In that case writing your own tool gives you more flexibility: an error in an SSIS package usually ends up with your DBA, while every developer can debug a tool they’ve written themselves.

If you’ve got some .NET/C# knowledge, and you have the time to dive into CLRs, that might be your best option. CLRs are loaded into SQL Server, and are (as far as I can tell) blazing fast. So if you’re looking for performance, and don’t want external tools or packages to fetch your data, go for it!

#SQLHelp – SQL 2012 Management Studio Freezes

As I told you in a few of my previous blog posts, I try to follow the #SQLHelp hashtag / topic. And two weeks ago, I was able to help another colleague via this communication channel.

When SQL Server 2012 RTM came out, I installed it as quickly as possible. Just to try it out, and to see what the differences were compared to the other version I installed on my machine: SQL Server 2008. When using SQL Server Management Studio 2012, I encountered random freezes of SSMS. The freezes weren’t tied to a specific action, like opening a menu or starting a wizard. So I suspected a problem with my installation.

After a while, I remembered that the installations of SQL Server 2005 and 2008 had the same issue. These SSMS installations also froze, because they shared some DLLs with Visual Studio. So the issues I had now might have the same cause. And eventually I re-applied Visual Studio 2010 SP1, and this solved the issue for me.

And after a few weeks, I saw a similar question from Samson J. Loo (Blog | @ayesamson) come by, using the #SQLHelp hashtag:

@ayesamson, 2012-05-23

has anyone experienced random unresponsiveness with SSMS 2012 to a point where you have to kill the process? #sqlhelp #sql

So because I saw this issue before, I replied to his tweet:

@DevJef, 2012-05-23

@ayesamson: Yes. Are you running into this issue constantly, or just once? Problem might come from shared DLL’s with VS2010…

Apparently he was still having these issues:

@ayesamson, 2012-05-23

@DevJef its been happening more frequently now. I do have VS2010 installed as well. #sqlhelp

So from my previous experience, I gave him the tip to re-apply Visual Studio 2010 SP1:

@DevJef, 2012-05-23

@ayesamson: I had the same issue. I actually fixed it by applying VS210 SP1 again. This might help you as well! #SQLHelp

The next day, I got the confirmation that SP1 was re-applied:

@ayesamson, 2012-05-24

@DevJef I re-applied VS2010 SP1 this morning, rebooted and haven’t had an issue. If I don’t have an issue come Mon. then we’re golden!

And a week later, I got the great news it helped him get rid of the freezes:

@ayesamson, 2012-05-31

@DevJef well I haven’t experienced any lockups with SSMS 2012 since reapplying VS2010 SP1. Thanks!! #sqlhelp

I was glad I could help him out, and happy he actually got back to me about resolving the issues. So thank you for that, Samson! And for the rest of the community, I hope this post helps you as well!

Reporting Services – Query database

After reading a tweet by Ray Martinez, I decided to share the scripts for Reporting Services that I often use.

One of the scripts I use a lot shows the Reports whose most recent successful execution was in 2012. For these Reports you will see how often they were executed, the last execution time, the average number of rows returned, and the average time it took to retrieve the data, process it, and render the Report:

USE ReportServer

SELECT
	CL.Name						AS ReportName,
	CL.Description				AS ReportDescription,
	CL.Path						AS ReportPath,
	CL.CreationDate				AS ReportCreationDate,
	SUM(1)						AS TotalNumberOfTimesExecuted,
	MAX(EL.TimeStart)			AS LastTimeExecuted,
	AVG(EL.[RowCount])			AS AVG_NumberOfRows,
	AVG(EL.TimeDataRetrieval)	AS AVG_DataRetrievalTime,
	AVG(EL.TimeProcessing)		AS AVG_TimeProcessing,
	AVG(EL.TimeRendering)		AS AVG_TimeRendering
FROM ExecutionLog EL
JOIN Catalog CL
	ON CL.ItemID = EL.ReportID
WHERE 1 = 1
AND CL.Name IS NOT NULL
AND EL.Status ='rsSuccess'
GROUP BY
	CL.Name,
	CL.Path,
	CL.CreationDate,
	CL.Description
HAVING YEAR(MAX(EL.TimeStart)) = 2012
ORDER BY COUNT(EL.ReportID) DESC

The second script I want to share with you shows the 1000 most recent successful Report executions. The data that is returned includes the Report format, the parameters used to query the data, information about the returned result set, the time needed to retrieve the data and render the Report, etc. I use a TOP 1000 because our Report server returned a lot of rows, where I only needed a few for my analysis.

USE ReportServer

SELECT TOP 1000
	EL.InstanceName				AS SQLInstanceName,
	EL.UserName					AS ExecuterUserName,
	EL.Format					AS ReportFormat,
	EL.Parameters				AS ReportParameters,
	EL.TimeStart				AS TimeStarted,
	EL.TimeEnd					AS TimeEnded,
	EL.TimeDataRetrieval		AS TimeDataRetrieval,
	EL.TimeProcessing			AS TimeProcessing,
	EL.TimeRendering			AS TimeRendering,
	EL2.Source					AS Source,
	EL.ByteCount				AS ReportInBytes,
	EL.[RowCount]				AS ReportRows,
	CL.Name						AS ReportName,
	CL.Path						AS ReportPath,
	CL.Hidden					AS ReportHidden,
	CL.CreationDate				AS CreationDate,
	CL.ModifiedDate				AS ModifiedDate,
	EL2.Format					AS RenderingFormat,
	EL2.ReportAction			AS ReportAction,
	EL2.Status					AS ExecutionResult,
	DS.Name						AS DataSourceName,
	DS.Extension				AS DataSourceExtension
FROM ExecutionLog EL
JOIN Catalog CL
	ON CL.ItemID = EL.ReportID
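LEFT JOIN ExecutionLog2 EL2
	ON EL2.ReportPath = CL.Path
	-- Match the same execution, so every logged run is returned only once
	AND EL2.TimeStart = EL.TimeStart
	AND EL2.UserName = EL.UserName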
JOIN DataSource DS
	ON DS.ItemID = CL.ItemID
WHERE 1 = 1
AND EL.Status = 'rsSuccess'
ORDER BY EL.TimeStart DESC

The next script returns an overview of the folders and Reports on your Report server. For every object on your server, you can see the creation and modification dates. The joined ExecutionLog table is used to get the number of times the Report was executed, and for the last execution: by which user, and how much data was returned in which time. If the Report has subscriptions, their details are shown as well:

USE ReportServer

SELECT
	CASE CL.Type
		WHEN 1 THEN 'Folder'
		WHEN 2 THEN 'Report'
		WHEN 3 THEN 'Resource'
		WHEN 4 THEN 'Linked Report'
		WHEN 5 THEN 'Data Source'
	END									AS ObjectType,
	CP.Name								AS ParentName,
	CL.Name								AS Name,
	CL.Path								AS Path,
	CU.UserName							AS CreatedBy,
	CL.CreationDate						AS CreationDate,
	UM.UserName							AS ModifiedBy,
	CL.ModifiedDate						AS ModifiedDate,
	CE.CountStart						AS TotalExecutions,
	EL.InstanceName						AS LastExecutedInstanceName,
	EL.UserName							AS LastExecuter,
	EL.Format							AS LastFormat,
	EL.TimeStart						AS LastTimeStarted,
	EL.TimeEnd							AS LastTimeEnded,
	EL.TimeDataRetrieval				AS LastTimeDataRetrieval,
	EL.TimeProcessing					AS LastTimeProcessing,
	EL.TimeRendering					AS LastTimeRendering,
	EL.Status							AS LastResult,
	EL.ByteCount						AS LastByteCount,
	EL.[RowCount]						AS LastRowCount,
	SO.UserName							AS SubscriptionOwner,
	SU.UserName							AS SubscriptionModifiedBy,
	SS.ModifiedDate						AS SubscriptionModifiedDate,
	SS.Description						AS SubscriptionDescription,
	SS.LastStatus						AS SubscriptionLastResult,
	SS.LastRunTime						AS SubscriptionLastRunTime
FROM Catalog CL
JOIN Catalog CP
	ON CP.ItemID = CL.ParentID
JOIN Users CU
	ON CU.UserID = CL.CreatedByID
JOIN Users UM
	ON UM.UserID = CL.ModifiedByID
LEFT JOIN ( SELECT
				ReportID,
				MAX(TimeStart) LastTimeStart
			FROM ExecutionLog
			GROUP BY ReportID) LE
	ON LE.ReportID = CL.ItemID
LEFT JOIN ( SELECT
				ReportID,
				COUNT(TimeStart) CountStart
			FROM ExecutionLog
			GROUP BY ReportID) CE
	ON CE.ReportID = CL.ItemID
LEFT JOIN ExecutionLog EL
	ON EL.ReportID = LE.ReportID
	AND EL.TimeStart = LE.LastTimeStart
LEFT JOIN Subscriptions SS
	ON SS.Report_OID = CL.ItemID
LEFT JOIN Users SO
	ON SO.UserID = SS.OwnerID
LEFT JOIN Users SU
	ON SU.UserID = SS.ModifiedByID
WHERE 1 = 1
ORDER BY CP.Name, CL.Name ASC

The last query is a short one. It returns the items that have their own permissions, instead of inheriting the permissions that are set in the site settings menu:

USE ReportServer

SELECT
	Path,
	Name
FROM Catalog
WHERE PolicyRoot = 1 

Hopefully there will be more where this came from. So follow my blog, or contact me by email or Twitter!
