SSRS Reporting automation with .NET

This article was recently published on dev.getroadmap.com in 2 separate posts:
SSRS Reporting automation with .NET
Application Authentication via https using NTLM:

 

SQL Server Reporting Services (SSRS) is a great way to create an overview or analysis of your data, that you can share with other people as a report. But what if you have a report that you need to share with a large group of people, but they need it with 50 different parameters (like CustomerID for example), and they want to receive it in Excel or PDF? Are you manually going to execute the report with 50 different parameters, export them to the specific file format, and email those files? I don’t think so. Automating this process is easy if you write a small tool for this, and if you use the “Report Server Web Service URL”.

 
ReportServers vs Reports
Before we’re diving into the .NET code, first let’s see what the difference is between the URL’s “http:// [servername] :80/ReportServer” and “http:// [servername] :80/Reports”. If you navigate to your SSRS server, you’ll be redirected to “http:// [servername] :80/Reports”. This is the default webinterface that you use to open reports, manage subscriptions, etc:

 
If you go to “http:// [servername] :80/ReportServer”, you’ll end up in the webservice of SSRS. This allows you to open reports, and as a bonus: add parameters to your http request, so you can automatically execute reports from a URL. This is also called the “SSRS Virtual Directory”:

 
Building a URL
Now that we know that we need to use the webservice, we can start building our URL. First, let’s start with the base-URL. I’ve created a folder in SSRS called “Test”, and a report called “TestReport”. So the base-URL will be: “http:// [servername] :80/ReportServer/Pages/ReportViewer.aspx?%2fTest%2fTestReport”. And because my report has 2 date-parameters (From and To), I need to add these to the URL: “&From=2015-12-01&To=2015-12-08”.

This URL doesn’t run the report yet, until you add the command for that to the URL: “&rs:Command=Render”. So your complete URL will look like: “http:// [servername] :80/ReportServer/Pages/ReportViewer.aspx?%2fTest%2fTestReport&From=2015-12-01&To=2015-12-08&rs:Command=Render”

One thing to keep in mind is that you need to add the dates in the URL in the correct format (yyyy-MM-dd). If you don’t do that, SSRS will throw an exception.

 
Where to find these URL’s
If you log on to your SSRS server, you can start the “Reporting Services Configuration Manager”. This is the configuration tool for your SSRS instance.

In this tool you can configure both the webinterface URL:

 
And the virtual directory:

 
Text parameter in URL
But SSRS can also have text-fields as input for your report. These can also be added to the URL. Just like the parameters above, you just add the parameter name and value to the URL: “http:// [servername] :80/ReportServer/Pages/ReportViewer.aspx?%2fTest%2fTestReport&From=2015-12-01&To=2015-12-08&FreeText=This is a test…&rs:Command=Render”.

After some testing I’ve found out that you can use any character in the text parameter you want to, except for the &-sign. If you use that, SSRS will think it’s a parameter or command and won’t accept the URL. And there’s also the (browser) limitation of the URL length. Testing proves that the limit is 7926-7931 characters. If your URL is below 7926 characters, it works like a charm. If you go above that (between 7926 and 7931) the behavior of SSRS gets buggy, and above 7931 characters SSRS will throw an exception.

 
Export to file
Exporting your report to file can also be added to the URL. By adding “&rs:Format=EXCEL” to the end of the URL tells SSRS to export your report to Excel: “http:// [servername] :80/ReportServer/Pages/ReportViewer.aspx?%2fTest%2fTestReport&From=2015-12-01&To=2015-12-08&FreeText=This is a test…&rs:Command=Render&rs:Format=EXCEL”.

This output can be used to automatically store this file on disk or email it with a .NET application.

 
Export formats
There are several export formats in the webinterface of SSRS:

 
The available output formats depend on the version of SSRS you’re using. In SQL Server 2016 you have all the same export formats as you have in SQL Server 2014, but they added PowerPoint to that list.

 
Creating the application
To automatically download an exported report, I’ve created a “Windows Forms Application”. In this applications we need to do 3 things:

– Determine variable values
– Build a URL
– Download/Export the report

To determine the variable values, I added 2 “DateTimePickers”to the form, and a “TextBox” for the CustomerID. Other than that, there are 2 buttons: 1 to get the URL (might come in handy for testing), and 1 to export the report in the selected format. There’s also a “TextBox” so that you can configure the drop-folder for the files:

 
Build URL
In order to build the URL we need 5 pieces:

– The SSRS servername or URL
– The folder of the report (if it’s not in the root)
– The report name
– The parameters needed for executing the report
– The export format

In my case the folder (“Test”) and report name (“SSRSAutomationTestReport”) are known, so I hard-coded them:

string ReportServer = 
    "http://"
    + ReportServerURL
    + "/ReportServer/Pages/ReportViewer.aspx?%2fTest%2fSSRSAutomationTestReport"
    + "&From="
    + DT_From.Value.Date.ToString("yyyy-MM-dd")
    + "&To="
    + DT_To.Value.Date.ToString("yyyy-MM-dd")
    + "&CustomerID="
    + TB_CustomerID.Text
    + "&rs:Command=Render";

if (RB_Excel.Checked)
{
    ReportServer += "&rs:Format=EXCEL";
}

if (RB_PDF.Checked)
{
    ReportServer += "&rs:Format=PDF";
}

 
This results in the URL that you can use to export the report to a specific file format (in my case either Excel or PDF).

Download the file
To download the file we need to use the “CredentialCache”, because when you use the SSRS webservice to execute a report, an NTLM challenge takes place. The “CredentialCache” will solve the 2-step authentication for you. After that, you can use “WebClient” to download the file. This will look like this:

var url = new Uri(ReportURL);

string FileExtension = ".pdf";

if (RB_Excel.Checked)
    FileExtension = ".xls";

var location = TB_Dropfolder.Text + "SSRSAutomationTestReport - Customer " + TB_CustomerID.Text + FileExtension;

// When calling for the url a NTLM challenge takes place
// Once this challenge takes place the GetCredentials will automagically be called via de CredentialCache
// This will resolve the 2 step authentication
// Requirement: the uri for the cache must be the Scheme + Host of the domain
var cc = new CredentialCache();
cc.Add(new Uri(string.Format("{0}://{1}", url.Scheme, url.Host)), "NTLM", new NetworkCredential(Username, Password, Domain));

using (var client = new WebClient())
{
    client.Credentials = cc;
    client.DownloadFile(url, location);

    MessageBox.Show("Report is exported");
}

 
Download the resources
To show you how I solved this, I’ve made the resources available for download. You can download the SSRS report here, and the Windows Forms application here.

Please feel free to download them, try them out for yourself, and let me know what you think.

Comparing execution plans with SSMS

In SQL Server 2016 (now available as CTP 3.0), a new feature is shipped: Execution Plan Comparison Tool. This new (and very cool) feature allows you to compare 2 execution plans within SQL Server Management Studio (SSMS). And according to Amit Banerjee (Blog | @banerjeeamit), this feature is also available in the “SSMS – September 2015” release (more info here). Let’s take a look at this new feature.

Creating resources
To generate an execution plan that we can compare later on, let’s create a table with some data first:

USE Sandbox
GO


CREATE TABLE dbo.T1
	(ID INT IDENTITY(1,1),
	 VAL VARCHAR(10))
GO


INSERT INTO T1
	(VAL)
VALUES
	('X')
GO 10

 
To generate some execution plans, run the queries below with the “actual execution plan” on, and safe these plans to disk:

SELECT *
FROM T1 AS T1
INNER JOIN T1 AS T2 ON T1.ID = T2.ID

 
and:

SELECT *
FROM T1 AS T1
INNER JOIN T1 AS T2 ON T1.ID = T2.ID
INNER JOIN T1 AS T3 ON T1.ID = T3.ID

 
I’ve also made the 2 plans that were generated available for download here and here.

 
Comparing execution plans
To compare execution plans in, you need to open the first execution plan from SSMS. In the plan-window, right-click and click on “Compare Showplan”:

 
Now a pop-up window will open, that asks you what plan you want to use to compare it with. Now open “Test2.sqlplan”. The compare window opens, and you can compare plans:

 
As you can see, parts of the plan that are the same in both execution plans, are colored the same. These colors are randomly chosen, and can be different every time you compare 2 plans. This makes it easy to determine where both plans are equal or differ.

If you click on one of the highlighted parts in the execution plan, the other plan will center its view on that part of the plan (unfortunately it’s a bit hard to see that with these small execution plans). If you click on a highlighted part, you can see that there is a blue rectangle drawn around the object:

 
Another really cool thing is the properties windows. If you don’t have that open by default, right-click on the first object in the execution plan (the SELECT part), and click on “Properties”. This shows 2 property-windows, in which you can compare the memory grant for both plans for example:

 
Conclusion
At this moment, you need to save both execution plans to disk in order to compare them. If you try to compare an execution plan of a query you just ran without saving it, it throws an exception:

 
But in my opinion this is just a minor issue, and I think this will be fixed in one of the next releases.

Comparing execution plans is definitely something I’ve been missing for years. Especially when you have 2 really big plans, this can really help you speed up the analysis and debug process.

Even though I’m pretty used to using SQL Sentry Plan Explorer to open execution plans, that doesn’t give me the option to compare plans (yet). At least not in the free version of the tool, that I always recommend to colleagues and friends to use.

This is another one of the really cool features in the overhauled SSMS, and I think the SQL Server team is rocking this new release.

Tracking query progress with Live Query Statistics

How frustrating is it, to run a query on a database and it seems to be “stuck”. We’ve all seen that happen right? But how can you tell what the problem is, without letting the query complete (which could take a long time)? Microsoft (or actually the SQL Server team) gave us a new toy to play with that can help us in situations like this, and this tool is called “Live Query Statistics”.

 
Creating resources
To show you how the Live Query Statistics work, let’s create a sample table first, and insert 1.000 rows:

CREATE TABLE LiveQueryStats
	(ID INT IDENTITY(1,1),
	 VALUE VARCHAR(10))
GO

INSERT INTO LiveQueryStats
	(VALUE)
VALUES
	('X')
GO 1000

 
Because we want to actually see something happen, let’s multiply the number of rows coming from the single table by using the query below:

SELECT *
FROM LiveQueryStats T1
CROSS APPLY LiveQueryStats T2
CROSS APPLY LiveQueryStats T3

 
The query will return 1 billion rows (1.000.000.000), and will run for a while. This gives you the opportunity to look at the different features without rushing or losing your running query.

 
Live execution plan
Before running the query above, you need to enable the Live Query Statistics just like you would do to the normal execution plan. This is an extra button added in the SQL Server 2016 SSMS (and the downloadable version of course). When that’s enabled, you can execute the query, and SQL Server Management Studio (SSMS) will automatically switch to the live execution plan.

In this execution plan, you’ll see the data flow through the components, so you can actually see what SQL Server is doing at a specific moment:

 
Query completion percentage
Another great addition is the overall completed percentage, that you can find at the bottom of your SSMS. This shows you the percentage of completion, that can help you estimate the time till completion:

 
Live rowcount
The properties window normally shows you the in-depth information of your query after completion (number of threads, memory grant, etc). With the live execution plan enabled, it can provide you with real-time statistics as well:

 
Live execution plan from Activity Monitor
From the Activity Monitor you can also open the live execution plan of running queries on your instance. In the Activity Monitor, you have an additional tab in SQL Server 2016 called “Active Expensive Queries”. In this tab you can right-click on a running query, and click on “Show Live Execution Plan”. This opens a new tab in SSMS with the execution plan:

 
Unfortunately this only works for queries that have the live statistics enabled before execution. I don’t expect this to change in the final product, mainly because of the negative performance impact this feature can have on your queries and instance.

 
Drawbacks, Limitations and Bugs
Because this is only a CTP version of SQL Server 2016, we can expect some bugs and limitation, so I’ll be the last one to judge. And I don’t think these limitations are a big drawback on the feature.

One of these bugs is a crashing SSMS. If you open a live execution plan from the Activity Monitor, and close that tab, SSMS crashes every now and then.

 
Another weird thing is that exiting SSMS by clicking on the close button (top right) when the live execution plan is open, causes the list of recently used SQL Servers (the list used when connecting object explorer for example) to be cleared for some reason.

This behavior is reproducible, so I think this is a small bug in SSMS, or the fact that I upgraded this instance from the first CTP version till the current version. But I’m sure this will be fixed by the SQL Server Team in the next releases.

 
The biggest pitfall (in my opinion) of this feature is also mentioned in the documentation. This feature is “primarily intended for troubleshooting purposes” and “can slow the overall query performance”. And I know, it look SO COOL to have this on all queries you’re running, but please be careful with this. Don’t enable this on every running query, but only use this to debug issues!

Another limitation for this feature (at least at the moment I’m writing this), is the use in combination with columnstore indexes, memory optimized tables and natively compiled stored procedures. You can read more about this here.

And I shouldn’t even have to mention this, but remember: you can only use Live Execution Plans when you have SHOWPLAN permissions on the database (same permissions you need to view normal execution plans).

 
Alternatives
One of the questions I had when I read about this: are there any alternative for this? One of the only things that come to mind is the “Track My Query” tool, written by Matan Yungman (Blog | @MatanYungman). This tool allows you to monitor your query, and it will show you what part of your query is currently running. For more in-depth information, I recommend the SQLBits session where he explains the inner-workings of this tool. I’ve seen this session in person, and it was an interesting session.

 
Conclusion
With all these awesome new features that will be shipped in SQL Server 2016, I think there’s a whole new way of looking at SQL Server as a product. I’ve always liked working with SQL Server, but this version is taking it to the next level.

With all these new features, debugging issues is going to be a bit easier, developing new stuff is going to be faster, and the overall usability is going to skyrocket. I can’t wait to get my hands on the finished product!

Searching through the SQL Server error logs

SQL Server has a number of error logs, where both informational messages and errors are logged. You can compare it to the event viewer in Windows, but than only for SQL Server. This error log contains a lot of potentially useful information when you’re investigating an issue.

The physical location of the logfiles is “C:\Program Files\Microsoft SQL Server\MSSQL12.[InstanceName]\MSSQL\Log”. This can also be different if you changed the path in the SQL Server setup (for example, the path on my machine is: “C:\Program Files\Microsoft SQL Server 2014\MSSQL12.[InstanceName]\MSSQL\Log”). In that directory you’ll find a number of ERRORLOG.[Number] files. There is a file for every archive, which depends on your SQL Server configuration. You can open the files with notepad, or any other text-editor you like.

But you can also access these archives from SQL Server Management Studio (SSMS).

 
SQL Server Logs
If you connect the object explorer in your SSMS, you can navigate to Management -> SQL Server Logs. There you see the number of configured log files (default is 7 log files: 6 archives + current log file):

 
You can double-click a log file to open it. In the pop-up window you see the contents of the log, a number of checkboxes on the left to add more archives to the current view, and a button called “Filter…” that you can use to filter the current view:

 
But unfortunately the filter in the Log File Viewer isn’t always as easy to use. For example, you can’t (at least as far as I know) filter on 2 strings. So how are you going to look for any events that contains “Backup” or “Restore”? That’s not possible with this filter.

 
xp_readerrorlog
If you would rather use T-SQL to find things in the SQL Server Error Log, that’s also possible. There’s an extended procedure called xp_readerrorlog you can use for that, or you can use sp_readerrorlog (which is a stored procedure that used xp_readerrorlog).

If you use sp_readerrorlog or xp_readerrorlog, you can also apply filters. You can use a number of parameters to filter the output, but you can only do so on 1 singe log file:

EXEC xp_readerrorlog
	0,						--ArchiveID (First error log = 0)
	1,						--Log type (SQL Server = 1, SQL Agent = 2)
	N'Backup',				--Filter ProcessInfo
	N'Sandbox',				--Filter Text
	'20150826 00:00:00',	--DateFrom
	'20150828 10:32:00',	--DateTill
	'ASC'					--Sort order

 
By altering the filter-parameters (or leave them empty), you can search through the SQL Server error log or the SQL Server Agent error log, and you can do so on a number of columns and variables (like datetimes). But you still have the same issue you had with the log viewer in SSMS: you can can only search on 1 filter, and you can’t search through more than 1 archive at the same time. So how can we solve that?

 
Search trough multiple archives
Searching through multiple log files might come in handy if you’re not sure if the database was taken offline today or last week. Or if you’re not sure if the problem occurred before or after a log file cycle.

One way of doing this (which I use on a regular basis, and I blogged about earlier), is by inserting all the log information in a temporary table and search through that results:

DECLARE @SSEL TABLE
	(LogDate DATETIME,
	 ProcessInfo VARCHAR(100),
	 Text VARCHAR(MAX))

/* Insert current log */
INSERT INTO @SSEL
EXEC xp_readerrorlog 0

/* Insert previous log */
--INSERT INTO @XREL
--EXEC xp_readerrorlog 1

SELECT
	SSEL.LogDate,
	SSEL.ProcessInfo,
	SSEL.Text
FROM @SSEL AS SSEL
WHERE SSEL.Text LIKE '%Backup%'
ORDER BY SSEL.LogDate ASC

 
The only issue with this is that you need to find out how many log files you need to insert into the temporary table (if you want to search through all available logs). But you can also find the answer to that question with a query.

 
Number of configured logs
You can configure the amount of error logs from SSMS. If you right-click on the SQL Server Logs in the object explorer, you can click on Configure:

 
In the pop-up window, you can configure the amount of archives:

 
If you script out that change, you’ll end up with this script:

USE master
GO

EXEC xp_instance_regwrite
	N'HKEY_LOCAL_MACHINE',
	N'Software\Microsoft\MSSQLServer\MSSQLServer',
	N'NumErrorLogs',
	REG_DWORD,
	10
GO

 
So you can use the same method (but in reverse of course) to find the amount of configured logs. You just need to use the xp_instance_regread (blogged about this before) stored procedure:

DECLARE @NumErrorLogs int

EXEC master.dbo.xp_instance_regread
	'HKEY_LOCAL_MACHINE',
	'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer',
	N'NumErrorLogs',
	@NumErrorLogs OUTPUT

SELECT @NumErrorLogs AS [NumberOfLogFiles]

 
There are 2 results possible from this:

– NULL: This means the setting is not configured, and the default (6 archives) is used
– An integer between 6 and 99: This means the amount of logs is configured and not set to default. The number that is returned is the amount of archives

 
This result can be used to configure the amount of inserts in the script posted above. If a NULL is returned, you know you need to use the default setting of 6. If an integer is returned, you can use that number to determine the amount of archives you’d like to use.

 
Conclusion
By retrieving the SQL Server error log with a T-SQL query, it’s easy to automate this process if needed. For example, you might want to store the error log data in another place, and not keep it stored in the log files on your instance. You can do that by modifying this script, or write your own solution. By using the xp_instance_regread and xp_readerrorlog or sp_readerrorlog, you can be sure your script doesn’t fail or misses some data.

Where to find backup- and restore history

Interesting questions don’t always need to be hard to answer. Earlier this week I was asked “How can I see what backup was restored on environment X?”. The answer is as interesting as the question…

 
Backup
To see what backups are taken, you can take a look at the history of your maintenance job (depending on whatever type you use in your environment). But there is another way: just ask SQL Server.

First, let’s create a new database named Sandbox (before you run the scripts in this blog post, change or create the directories that are used!):

USE master
GO

CREATE DATABASE Sandbox
	ON PRIMARY (NAME = N'Sandbox', FILENAME = N'C:\Database\Sandbox.mdf', SIZE = 4096KB, FILEGROWTH = 1024KB)
	LOG ON (NAME = N'Sandbox_log', FILENAME = N'C:\Database\Sandbox_log.ldf', SIZE = 1024KB, FILEGROWTH = 1024KB)
GO

 
Before we start to backup this database, I want to make sure the information I see is only for this backup. In order to do so, I’m going to cycle the SQL Server error log, and clean all my backup history from msdb. You can do that by running these statements:

EXEC sys.sp_cycle_errorlog
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date =  '2016-01-01'

 
By setting the “@oldest_date” to a date in the future, all backup history will be deleted.

Now that we’ve done that, let’s take a backup of our database:

BACKUP DATABASE [Sandbox]
 TO DISK = N'C:\Database\Backup\Sandbox_2010828_0832.bak' WITH STATS = 10;
GO

 
The backup will complete in an instant, because there is no data to backup:

 
The result of the backup can be found in 2 locations. The first one is the SQL Server error log:

EXEC xp_readerrorlog

 
If you search in the error log you can find the exact time the backup was taken, the amount of pages that were processed, the location of the backup file, etc (image clickable for larger version):



 
The SQL Server error logs are also available from the SQL Server Management Studio (SSMS). If you connect to your instance with the object explorer, and navigate to Management -> SQL Server Logs, you can find the same information.

The second location you can use to retrieve this information is msdb. The msdb contains the history of your backups:

 
You can retrieve this information by running a query on the backup-tables:

SELECT
	B.user_name,
	B.database_name,
	M.physical_device_name,
	B.backup_start_date,
	B.backup_finish_date,
	CASE B.type
		WHEN 'D' THEN 'Database'
		WHEN 'L' THEN 'Log'
	END AS BackupType
FROM msdb.dbo.backupset B
INNER JOIN msdb.dbo.backupmediafamily M ON M.media_set_id = B.media_set_id

 
Restore
The information for restored backups can also be found in the same 2 locations. Let’s take a backup of the Sandbox database first:

USE master
GO

RESTORE DATABASE Sandbox
FROM DISK = N'C:\Database\Backup\Sandbox_2010828_0832.bak' WITH REPLACE, STATS = 10
GO

 
If we look at the SQL Server error log again we can find the restore time, used backup, etc:



 
We can also query the msdb again. But this time, we need to use the restore-tables:

SELECT
	destination_database_name,
	user_name,
	restore_date,
	destination_phys_name
FROM msdb.dbo.restorehistory H
INNER JOIN msdb.dbo.restorefile F ON F.restore_history_id = H.restore_history_id

 
This shows you all recent restores on your instance:

 
Conclusion
Being able to retrieve the information you need to determine which database was backed-up or restored can help you solve some weird issues. In my case, it explained why a certain record wasn’t restored on environment X: it was inserted after the backup was taken. Issues like that can be quite hard (or even impossible) to find without information about backups and restores.

Minimal permissions needed to run sp_WhoIsActive

If you’re running sp_WhoIsActive on your SQL Server instance (like I do on all my instances), it might be useful to make this tool available for your colleagues. They can use it to determine the workload on the server, or to see if the query they are running is blocking other processes.

This was the exact question I got this morning. One of the members of our BI team wanted to have permissions on sp_WhoIsActive, and I was struggling (for the thousandth time) to remember the minimal permissions I needed to give him. So I’m going to be a smart guy for once, and document it here. And hopefully this might help you as well (and save you the time I lost on it).

 
Permissions on the object
To test this, I’ve created a new login (TestLogin) with just read permissions on a single database. If you try to execute sp_WhoIsActive with those permissions, you’ll see this error message:

 
To fix that, go to the master database, add the login, and add execute permissions on the sp_WhoIsActive stored procedure:

 
You could also script this out like this:

USE master
GO
GRANT EXECUTE ON dbo.sp_WhoIsActive TO TestLogin
GO

 
Now that that’s fixed, you still can’t execute sp_WhoIsActive:

 
To fix that, you need to right-click on the instance name, go to properties -> Permissions, and add “View server state” permissions for the login:

 
Again, you can also do this via T-SQL:

USE master
GO
GRANT VIEW SERVER STATE TO TestLogin
GO

 
If you execute sp_WhoIsActive now, it works without any issues.

Obfuscating your data with SQL Server 2016 Dynamic Data Masking

In today’s IT landscape, data security is a big deal. Just ask people like Troy Hunt (Blog | @troyhunt), who is a security expert, how losing data can destroy your company. Companies all over the world lose data every single day, and you might not even be aware of this. I’m still seeing companies who give all IT employees admin permission on the whole IT infrastructure: from webserver to database servers, and from customer relations system to financial system. But why is it still a struggle for IT professionals to protect their data? In some companies they don’t even hash passwords, let alone protect them with a certificate or even heavier protection.

In this blog I’m not going to tell you how to secure your data, because honestly I’m not an expert at that. I do want to introduce you to a new way to obfuscate your data, that was introduced in SQL Server 2016. According to Wikipedia obfuscation means: “(…)the obscuring of intended meaning in communication, making the message confusing, willfully ambiguous, or harder to understand”. Or in easy terms: make it harder to read for your users and/or employees. Let’s take a look at an example.

If I would give you an email address like “JohnDoe@domain.com”, it’s really easy to guess the name of the person. Let’s be honest, it’s not rocket science right?! But what if I would give you the email address “Jo*****@d*****.com”, would you still guess the name correctly? Yes I agree with you there’s a small chance you might guess the name right, but those chances are slim. And instead of writing your own code to obfuscate data this way, with SQL Server 2016 you’ll get it out of the box.

Creating resources
The following is tested on SQL Server CTP 2.1. If you’re not running that version or higher, this might mean the scripts or test cases might not work the same as in my case.

In order to test this new feature, let’s create a new database first:

USE master
GO

CREATE DATABASE DynamicDataMasking ON PRIMARY 
(NAME = N'DynamicDataMasking', FILENAME = N'C:\Database\DynamicDataMasking.mdf')
LOG ON
(NAME = N'DynamicDataMasking_log', FILENAME = N'C:\Database\DynamicDataMasking_log.ldf')
GO

 
Now let’s create a table called Users:

CREATE TABLE dbo.Users
	(UserID INT IDENTITY(1,1),
	 FirstName VARCHAR(100),
	 LastName VARCHAR(100),
	 Address VARCHAR(100),
	 DateOfBirth DATE,
	 SocialSecurityNumber VARCHAR(15),
	 Email VARCHAR(100))

 
This table contains some basic information about the users of our application. And of course we need some user data in the table. I’ve used Redgate’s SQL Data Generator to generate data for my table. Because I’ve generated a full table, I’ve made the script available here.

 
Now that we have a table with sensitive data, let’s look at the ways of obfuscating this data. From SQL Server 2016 onwards you get to use the Dynamic Data Masking feature. This means you can add a “mask” on top of your data. This is done by altering the columns in your table:

ALTER TABLE <TableName>
ALTER COLUMN <ColumnName> ADD MASKED WITH (FUNCTION = '<FunctionName>')

 
This takes care of the data obfuscation for you, and you can determine the level of obfuscation by using specific functions:

Default: Completely replaces the value in the column with “xxxx”
Email: Replaces the email address with X’s, except for the first character, and the Top-level domain
Custom: The most dynamic of the 3 functions. This allows you to configure a prefix and suffix, and replaces the rest of the values with X’s

 
Let’s add masks to the columns in our table:

ALTER TABLE dbo.Users ALTER COLUMN FirstName ADD MASKED WITH (FUNCTION = 'default()')
ALTER TABLE dbo.Users ALTER COLUMN LastName ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXXXXX",1)')
ALTER TABLE dbo.Users ALTER COLUMN Address ADD MASKED WITH (FUNCTION = 'partial(10, "XXXXX", 0)')
ALTER TABLE dbo.Users ALTER COLUMN City ADD MASKED WITH (FUNCTION = 'partial(3, "XXXXX", 0)')
ALTER TABLE dbo.Users ALTER COLUMN State ADD MASKED WITH (FUNCTION = 'partial(4, "XXXXX", 0)')
--Country no masking
ALTER TABLE dbo.Users ALTER COLUMN DateOfBirth ADD MASKED WITH (FUNCTION = 'default()')
ALTER TABLE dbo.Users ALTER COLUMN SocialSecurityNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)')
ALTER TABLE dbo.Users ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')

 
Testing data masking
Now that we have a table with masked data, how does that look from different perspectives? Let’s create 2 users: an application login with just read permissions, and an admin login with db_owner permissions:

USE master
GO

CREATE LOGIN [AppLogin]
	WITH PASSWORD = 'TestPass',
	--SID = 0xDB9ED49EF06F3A40B5024B36F5CDD658,
	DEFAULT_DATABASE = [master],
	CHECK_POLICY = OFF,
	CHECK_EXPIRATION = OFF
GO

CREATE LOGIN [AdminLogin]
	WITH PASSWORD = 'TestPass',
	--SID = 0xDB9ED49EF06F3A40B5024B36F5CDD658,
	DEFAULT_DATABASE = [master],
	CHECK_POLICY = OFF,
	CHECK_EXPIRATION = OFF
GO


USE DynamicDataMasking
GO

CREATE USER AppLogin FOR LOGIN AppLogin WITH DEFAULT_SCHEMA = dbo
GO

ALTER ROLE db_datareader ADD MEMBER AppLogin
GO

CREATE USER AdminLogin FOR LOGIN AdminLogin WITH DEFAULT_SCHEMA = dbo
GO

ALTER ROLE db_owner ADD MEMBER AdminLogin
GO

 
Now open up 2 new SSMS tabs, and log in with the 2 accounts. In both sessions run the query below:

USE DynamicDataMasking
GO

SELECT * FROM dbo.Users

 
The difference is clearly visible when you select the top 10 records in both sessions:

 
In the admin connection you get to see the full dataset, and in the application login the masked dataset is returned.

 
Querying data
The retrieval and querying of data isn’t changed when you have data masks applied. If you run a query where you search for users with a specific first name, the correct results is returned:

SELECT *
FROM dbo.Users
WHERE FirstName = 'Jeffrey'

 
Security issues
Now we get to the interesting part: the security issues. Even though this is not a form of data encryption, I did expect this could be a part of data security. But unfortunately this was a false sense of security…

Let’s try a normal query on the masked data, with the AppLogin session:

USE DynamicDataMasking
GO

SELECT * FROM dbo.Users

 
This returns the data as expected: masked. But is there any way around this? Let’s make the AppLogin a db_owner on another database (run this statement with a user that has permissions to do this):

USE Sandbox
GO

CREATE USER AppLogin FOR LOGIN AppLogin WITH DEFAULT_SCHEMA = dbo
GO

ALTER ROLE db_owner ADD MEMBER AppLogin
GO

 
So the AppLogin is still db_datareader on the DynamicDataMasking database, and db_owner on the Sandbox database. Now let’s run the query below:

USE Sandbox
GO

SELECT *
FROM DynamicDataMasking.dbo.Users

 
Well, that’s awkward! If your application login has db_owner permissions on another database, you can query the data from the users table without the data masks!

But I know what you’re thinking. My application logins aren’t db_owner anywhere so I’m safe. WRONG! Let me show you another example:

USE DynamicDataMasking
GO

SELECT
	U1.*
FROM dbo.Users U1
INNER JOIN dbo.Users U2 ON U2.USerID = U1.UserID

 
Well hello precious data! If you just join a table to ANY other table (or join it to itself), all data is returned without masking.

This must be all right? There’s one last thing…

Let’s try to insert the masked data into a temp table, and select that result:

USE DynamicDataMasking
GO

SELECT *
INTO #UsersTemp
FROM dbo.Users

SELECT *
FROM #UsersTemp

DROP TABLE #UsersTemp

 
I’m not sure what the idea behind of this feature was, or how the SQL Server designed it, but this sounds like a real security issue to me. I expect my data to be obfuscated for my application users, but it’s the complete opposite. If you have a direct connection to the database (and not through an application only) it’s really easy to get a hold of the unmasked data.

 
Conclusion
Even though I really like this feature (and yes, I’m serious!) there are some things the SQL Server team needs to look at. If you’re expecting your data to be masked for all non-admin users, you’re in for a nasty surprise.

Hoping this will be fixed soon, I’ve created a Connect item for this. If you agree with me on this, and you think it’s a problem as well, please up-vote it or leave a comment in the connect item. There is also a specific item for the join issue. Let’s make sure the SQL Server team is aware of these issues.

Follow

Get every new post delivered to your Inbox.

Join 75 other followers