Obfuscating your data with SQL Server 2016 Dynamic Data Masking

In today’s IT landscape, data security is a big deal. Just ask people like security expert Troy Hunt (Blog | @troyhunt) how losing data can destroy your company. Companies all over the world lose data every single day, and you might not even be aware of it. I still see companies that give all IT employees admin permissions on the whole IT infrastructure: from web servers to database servers, and from the customer relations system to the financial system. So why is it still a struggle for IT professionals to protect their data? Some companies don’t even hash passwords, let alone protect them with a certificate or even heavier protection.

In this blog I’m not going to tell you how to secure your data, because honestly I’m not an expert at that. I do want to introduce you to a new way to obfuscate your data that was introduced in SQL Server 2016. According to Wikipedia, obfuscation means: “(…)the obscuring of intended meaning in communication, making the message confusing, willfully ambiguous, or harder to understand”. Or in plain terms: making the data harder to read for your users and/or employees. Let’s take a look at an example.

If I gave you an email address like “JohnDoe@domain.com”, it would be really easy to guess the name of the person. Let’s be honest, it’s not rocket science, right?! But what if I gave you the email address “Jo*****@d*****.com”, would you still guess the name correctly? Sure, there’s a small chance you might guess the name right, but those chances are slim. And instead of writing your own code to obfuscate data this way, SQL Server 2016 gives you this out of the box.

Creating resources
The following is tested on SQL Server CTP 2.1. If you’re not running that version or higher, the scripts or test cases might not work the same as they do in my case.

In order to test this new feature, let’s create a new database first:

USE master
GO

CREATE DATABASE DynamicDataMasking ON PRIMARY 
(NAME = N'DynamicDataMasking', FILENAME = N'C:\Database\DynamicDataMasking.mdf')
LOG ON
(NAME = N'DynamicDataMasking_log', FILENAME = N'C:\Database\DynamicDataMasking_log.ldf')
GO

 
Now let’s create a table called Users:

CREATE TABLE dbo.Users
	(UserID INT IDENTITY(1,1),
	 FirstName VARCHAR(100),
	 LastName VARCHAR(100),
	 Address VARCHAR(100),
	 City VARCHAR(100),
	 State VARCHAR(100),
	 Country VARCHAR(100),
	 DateOfBirth DATE,
	 SocialSecurityNumber VARCHAR(15),
	 Email VARCHAR(100))

 
This table contains some basic information about the users of our application. And of course we need some user data in the table. I’ve used Redgate’s SQL Data Generator to generate data for my table, and because I’ve generated a full table of data, I’ve made the script available here.

 
Now that we have a table with sensitive data, let’s look at the ways of obfuscating this data. From SQL Server 2016 onwards you get to use the Dynamic Data Masking feature. This means you can add a “mask” on top of your data. This is done by altering the columns in your table:

ALTER TABLE <TableName>
ALTER COLUMN <ColumnName> ADD MASKED WITH (FUNCTION = '<FunctionName>')

 
This takes care of the data obfuscation for you, and you can determine the level of obfuscation by using specific masking functions:

Default: Completely replaces the value in the column with “xxxx”
Email: Replaces the email address with X’s, except for the first character and the top-level domain
Custom (partial): The most flexible of the 3 functions. This allows you to keep a configurable prefix and suffix, and replaces the rest of the value with a custom padding string

 
Let’s add masks to the columns in our table:

ALTER TABLE dbo.Users ALTER COLUMN FirstName ADD MASKED WITH (FUNCTION = 'default()')
ALTER TABLE dbo.Users ALTER COLUMN LastName ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXXXXX",1)')
ALTER TABLE dbo.Users ALTER COLUMN Address ADD MASKED WITH (FUNCTION = 'partial(10, "XXXXX", 0)')
ALTER TABLE dbo.Users ALTER COLUMN City ADD MASKED WITH (FUNCTION = 'partial(3, "XXXXX", 0)')
ALTER TABLE dbo.Users ALTER COLUMN State ADD MASKED WITH (FUNCTION = 'partial(4, "XXXXX", 0)')
--Country no masking
ALTER TABLE dbo.Users ALTER COLUMN DateOfBirth ADD MASKED WITH (FUNCTION = 'default()')
ALTER TABLE dbo.Users ALTER COLUMN SocialSecurityNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)')
ALTER TABLE dbo.Users ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')
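
After running these statements, you can check which masks ended up on which columns. In the builds I’ve tested there is a catalog view for exactly this; a minimal sketch, assuming the view is available in your CTP:

SELECT t.name AS TableName,
       c.name AS ColumnName,
       c.masking_function
FROM sys.masked_columns AS c
INNER JOIN sys.tables AS t ON t.object_id = c.object_id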

 
Testing data masking
Now that we have a table with masked data, how does that look from different perspectives? Let’s create 2 users: an application login with just read permissions, and an admin login with db_owner permissions:

USE master
GO

CREATE LOGIN [AppLogin]
	WITH PASSWORD = 'TestPass',
	--SID = 0xDB9ED49EF06F3A40B5024B36F5CDD658,
	DEFAULT_DATABASE = [master],
	CHECK_POLICY = OFF,
	CHECK_EXPIRATION = OFF
GO

CREATE LOGIN [AdminLogin]
	WITH PASSWORD = 'TestPass',
	--SID = 0xDB9ED49EF06F3A40B5024B36F5CDD658,
	DEFAULT_DATABASE = [master],
	CHECK_POLICY = OFF,
	CHECK_EXPIRATION = OFF
GO


USE DynamicDataMasking
GO

CREATE USER AppLogin FOR LOGIN AppLogin WITH DEFAULT_SCHEMA = dbo
GO

ALTER ROLE db_datareader ADD MEMBER AppLogin
GO

CREATE USER AdminLogin FOR LOGIN AdminLogin WITH DEFAULT_SCHEMA = dbo
GO

ALTER ROLE db_owner ADD MEMBER AdminLogin
GO

 
Now open up 2 new SSMS tabs, and log in with the 2 accounts. In both sessions run the query below:

USE DynamicDataMasking
GO

SELECT * FROM dbo.Users

 
The difference is clearly visible when you select the top 10 records in both sessions:

 
In the admin connection you get to see the full dataset, and in the application login the masked dataset is returned.
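
If you do want a specific user to see the unmasked data without handing out db_owner, the documentation mentions a dedicated UNMASK permission. I haven’t used it in this test, so treat the following as a sketch that may depend on your CTP build:

USE DynamicDataMasking
GO

GRANT UNMASK TO AppLogin
GO

--And to take the permission away again:
REVOKE UNMASK FROM AppLogin
GO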

 
Querying data
The retrieval and querying of data doesn’t change when you have data masks applied. If you run a query where you search for users with a specific first name, the correct results are returned:

SELECT *
FROM dbo.Users
WHERE FirstName = 'Jeffrey'
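
And if you ever want to remove a mask from a column again, there is also a DROP MASKED variant of the ALTER COLUMN syntax. A minimal sketch, using the FirstName mask from earlier:

ALTER TABLE dbo.Users
ALTER COLUMN FirstName DROP MASKED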

 
Security issues
Now we get to the interesting part: the security issues. Even though this is not a form of data encryption, I did expect it could be part of a data security strategy. But unfortunately it turned out to give a false sense of security…

Let’s try a normal query on the masked data, with the AppLogin session:

USE DynamicDataMasking
GO

SELECT * FROM dbo.Users

 
This returns the data as expected: masked. But is there any way around this? Let’s make the AppLogin a db_owner on another database (run this statement with a user that has permissions to do this):

USE Sandbox
GO

CREATE USER AppLogin FOR LOGIN AppLogin WITH DEFAULT_SCHEMA = dbo
GO

ALTER ROLE db_owner ADD MEMBER AppLogin
GO

 
So the AppLogin is still db_datareader on the DynamicDataMasking database, and db_owner on the Sandbox database. Now let’s run the query below:

USE Sandbox
GO

SELECT *
FROM DynamicDataMasking.dbo.Users

 
Well, that’s awkward! If your application login has db_owner permissions on another database, you can query the data from the users table without the data masks!

But I know what you’re thinking. My application logins aren’t db_owner anywhere so I’m safe. WRONG! Let me show you another example:

USE DynamicDataMasking
GO

SELECT
	U1.*
FROM dbo.Users U1
INNER JOIN dbo.Users U2 ON U2.UserID = U1.UserID

 
Well hello precious data! If you just join a table to ANY other table (or join it to itself), all data is returned without masking.

This must be all, right? Well, there’s one last thing…

Let’s try to insert the masked data into a temp table, and select that result:

USE DynamicDataMasking
GO

SELECT *
INTO #UsersTemp
FROM dbo.Users

SELECT *
FROM #UsersTemp

DROP TABLE #UsersTemp

 
I’m not sure what the idea behind this feature was, or how the SQL Server team designed it, but this looks like a real security issue to me. I expect my data to be obfuscated for my application users, but it’s the complete opposite. If you have a direct connection to the database (and not just access through an application), it’s really easy to get hold of the unmasked data.

 
Conclusion
Even though I really like this feature (and yes, I’m serious!) there are some things the SQL Server team needs to look at. If you’re expecting your data to be masked for all non-admin users, you’re in for a nasty surprise.

Hoping this will be fixed soon, I’ve created a Connect item for this. If you agree with me, and you think it’s a problem as well, please up-vote it or leave a comment on the Connect item. There is also a specific item for the join issue. Let’s make sure the SQL Server team is aware of these issues.

Time traveling with SQL Server 2016: Temporal tables

Since the release of the first publicly available SQL Server 2016 CTP2, we have a whole list of new and cool features we can start using. One of those features is called temporal tables (or system-versioned tables). So what are temporal tables, and how can you use them?

 
What is a temporal table?
To understand the functionality of temporal tables, we first need to agree on the definition of a table in SQL Server. If you run a query on a table that contains data, that query can only return the current version of a record. This means you can only see the current “truth”, and there’s no way to travel back in time to see older versions of this record.

Temporal tables give you the opportunity to time travel in SQL Server. For every data change (Update, Delete and Merge) in your table, a historical copy is stored in a history table. The table on which you enabled this data tracking is changed to a “System-Versioned” table.

 
How does it work?
For every update or delete on a table where system_versioning is enabled, a historical copy of the record is stored:

The original image can be found in this blog post, written by Manoj Pandey (Blog | @manub22).

 
But how does that work? Let’s start by creating a table we will use to test system-versioned tables:

CREATE TABLE dbo.BankAccount
	(AccountNumber INT CONSTRAINT PK_BankAccount_TransactionID PRIMARY KEY,
	 AccountBalance FLOAT);

 
In order to make a table a system-versioned table, we need to add 2 datetime2 columns to our newly created table. SQL Server will use these columns to store a from- and to-date that will indicate the valid period of time of the record. By telling SQL Server to use these 2 columns with the keywords “PERIOD FOR SYSTEM_TIME”, SQL Server knows that it needs to use these columns when querying data from the history table:

ALTER TABLE dbo.BankAccount
ADD ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
	ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
	PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

 

In the above script, the “HIDDEN” keyword is used. This can only be used from CTP 2.1 and higher. What HIDDEN does is hide the columns in your original table, to make sure application and database code doesn’t fail because you added these 2 new columns. You can still query the columns explicitly, but if you run a “SELECT *” query these columns won’t be returned.
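
A quick way to see the effect of HIDDEN, assuming you ran the ALTER TABLE statement above:

--The hidden period columns are not returned here...
SELECT * FROM dbo.BankAccount

--...but they are still there if you ask for them explicitly:
SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount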

 
When that’s done we can enable system-versioning on the table:

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory));

 
Data changes & history
Now we can take a look at how data changes are captured, and how history is recorded. Before you execute any of the statements below, enable the actual execution plan in your SSMS session. If we insert a new record, no history is created:

INSERT INTO dbo.BankAccount
	(AccountNumber, AccountBalance)
VALUES
	(2147483647, 10000)

 
We just see the expected Clustered Index Insert, nothing special:

 
Now, let’s update the bank account balance (the WAITFOR is added just to make sure we have a clear difference in change time for the next few steps):

UPDATE dbo.BankAccount
SET AccountBalance = AccountBalance - 2500

WAITFOR DELAY '00:01:00'

UPDATE dbo.BankAccount
SET AccountBalance = AccountBalance + 1000

 
If you look at the execution plan, you’ll see an extra DML operation:

 
Besides the expected Clustered Index Update, there is an extra Clustered Index Insert. This is the insert statement that is generated by SQL Server to insert a record into the history table. There’s no user interaction needed to store historical data; SQL Server takes care of that process for you.
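
If you’re curious about what ends up in the history table, you can simply query it like any other table. A small sketch, using the history table we specified when enabling system-versioning:

SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.MyBankAccountHistory
ORDER BY ValidFrom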

 
Time travelling
Now that we store historical data, it’s time to start the actual time travelling. To paint a complete picture, this is a timeline of the data inserts and updates:

 
Now that we have a clear timeline, let’s start our time travel. Besides creating and maintaining the history table, you also get to use a bit of new syntax in your query. For system-versioned tables, you can use the “FOR SYSTEM_TIME” clause in your query. In order to show you the timeline in SQL Server, let’s use this new syntax to query our history per minute:

SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:42:00' --No data available
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:43:00' --First insert
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:44:00' --First update
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:45:00' --Second update
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:46:00' --No other changes

 
Remember, this is based on my data. So if you want to query your timeline, change the datetime string based on your data!

So basically, you can start querying your data by using a specific point in time:

DECLARE @PointInHistory DATETIME2
SET @PointInHistory = '2015-07-09 09:45:00'

SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF @PointInHistory

 
Besides querying an exact point in time, you can use 3 other ways to query your data. FROM ... TO and BETWEEN ... AND both return the row versions that were active at some point within the period (BETWEEN also includes rows that became active exactly on the upper boundary), while CONTAINED IN only returns row versions that were opened and closed entirely within the specified period:

SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount
FOR SYSTEM_TIME FROM '2015-07-09 09:43:00' TO '2015-07-09 09:45:00'


SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount
FOR SYSTEM_TIME BETWEEN '2015-07-09 09:43:00' AND '2015-07-09 09:45:00'


SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo
FROM dbo.BankAccount
FOR SYSTEM_TIME CONTAINED IN ('2015-07-09 09:43:00', '2015-07-09 09:45:00')

 
Table changes
In practice you’ll not only see data changes happening on tables, but you’ll also have schema changes on your table once in a while. But how does a temporal table cope with schema changes? Let’s try to add a column to our table:

ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10)

This results in an error:

 
In order to add a column, we need to disable the system-versioning first, before we add the new column:

ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF)
GO

ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10)
GO

 
Now let’s enable system-versioning again:

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory))
GO

 
This results in another error:

 
So we first need to add the same column to the history table, before enabling system-versioning again:

ALTER TABLE dbo.MyBankAccountHistory
ADD NewColumn VARCHAR(10)
GO

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory))
GO

 

Disabling system-versioning means that for a short period of time, you could lose historical data. A possible way to work around this is to put your database in single-user mode before you disable it. This prevents other users from changing data in that table while system-versioning is off. If you run this in your production environment, this might not be your best option though! Also note that disabling system-versioning does not remove the historical data that’s already there.
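
A minimal sketch of that approach, assuming the tables live in a database called Sandbox (adjust the names to your own environment):

--Block other users while system-versioning is disabled
ALTER DATABASE Sandbox SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO

ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF)
GO

--Make the schema change on both the current table and the history table here

ALTER TABLE dbo.BankAccount
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory))
GO

--Allow everyone back in
ALTER DATABASE Sandbox SET MULTI_USER
GO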

 
Indexing historical data
Performance isn’t only important when you query your “normal” tables, but also if you want to query your newly created temporal tables. If you want to write a business process that uses your historical data, you need to make sure this doesn’t slow down your application. But how do you achieve that?

Just like on a normal table, you can create indexes on the history table. This table is basically a normal table, so there’s no reason why you can’t create indexes on it:

CREATE NONCLUSTERED INDEX IDX_MyHistoricalData
ON dbo.MyBankAccountHistory (AccountNumber, AccountBalance)

 
Historical data cleanup
It’s really great to store historical versions of your records in a separate table, but this also presents you with new challenges. You need more storage for your data, and you might want to remove data after a certain period of time. At this point in time, system-versioned tables don’t have a cleanup process. This might change in the future, but right now you need to clean up these tables yourself. So if you want (or need) to clean up old data, you need to write a small process yourself. This can be as easy as writing a query that runs through all system-versioned tables and deletes old data, and scheduling that in a SQL Server Agent job. But again, this is something to keep in mind when you start working with temporal tables.

I had a really good conversation about this with Borko Novakovic (@borko_novakovic), who is the Program Manager for this part of SQL Server. The code below is an example of such a process, that Borko provided:

BEGIN TRAN

	/*Drop versioning to allow deletion of historical data*/
	ALTER TABLE dbo.BankAccount
	SET (SYSTEM_VERSIONING = OFF)

	/*DELETE MyBankAccountHistory table*/
	DELETE
	FROM dbo.MyBankAccountHistory
	WHERE ValidTo < '2015.01.01'

	/*Re-establish versioning*/
	ALTER TABLE dbo.BankAccount
	SET (SYSTEM_VERSIONING = ON
		(HISTORY_TABLE = dbo.MyBankAccountHistory,
		 DATA_CONSISTENCY_CHECK = OFF));

COMMIT

 
Deleting data without disabling system-versioning doesn’t work, because you’ll get an error like this:

 
I do want to thank Borko again for his time. He helped me out a lot, and I’m thankful he took the time to do that. Thanks Borko!!!

 
Remove system-versioning
Removing system-versioned tables from your system takes one extra step, because just executing a “DROP TABLE” statement doesn’t work. SQL Server prevents you from dropping temporal tables without first disabling system-versioning. So let’s start by disabling it:

ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF)
GO

 
This disables the system-versioning on your original table, and transforms the history table to a normal table. If you want to get rid of the tables, you can just drop them both like you would normally do:

DROP TABLE dbo.BankAccount
GO
DROP TABLE dbo.MyBankAccountHistory
GO

 
Resources
Besides this blog, there are 2 fantastic posts you need to read about this topic, written by Itzik Ben-Gan (Blog | @ItzikBenGan), that you can read at SQLMag.com (Part 1, Part 2).

One thing you definitely MUST SEE is a video with Borko and Scott Klein (Blog | @SQLScott) on Channel 9. In this video, Borko explains the idea behind temporal tables, and shares some insights on this new feature.

Other information on temporal tables can be found on MSDN:

Temporal Tables
sys.tables
SQL Server 2016 (MSDN root)

The curious math problem in SQL Server

Yesterday, I came across a curious problem in SQL Server. I tried to calculate something, and it threw a strange exception I’ve never seen before. After trying it several times (just to make sure I wasn’t going crazy) and after asking some colleagues to check it out (thanks again guys, for helping me!), I decided it was weird enough to post it on twitter with the hashtag #SQLHelp, hoping I would get some help:

 
The help I needed came from Pieter Vanhove (Blog | @Pieter_Vanhove) and Kenneth Fisher (Blog | @sqlstudent144), who were kind enough to try it out on their SQL Servers. Neither of them could reproduce the issue. After trying some different things, I came to the conclusion that I couldn’t reproduce it myself on another instance either! So it had to be an instance- or database-specific issue…

 
Taking the bull by the horns
After leaving the problem alone for an evening, I came into the office the next morning and decided to take the bull by the horns. The first thing I did was dive into the problem again. And yes, it was still there. After trying to find some workarounds (explicitly casting all objects/properties to the same data type and length/precision, trying implicit conversions, etc.), I still faced this weird problem. Then I tried to reproduce it on another instance. The query executed without a problem… Okay, now it was getting weird!

The next thing I did was create a script that could be used to reproduce the issue. This is the query I ended up with:

DECLARE @temp TABLE
  (size INT)


INSERT INTO @temp
  (size)
VALUES
  (707072),
  (1024000)


SELECT
  (100 / SUM((((size) * 8.00) / 1024)))
FROM @temp AS T

 
Running that query showed me the exception every time I executed it on the first instance. Executing the same piece of code in another database on the same instance (yes, this small thing took me a while to figure out!) wasn’t a problem. So then I knew it was a database problem.

Checking the settings
To compare database settings, I used the catalog view (or system view) sys.databases:

SELECT *
FROM sys.databases
WHERE name IN ('Sandbox', 'Sandbox2')

 
There were only 2 settings that could pose a problem: ARITHABORT and NUMERIC_ROUNDABORT. After a short search, I found out these could also be set in your query (for just that session). Let’s try to set these options ON and OFF in the query:

SET ARITHABORT ON;
SET NUMERIC_ROUNDABORT OFF;



DECLARE @temp TABLE
  (size INT)


INSERT INTO @temp
  (size)
VALUES
  (707072),
  (1024000)


SELECT
  (100 / SUM((((size) * 8.00) / 1024)))
FROM @temp AS T

 
The query still works, and the result is returned in the correct way. Let’s try and set the other option to ON:

SET ARITHABORT ON;
SET NUMERIC_ROUNDABORT ON;



DECLARE @temp TABLE
  (size INT)


INSERT INTO @temp
  (size)
VALUES
  (707072),
  (1024000)


SELECT
  (100 / SUM((((size) * 8.00) / 1024)))
FROM @temp AS T

 
BINGO!!! The query doesn’t work anymore, and you see the strange exception again. So why is this an issue?

According to MSDN, NUMERIC_ROUNDABORT “Specifies the level of error reporting generated when rounding in an expression causes a loss of precision.”. So with this option (and ARITHABORT) set to ON, every calculation (divide or multiply) in which the precision changes throws an exception.

When trying to find out why you would use it, I found a possible explanation in the book “Pro SQL Server 2012 Relational Database Design and Implementation” by Louis Davidson (Blog | @drsql) and Jessica Moss (Blog | @jessicammoss):

There is a setting -SET NUMERIC_ROUNDABORT ON- that causes an error to be generated when a loss of precision would occur from an implicit data conversion. (…) SET NUMERIC_ROUNDABORT can be quite dangerous to use and might throw off applications using SQL Server if set to ON. However, if you need guaranteed prevention of implicit round-off due to system constraints, it’s there.

And in my case, this setting was set to ON at the database level, and only for a single database on the instance.
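
If you run into the same thing, a quick way to check (and, if the vendor allows it, fix) the setting at the database level is sketched below; the database names are just my test databases:

SELECT name, is_numeric_roundabort_on, is_arithabort_on
FROM sys.databases
WHERE name IN ('Sandbox', 'Sandbox2')

--Turn the option off again at the database level
ALTER DATABASE Sandbox SET NUMERIC_ROUNDABORT OFF
GO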

 
Using the MSDN “workaround”
When you read the Remarks section of the MSDN article, you get an exception when you set ARITHABORT and NUMERIC_ROUNDABORT both to ON. But if you set NUMERIC_ROUNDABORT to ON and ARITHABORT to OFF, you should get a warning, and the result should be NULL. After trying that out, I noticed it doesn’t work like that: there’s no combination possible where you get a warning and no exception. Even the code posted on that same page (in the Examples section) doesn’t work as described:

-- SET NOCOUNT to ON, 
-- SET NUMERIC_ROUNDABORT to ON, and SET ARITHABORT to ON.
SET NOCOUNT ON;
PRINT 'SET NUMERIC_ROUNDABORT ON';
PRINT 'SET ARITHABORT ON';
SET NUMERIC_ROUNDABORT ON;
SET ARITHABORT ON;
GO
DECLARE @result DECIMAL(5, 2),
   @value_1 DECIMAL(5, 4), 
   @value_2 DECIMAL(5, 4);
SET @value_1 = 1.1234;
SET @value_2 = 1.1234 ;
SELECT @result = @value_1 + @value_2;
SELECT @result;
GO

-- SET NUMERIC_ROUNDABORT to ON and SET ARITHABORT to OFF.
PRINT 'SET NUMERIC_ROUNDABORT ON';
PRINT 'SET ARITHABORT OFF';
SET NUMERIC_ROUNDABORT ON;
SET ARITHABORT OFF;
GO
DECLARE @result DECIMAL(5, 2),
   @value_1 DECIMAL(5, 4), 
   @value_2 DECIMAL(5, 4);
SET @value_1 = 1.1234;
SET @value_2 = 1.1234 ;
SELECT @result = @value_1 + @value_2;
SELECT @result;
GO

 
So to me, this seems like a really niche feature. And when I see that the demo code on MSDN isn’t working as expected, and there are no comments posted about it, my cautious conclusion is that this isn’t a heavily used feature.

 
It’s always the next morning…
To be honest, I’ve never seen this setting being used in practice, and even after encountering this issue I still don’t see the benefit of using it. But in this case it was a 3rd-party vendor application, and I wasn’t aware of its rather “special” configuration. I’m glad I stepped away from the problem for a while, and looked at it again the next morning. It’s always the next morning that you solve issues like this, isn’t it? :)

Source control: Using Visual Studio Online in SSMS

As database professionals, we’re all aware of the importance of backups. We make sure the backup process runs as expected, and (hopefully) we restore a backup every now and then to test if the restore process works. But what about source control for your scripts? Most of us don’t use source control in our daily job. But source control is as much a part of the backup process as the actual database backups.

So, looking for an easy way to get started, I focused on Visual Studio Online (VS Online). This is a free online source control system that you can use once you’ve created an account. You log in on the website with your Microsoft Live account, enter some information (like a username, etc.), and you’re ready to go!

But how do you go from writing a query in SQL Server Management Studio (SSMS), to checking in your .sql code files into VS Online? Let’s take a look.

 
Installing TFS tools
In order to use VS Online, you need to install some extra things on your machine. The first thing you need to download is the “Team Explorer for Microsoft Visual Studio 2013”. This installs the Team Explorer, which you need to get things from, and check things into, VS Online. This install requires a reboot, so please take that into account!

The second and last installation is the “Microsoft Visual Studio Team Foundation Server 2013 MSSCCI Provider”. This installs the provider, that takes care of the communication with VS Online.

For this installation, it’s important that you download the right version. If you’re running the 32-bit version of SSMS, you need to download the 32-bit version of the provider. If you install the 64-bit version, it won’t work (and trust me on this, I’ve made that mistake before!).

 
VS Online
At this point, I’m assuming you have an account for VS Online. If you log in to the website, and you go to your account url ([AccountName].visualstudio.com), you see the “Create your first team project” page. On this page, you need to create a project before you can check in any files:

 
Configure source control in SSMS
The next step is to configure SSMS, to use the source control provider you just installed. After the installation, you get a new menu in SSMS for source control:

In this menu, click on “Open from Source Control”. In the window that opens, click “Servers…”:

Click “Add…” to add a TFS server or URL:

Now you need to enter your account URL in the textbox:

Once you’ve done that a login screen pops up. Log in with your Microsoft Live account (the same you used to create your VS Online profile), and you’re authenticated:

At this point, your source control is added, and you’ll see it in the overview screen:

If you close the windows that are shown, you end up in your source control project overview:

 
Creating a SQL Server Scripts project/solution
Now that we’ve installed the source control providers, it’s time to create our SQL Scripts project. In SSMS, click on File => New => Project (or Ctrl+Shift+N), and choose “SQL Server Scripts”:

Now that you have a project, it might be handy to show what’s in the solution (if you don’t have that on your screen already). You can open the “Solution Explorer” by clicking on View => Solution Explorer (or Ctrl+Alt+L).

 
Checking in your solution
If you want to check-in your solution or project, you can right-click it in the Solution Explorer, and choose “Add Solution to Source Control”:

 
Another way of doing it, is by clicking on File => Source Control => Add Solution to Source Control. In the next window you are asked for the source control server (click okay if your VS Online URL is selected), and you see the project explorer:

Click on “MySQLProject”, and click OK. Now the check-in screen pops up, where you can “tag” your check-in:

Congratulations! You just checked in your first code in VS Online!

 
Pitfalls and difficulties
In all honesty, there are some downsides to using this, but those are mainly issues in SSMS. For example, you can’t create folders in your SQL Server Scripts project; you have to work with the default “Queries” folder. So there’s no way to add a “Finance”, “Maintenance” or “DBA” folder to your project, for example.

A work-around for this is creating multiple projects in the solution, so you’ll end up with 3 projects in 1 solution. The nice thing is that you can check in or check out a single project in your solution, or just work on the entire solution at once. This gives you the possibility to work on the solution on your own, or with a group of people.

Another pitfall is the process. If you’re not used to working with queries in source control, it’s easy to forget to check in your changes at the end of the day. But that’s just something you need to get used to, and deal with in your own way. If you’d like to check in multiple times a day, go ahead and do that. If you only want to check in your changes at the end of the day, that’s fine as well.

 
Conclusion
Using source control is great, especially when you have a lot of scripts you need to maintain. I’ve tried creating zip archives, version-numbering my .sql files, using different directories, etc. But there’s always a moment when you forget to save your script to an archive, or your computer crashes, and all your work is gone. Using source control prevents these issues (if you use it as intended, of course!). VS Online is a really good source control platform (it’s basically TFS online), and it’s free to use. One of the major advantages is that VS Online / TFS is fully compatible with SSMS and Visual Studio (both of which you use as a database professional).

Are you still skeptical? Maybe you should just try it out for a week, to see if it helps you in your daily job. And you know what, maybe you’ll like it. And even better: maybe it’ll someday save your life!

T-SQL Tuesday #66 – Monitoring

T-SQL Tuesday is a recurring blog party that was started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that wants to can write a blog post about a specific subject.

This month the subject is “Monitoring”. If you want to read the opening post, please click the image below to go to the party-starter: Cathrine Wilhelmsen (Blog | @cathrinew).



 
Monitoring the performance and health of your environment might be the single most important thing in your daily job. But before you start working on any monitoring solution or process, there’s an important question you need to answer first. You don’t want to find out that your database isn’t working anymore because someone from the business calls to tell you his application has stopped working. You want to know it as soon as possible, so that when the business calls, you can tell them you’re already working on it, or maybe have even solved it. But how do you do that?

 
What do you need to monitor?
There are many different things you can and want to monitor, which is why it’s difficult to cover all the different aspects of monitoring. You don’t only want to monitor SQL Server, but also the server it’s running on. If a disk fills up, SQL Server will eventually stop working. But do you want to find out that you have a problem only when your SQL Server stops working? I don’t think so…

But even within SQL Server there are so many things you can monitor, depending on your needs. A number of these things are explained by Kevin Kline (Blog | @kekline) in his SQLBits session: “An End-to-End Troubleshooting Checklist for SQL Server”. He talks about troubleshooting SQL Server, but he mentions a lot of things you can start monitoring today. So if one of them ever causes problems in the future, you can always look back at the history of a specific setting or configuration. You don’t want to spend time searching for a previous setting value when your SQL Server isn’t working anymore.

 
Different ways of monitoring
There are different ways of monitoring your environment. Some parts you want to monitor with an automated monitoring solution that raises alerts when you need to take action. But there are also parts of your environment that you might want to monitor “manually”. Let me explain that.

 
SCOM
One of the corporate monitoring solutions you could use is System Center Operations Manager (SCOM). This gives you the opportunity to monitor your whole environment from top to bottom: from the server down to a specific disk. The downside is that it might be pretty expensive when you compare it to your other options.

And SCOM might be too overwhelming for your environment or process. Dealing with SCOM means you need a certain level of knowledge of your environment, and you need to know how to configure SCOM. This learning curve might be too much when you just want to monitor a specific part of your environment or server.

 
SQL Sentry

SQL Sentry has proven itself as a tool provider with tools like SQL Sentry Plan Explorer. Personally, I use this tool every time I need to do some performance tuning. We’ve blogged about this tool with the SQLCoop team. It’s also featured on the SQL Sentry Plan Explorer page (which I’m very proud of!).

But besides that awesome tool, SQL Sentry is a company that has built a number of monitoring solutions for SQL Server, which can be found on their website. These are created specifically for monitoring SQL Server, and they are a better choice than SCOM if all you want is to monitor your SQL Server environment with a tool built specifically for that purpose.

 
Manual monitoring
Let’s say you have a production SQL Server instance. On this instance, you run a number of SQL Server Agent jobs. These jobs run on a specific schedule, and you want to monitor the outcome of those jobs. But perhaps you don’t need to take action when a specific job fails. If this is the case, using a big solution like SCOM for example, might be a bit too much.

Having a manual way to check job outcomes and run times might be a better solution for you. If that’s all you need, the SQL Server Agent job history window in SQL Server Management Studio (SSMS) might be the right tool for you: there’s no need for big tools if your monitoring needs are that basic.
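
A minimal sketch of such a manual check, querying the job outcome rows in msdb (step_id = 0 is the overall job outcome):

SELECT
	j.name AS JobName,
	h.run_date,
	h.run_time,
	h.run_duration,
	h.run_status	--0 = Failed, 1 = Succeeded, 2 = Retry, 3 = Canceled
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.sysjobhistory h ON h.job_id = j.job_id
WHERE h.step_id = 0
ORDER BY h.run_date DESC, h.run_time DESC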

Another tool you could use is SQLjobvis. I’ve mentioned this tool before, and I still use it almost on a daily basis. This can be used to monitor your environment “manually”.

A failed job can also send out a notification by mail. This can be configured on the SQL Server Agent job, and can be used to monitor your job activity as well.
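
A sketch of what that could look like in T-SQL; the job and operator names here are hypothetical, and it assumes Database Mail and the operator are already set up:

EXEC msdb.dbo.sp_update_job
	@job_name = N'NightlyLoad',
	@notify_level_email = 2,	--2 = notify on failure
	@notify_email_operator_name = N'DBA Team'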

 
Creating your own monitoring solution?
In the years I’ve worked in IT, I’ve seen a lot of companies and a lot of different monitoring systems. And there’s always that one company that wants to create its own SCOM implementation. They think the rest of the world is crazy, and that they can build a system that’s better than everything else. Let me tell you: none of the custom-built SCOM-like systems I’ve seen so far lasted longer than a year. After 12 months, they either use another system on the side to fill the gaps the custom system has left, or the custom system is gone entirely.

Maybe you can build a custom monitoring system for your business-specific processes (like checking the outcome of a web service call); that could work. But when you’re building your own system to check the free space on a server, sorry to burst your bubble, but you’re doing it wrong! Do you really think you can build a custom system that beats a monitoring solution that has evolved over 10+ years and has been tested in thousands of companies? I have yet to come across a system like that…

 
Conclusion
Looking at the different ways to monitor your environment, and the tools you can use for that, we can only conclude that you have a world full of options. But in the end, it all comes back to 1 basic rule: if you monitor your environment, you need to act upon it as well. If a SQL Server Agent job fails, don’t just notice it and carry on with your day, but act on it! Fix the job, or fix the underlying issue.

It’s basically the same as your backup process: if you’re taking backups of your databases and you never try to restore them, just stop taking backups! You can test, monitor and back up your entire environment, but if you don’t act on any issues, it’s all a waste of time. And I’ve seen that happen just too often…

On which port is SQL Server running?

Earlier this week, a good friend asked me the simple question: “How can I check on which port my SQL Server is currently running?” There are multiple ways, so let’s take a look at the ways you can find out the answer!

 
SQL Server Error Log
It’s a misconception that only error messages are logged in the SQL Server error log. There are also informational messages logged, which you can use to find the port on which SQL Server is currently running:

xp_readerrorlog 0, 1, N'Server is listening on'

 
The result:

 
SQL Server Configuration Manager
The SQL Server Configuration Manager is a tool which is installed alongside SQL Server. In this tool you can change, for example, the TCP/IP settings of your SQL Server. It also shows you the current port on which SQL Server is running:

 
DMV
You can also query the system DMVs:

SELECT DISTINCT 
    local_tcp_port
FROM sys.dm_exec_connections
WHERE local_tcp_port IS NOT NULL

or

SELECT local_tcp_port
FROM   sys.dm_exec_connections
WHERE  session_id = @@SPID

 
Command prompt
Finding the port that’s used via a command prompt requires some more information. First, you need to find the process ID (PID) that SQL Server is using. The quickest way is to use the Task Manager in Windows. Once you have the PID, you can run the command below:

netstat -ano | findstr [PID]

The result:
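
If you’d rather not leave SSMS to find the PID, you can (from SQL Server 2012 onwards) also retrieve it with SERVERPROPERTY; a small sketch:

SELECT SERVERPROPERTY('ProcessID') AS SqlServerPID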

 
Registry
The port number can also be found in the registry. Just remember that the key you see in the screenshot below contains my instance name (SQL2014DEV). Change this to your instance name!

 
You can also use the “master.dbo.xp_regread” stored procedure to read this registry key from T-SQL:

DECLARE @TCPPort NVARCHAR(5),
        @RegKeyName VARCHAR(8000);

SET @RegKeyName = CONCAT('Software\Microsoft\Microsoft SQL Server\',
                         @@SERVICENAME,
                         '\MSSQLServer\SuperSocketNetLib\TCP')

EXEC xp_regread
  @rootkey = 'HKEY_LOCAL_MACHINE',
  @key = @RegKeyName,
  @value_name = 'TcpPort',
  @value = @TCPPort OUTPUT;

SELECT @TCPPort;

 
Event Viewer
SQL Server also logs the port in the Windows Event Viewer. If you open the Application log and filter on event ID 26022, you’ll see the port that is in use:

 
PowerShell
After I posted this blog, Johan Bijnens (@alzdba) sent me a message: I forgot the PowerShell option!

#tcpport.ps1
#Evaluates the SQL Server instances on a Windows server and returns the TCP port number used by each instance
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO")  | Out-Null
$m = New-Object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') 'HOME'
$m.ServerInstances | ForEach-Object { $m.Name + '\' + $_.Name + ', ' +
       $m.ServerInstances[$_.Name].ServerProtocols['Tcp'].IPAddresses['IP1'].IPAddress.IPAddressToString + ':' +
       $m.ServerInstances[$_.Name].ServerProtocols['Tcp'].IPAddresses['IPAll'].IPAddressProperties['TcpDynamicPorts'].Value
       }

 
Thanks Johan for reminding me on this! The full code can be found on MSDN.

Redgate’s DLM Dashboard: The awesome continues

The artist formerly known as SQL Lighthouse underwent a HUGE overhaul, and now goes by the name DLM Dashboard. This tool allows you to “monitor your database schemas and alerts you to schema changes, including database drift”. And even though I find Redgate’s tools awesome, I’m still going to be brutally honest in this blog (and I know they appreciate that!).

In the second half of this blog post, there is also a Q&A with a Product Marketer at Redgate. In that Q&A, he answers a few open questions that clarify the product and its future.

 
Download and install
When you download and install the Redgate DLM Dashboard, you get a number of things. The most important things the installation folder (C:\Program Files (x86)\Red Gate\DLM Dashboard 1) contains are a web server and a monitoring service. These are used to host the web interface of the DLM Dashboard, and to run the monitoring of your environments and databases. There are also 3 Windows services installed: “DLM Dashboard Monitoring Service”, “DLM Dashboard Storage Service” and “DLM Dashboard Web Server”.

 
Configuring your environment
After the installation, the DLM Dashboard is started:

 
When you click on “Set up DLM Dashboard”, you are redirected to the configuration page. First you need to add a project:

 
I created a Sandbox project, in which eventually all my environments will be configured. So my project will contain my 4 environments (DTAP). DTAP stands for Development, Test, Acceptance and Production. More on DTAP can be found here. In this test, I’m going to start with configuring a Development environment:

 
After adding that, you need to add a server:

I’m using the sa account to connect to this server, because it’s on a stand-alone server. If you start using the DLM Dashboard (or any other software), it’s bad practice to use the sa account. So if you want to use this in your environment, don’t use the sa account!!!

 
In the next screen you see, you get the question if you want to run the monitoring script yourself, or if you want the DLM Dashboard to run it:

 
When the script completes, you end up with a new database on the server that you configured in the dashboard. The database is named Redgate, and the name of the database can’t be changed at this point. For further information on this, check out the Q&A at the bottom of this post.

This database contains only 1 table named “DDL_Events”, to store all the changes that happened in your database(s).

The DLM Dashboard also creates a server DDL trigger called “RG_SQLLighthouse_DDLTrigger” (which can be found under “Server Objects” -> “Triggers” in SSMS). This trigger registers all the changes on your databases, and stores those changes in the Redgate database.
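
If you want to verify this from T-SQL instead of the SSMS object explorer, a small sketch (the trigger, database and table names are simply the ones I see in my installation):

--Check that the server DDL trigger is there and enabled
SELECT name, is_disabled
FROM sys.server_triggers
WHERE name = 'RG_SQLLighthouse_DDLTrigger'

--Peek at the changes that have been recorded so far
SELECT TOP (10) *
FROM RedGate.dbo.DDL_Events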

Now you need to combine all the configured items (Environment, Server and Database) in 1 last configuration:

 
I’ve executed these steps (configure an environment, add a server, and add a database) twice, to add a test environment as well. I’ve skipped the Acceptance and Production (which would complete the DTAP) in this test. So now my dashboard looks like this:

 
This is the default view when you just configured your databases. The schema of your database is basically a snapshot of your current database status.

 
Making changes, and marking versions
After you’ve configured the dashboard with all your environments (in my case 2 environments, just for testing purposes), it’s time to make some changes in your databases. Let’s add a table to the Sandbox database on development:

USE Sandbox
GO


CREATE TABLE dbo.TriggerSchemaChange
  (ID INT IDENTITY(1,1),
   Value VARCHAR(10))

 
If you wait for a few seconds (the dashboard will automatically refresh), you see that one of your databases is starting to “drift”:

 
If you click on “Review”, you get an overview of the changes that were made, and by whom they were made:

 
In this case, I’ll tag this version as 1.0.0.1, and save that change. The dashboard changes, and the new version is shown. And if we execute the same change on the other database, the DLM Dashboard automatically recognizes the version:

 
You just need to acknowledge the “new” version on the test environment, just like on the development environment, and your dashboard looks all okay again:

 
The world ain’t all sunshine and rainbows…
Even though I think this is a wonderful way to keep an eye on your environment, there are still a few things missing in my opinion. I’ve asked Redgate these questions, and Andrew Denty (Blog | @andrewdenty), who is a Product Marketer at Redgate, was kind enough to answer them. This Q&A is published with permission from Andrew and Redgate. Thanks again Andrew!

 
1) What is the delay in tracking changes and refreshing the webpage, and can we change that?

The DDL trigger we install on the SQL Server puts entries into a table each time a database is changed.

Currently we poll the server every 13 seconds and query that table to see if there have been any changes.

This at the moment is not configurable, but should be as it is not a lot of work (and will probably be done before we leave beta). It’s also worth pointing out that you don’t need to hit refresh or anything – the dashboard live refreshes on its own.

 
2) Can we change the port number on which the webinterface runs (default 19528)?

Yes see http://documentation.red-gate.com/display/SL1/Changing+the+port+SQL+Lighthouse+uses

 
3) Can we change the database name (rename from RedGate to DBA for example), or log events to a dedicated database or server? And how does it react when the target server isn’t available?

At the moment no, the Redgate database is hard-coded into the application, and is expected to be on the same server as the databases we are monitoring. The DDL trigger lives on the server, so it always runs if the server is running. If DLM Dashboard cannot reach the SQL Server, it will show you in the UI that the server is not available. When it becomes available again, DLM Dashboard will check if there have been changes whilst it was unavailable.

 
4) Why do you use a DDL trigger, instead of Extended Events?

In SQL Lighthouse, our DDL Trigger runs when a change happens, and crucially it runs synchronously with the change, i.e. the change is temporarily blocked until we return. Therefore we not only get the information that we’re given, but we can also ask for other information, like:

1. Accurate date it happened
2. The application that made the change
3. The IP address that made the change
4. The nest level, e.g. was this a change made directly, or did a change run a DDL Trigger + that DDL Trigger made this change
5. The transaction ID to help us group changes together

In contrast, events run asynchronously, so by the time we see the message on the queue, the information above *may* still be queryable, but will probably not be available, and we can’t rely on it. The plus side of events is that they can’t break anything, because they run asynchronously, whereas if the DDL Trigger throws an error it’ll break what the user is doing.

In summary, we want the extra information so had to go with DDL Triggers.

 
5) The configuration limitations: are these just in the beta? The current configuration limit is 5 databases per environment

At the moment the beta isn’t designed to scale. We’re going to be working on improving the performance and scalability over the coming month. When we leave beta we’re planning to have a pro and the current free version. The pro version will allow you to monitor more instances and databases than the free version which will have some limitations. We’re still working out the exact details though.

 
6) Is there any SQLCompare integration planned? So for example, an overview of the differences between 2 environments (Development and Test), or syncing of 2 databases on different environments?

Yes, we’d like to add this to the Dashboard (assuming you have SQL Compare as well). We’d like to do this later this year, but we’re being agile and responding to feedback, so we might change our plans and do something else first. Please do vote this up on UserVoice, as we’re using that to help us decide what to do next.

 
7) What is the plan on Octopus Deploy integration? We’re looking into that for automatic deploys, and it would be great to deploy a version from DLM Dashboard for example

Triggering deployments from the Dashboard isn’t something we have planned, though we do want to be part of your release management process. I’d like us to be able to report back to Octopus if there has been drift away from the release it thinks is deployed (e.g. a hotfix has been made directly on production), so that you can get those changes back into the process. Our SQL Release product (which is a database deployment plugin for Release Management tools like Octopus) may well use elements of the dashboard in the future.

 
 
Bugs
There are still a few bugs in the software:

– If you remove your project, the whole dashboard is empty and the database monitoring is still active, but you’re not asked if you want to disable/remove it completely
– If you remove your database, the database monitoring is still active, and the Redgate database isn’t removed
– If you add a column to a table, 2 columns are marked as changed (bugfix in progress)
– The Cancel button doesn’t work on the email notification screen (bugfix in progress)
– You can’t give your initial version a certain name (for example, tag it as version 1.0.0.0) (bugfix in progress)
– If your machine name is long, it’s trimmed in the review changes screen (bugfix in progress)
– Sometimes the web service doesn’t start after a reboot of the machine, and you need to start it manually (bugfix in progress)

As mentioned, most of these bugs are already fixed, and ready for the next release. Redgate is really fast with picking up on bugs and fixing them. So if you find any bugs, please let them know!

 
Recap
Looking at the possibilities of the DLM Dashboard, and the version that is available now, I’m pretty confident this can become a great product! Redgate haven’t announced the differences between the free and paid versions yet, but there will always be a free version. Also, knowing Redgate, getting the paid version won’t be the biggest problem to overcome. Their licenses are pretty affordable, and the tools are indispensable if you ask me.

I think Redgate has discovered a whole new market with this. They’re jumping on the “release early, release often” train, and they do it quite well on the first try! I’ve never seen database professionals do this, in all the companies I’ve seen in the past. Only the .NET developers did that, and we all mocked them for it. But now we’re starting to see a shift in the market, and Redgate is willing to help us with this. And for that, I want to thank the people of Redgate!
