Time traveling with SQL Server 2016: Temporal tables
July 13, 2015 2 Comments
Since the release of the first publicly available SQL Server 2016 CTP2, we have a whole list of new and cool features we can start using. One of those features is called temporal tables (or system-versioned tables). So what are temporal tables, and how can you use them?
What is a temporal table?
To understand the functionality of temporal tables, we first need to agree on the definition of a table in SQL Server. If you run a query on a table that contains data, that query can only return the current version of a record. This means you can only see the current “truth”, and there’s no way to travel back in time to see older versions of this record.
Temporal tables give you the opportunity to time travel in SQL Server. For every data change (Update, Delete and Merge) in your table, a historical copy is stored in a history table. The table on which you enabled this data tracking is changed to a “System-Versioned” table.
How does it work?
For every update or delete on a table where system_versioning is enabled, a historical copy of the record is stored:
But how does that work? Let’s start by creating a table we will use to test system-versioned tables:
CREATE TABLE dbo.BankAccount (AccountNumber INT CONSTRAINT PK_BankAccount_TransactionID PRIMARY KEY, AccountBalance FLOAT);
In order to make a table a system-versioned table, we need to add 2 datetime2 columns to our newly created table. SQL Server will use these columns to store a from- and to-date that will indicate the valid period of time of the record. By telling SQL Server to use these 2 columns with the keywords “PERIOD FOR SYSTEM_TIME”, SQL Server knows that it needs to use these columns when querying data from the history table:
ALTER TABLE dbo.BankAccount ADD ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL, ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL, PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);
In the above script, the “HIDDEN” keyword is used. This can only be used from CTP 2.1 and higher. What HIDDEN does, is hide the columns in your original table to make sure application- and database code isn’t going to fail because you added these 2 new columns. You can still query the columns, but if you run a “SELECT *” query these columns won’t be returned.
When that’s done we can enable system-versioning on the table:
ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory));
Data changes & history
Now we can take a look at how data changes are captured, and history is recorded. Before you execute any of the statements below, enable the execution plan in your SSMS. If we insert a new record, no history is created:
INSERT INTO dbo.BankAccount (AccountNumber, AccountBalance) VALUES (2147483647, 10000)
We just see the expected Clustered Index Insert, nothing special:
Now, let’s update the bank account balance (the WAITFOR is added just to make sure we have a clear difference in change time for the next few steps):
UPDATE dbo.BankAccount SET AccountBalance = AccountBalance - 2500 WAITFOR DELAY '00:01:00' UPDATE dbo.BankAccount SET AccountBalance = AccountBalance + 1000
If you look at your execution plan, you’ll see an extra DML command in the execution plan:
Besides the expected Clustered Index Update, there is an extra Clustered Index Insert. This is the insert statement that is generated by SQL Server, to insert a record in the history table. There’s no user interaction needed to store historical data. SQL Server will take care of that process for you.
Now that we store historical data, it’s time to start the actual time travelling. To paint a complete picture, this is a timeline of the data inserts and updates:
Now that we have a clear timeline, let’s start our time travel. Besides creating and maintaining the history table, you also get to use a bit of new syntax in your query. For system-versioned tables, you can use the “FOR SYSTEM_TIME” clause in your query. In order to show you the timeline in SQL Server, let’s use this new syntax to query our history per minute:
SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:42:00' --No data available SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:43:00' --First insert SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:44:00' --First update SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:45:00' --Second update SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF '2015-07-09 09:46:00' --No other changes
Remember, this is based on my data. So if you want to query your timeline, change the datetime string based on your data!
So basically, you can start querying your data by using a specific point in time:
DECLARE @PointInHistory DATETIME2 SET @PointInHistory = '2015-07-09 09:45:00' SELECT * FROM dbo.BankAccount FOR SYSTEM_TIME AS OF @PointInHistory
Besides the actual point in time, you can use 3 other ways to query your data:
SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo FROM dbo.BankAccount FOR SYSTEM_TIME FROM '2015-07-09 09:43:00' TO '2015-07-09 09:45:00' SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo FROM dbo.BankAccount FOR SYSTEM_TIME BETWEEN '2015-07-09 09:43:00' AND '2015-07-09 09:45:00' SELECT AccountNumber, AccountBalance, ValidFrom, ValidTo FROM dbo.BankAccount FOR SYSTEM_TIME CONTAINED IN ('2015-07-09 09:43:00', '2015-07-09 09:45:00')
In practice you’ll not only see data changes happening on tables, but you’ll also have schema changes on your table once in a while. But how does a temporal table cope with schema changes? Let’s try to add a column to our table:
ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10)
This results in an error:
In order to add a column, we need to disable the system-versioning first, before we add the new column:
ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF) GO ALTER TABLE dbo.BankAccount ADD NewColumn VARCHAR(10) GO
Now let’s enable system-versioning again:
ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory)) GO
This results in another error:
So we first need to add the same column to the history table, before enabling system-versioning again:
ALTER TABLE dbo.MyBankAccountHistory ADD NewColumn VARCHAR(10) GO ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory)) GO
Disabling system-versioning means that for a short period of time, you could loose historical data. A possible way to get around this, is by setting your database in single user mode before you disable it. This prevents other user settings from changing data in that table. If you run this in your production environment, this might not be your best option though! By disabling system-versioning, historical data will not be removed.
Indexing historical data
Performance isn’t only important when you query your “normal” tables, but also if you want to query your newly created temporal tables. If you want to write a business process that uses your historical data, you need to make sure this doesn’t slow down your application. But how do you achieve that?
Just like on a normal table, you can create indexes no the history table. This table is basically a normal table, so there’s no reason why you can’t create indexes on it:
CREATE NONCLUSTERED INDEX IDX_MyHistoricalData ON dbo.MyBankAccountHistory (AccountNumber, AccountBalance)
Historical data cleanup
It’s really great to store historical versions of your records in a separate table, but this also provides you with new challenges. You need more storage for your data, and you might want to remove data after a certain period of time. At this point in time, system-versioned tables don’t have a cleanup process. This might change in the future, but right now you need to cleanup these tables yourself. So if you want (or need) to cleanup old data, you need to write a small process yourself. This can be as easy as writing a query to run through all system-versioned tabled and delete old data, and schedule that in a SQL Server Agent job. But again, this is something to keep in mind when you start working with temporal tables.
I had a really good conversation about this with Borko Novakovic (@borko_novakovic), who is the Program Manager for this part of SQL Server. The code below is an example of such a process, that Borko provided:
BEGIN TRAN /*Drop versioning to allow deletion of historical data*/ ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF) /*DELETE MyBankAccountHistory table*/ DELETE FROM dbo.MyBankAccountHistory WHERE ValidTo < '2015.01.01' /*Re-establish versioning*/ ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyBankAccountHistory, DATA_CONSISTENCY_CHECK = OFF)); COMMIT
Deleting data without disabling system-versioning doesn’t work, because you’ll get an error like this:
I do want to thank Borko again for his time. He helped me out a lot, and I’m thankful he took the time to do that. Thanks Borko!!!
Removing system-versioned tables from your systems takes one extra step, because just executing a “DROP TABLE” statement doesn’t work. SQL Server will prevent you from dropping temporal tables, without you first disabling it. Let’s start by disabling system-versioning:
ALTER TABLE dbo.BankAccount SET (SYSTEM_VERSIONING = OFF) GO
This disables the system-versioning on your original table, and transforms the history table to a normal table. If you want to get rid of the tables, you can just drop them both like you would normally do:
DROP TABLE dbo.BankAccount GO DROP TABLE dbo.MyBankAccountHistory GO
One thing you definitely MUST SEE is a video with Borko and Scott Klein (Blog | @SQLScott) on Channel 9. In this video, Borko explains the idea behind temporal tables, and shares some insights on this new feature.
Other information on temporal tables can be found on MSDN: