Azure Data Lake – Register JSON Assemblies

The power of Azure Data Lake is that you can process a variety of different file types from Azure Data Lake Analytics. But in order to use JSON, you need to register some assemblies first.

Downloading assemblies
The assemblies are available on GitHub for download. Unfortunately you need to download the solution and compile it on your machine, so I’ve also made the two DLLs you need available via direct download:

Microsoft.Analytics.Samples.Formats.dll
Newtonsoft.Json.dll

 
Upload to ADL
Before we register the assemblies, we need to upload the files to Azure Data Lake storage. In my case, I created a folder called “Assemblies”, and in that folder a subfolder called “JSON”:

 
Now upload the two DLLs that you downloaded into that folder.
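
If you prefer scripting over clicking, the upload can also be done with the AzureRM PowerShell module. This is just a sketch: the Data Lake Store account name (“mydatalake”) and the local paths are placeholders.

# Log in to Azure and upload both DLLs to the Assemblies/JSON folder
Login-AzureRmAccount

Import-AzureRmDataLakeStoreItem -Account "mydatalake" `
    -Path "C:\Temp\Newtonsoft.Json.dll" `
    -Destination "/Assemblies/JSON/Newtonsoft.Json.dll"

Import-AzureRmDataLakeStoreItem -Account "mydatalake" `
    -Path "C:\Temp\Microsoft.Analytics.Samples.Formats.dll" `
    -Destination "/Assemblies/JSON/Microsoft.Analytics.Samples.Formats.dll"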

 
Register the assemblies
I’m running the registration U-SQL job from Visual Studio, but you can also do this from the Azure portal, by running a U-SQL job in the Azure Data Lake Analytics window.

By running the statements below, you register both DLLs in your Azure Data Lake Analytics account, and you can start using JSON:

CREATE ASSEMBLY [Newtonsoft.Json] FROM "Assemblies/JSON/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM "Assemblies/JSON/Microsoft.Analytics.Samples.Formats.dll";
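
You can also submit this registration job from PowerShell with the AzureRM module, if that fits your workflow better. A minimal sketch; the Data Lake Analytics account name (“myadla”) is a placeholder, and parameter names may differ slightly between module versions:

# The registration statements as a here-string
$script = @'
CREATE ASSEMBLY [Newtonsoft.Json] FROM "Assemblies/JSON/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM "Assemblies/JSON/Microsoft.Analytics.Samples.Formats.dll";
'@

# Submit the U-SQL job to the Data Lake Analytics account
Submit-AzureRmDataLakeAnalyticsJob -Account "myadla" `
    -Name "Register JSON assemblies" `
    -Script $script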

 
Conclusion
Because we use JSON as a primary way of creating, sending and storing data, being able to use this file type in ADL is a must. This saves us time, because otherwise we would need to convert the files to an intermediate type (like CSV or text) before we could process the data.

Hopefully this short tutorial helps you out as well.


Copy data from Azure Data Lake to another Data Lake with AdlCopy

The last few months it’s been a bit quiet on my blog. I started working on some new stuff, and couldn’t find the inspiration to come up with new subjects to blog about. A few months back I started working with Azure Data Lake, and I decided to share my (limited) knowledge here again, hoping it saves you time somewhere down the line.

Migrating data from one Data Lake to the other
We started out with a test version of a Data Lake, and this week I needed to migrate data to the production version of our Data Lake. After a lot of trial and error I couldn’t find a good way to do this, until I stumbled upon a tool called AdlCopy. This is a command-line tool that copies the files for you. Let me show you how easy it is.

 
Download & Install
AdlCopy needs to be installed on your machine. You can find the download here. By default the tool will install the files in “C:\Users\\Documents\AdlCopy\”, but this can be changed in the setup wizard.

Once you’ve installed the tool, you can open a command prompt to use it:

 
Now you need to find the file or directory you want to copy. You can do this by opening the file location in the Azure portal and clicking on “Folder properties”:

 
This URL will be the input for AdlCopy:

 
You should also find the destination URL for the other data lake, since this will be the target.

 
Linking it to your Azure subscription
With AdlCopy there’s no need to link anything directly to your subscription, or to configure anything up front. The first time you run a copy command, a login box will pop up. If you log in with the account you use for the Azure portal, the tool will be able to access your resources.

 
Copying data
The inputs for AdlCopy are “/Source” and “/Dest”. These represent the source data and the destination to copy the data to.

There are two options when you want to copy files: a single file or an entire directory.

Copy a single file:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json

 
Copy an entire directory:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/ /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/

 
When you want to copy an entire directory, make sure you add the trailing “/” (slash) to the path. If you don’t do that, the copy will fail (you can’t copy a directory into a file).
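
If you have more than a handful of directories to move, a small PowerShell loop around that same command saves some typing. This is just a sketch: the account names, directory list and install path are placeholders.

# Placeholder Data Lake account names and the directories to copy
$sourceLake  = "mysourcelake"
$targetLake  = "mytargetlake"
$directories = @("Raw/", "Staging/", "Curated/")

foreach ($dir in $directories)
{
    # Note the trailing slash in $dir: AdlCopy copies directory-to-directory
    & "C:\Users\<USERNAME>\Documents\AdlCopy\AdlCopy.exe" `
        /Source "adl://$sourceLake.azuredatalakestore.net/$dir" `
        /Dest "adl://$targetLake.azuredatalakestore.net/$dir"
}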

 
Conclusion
After trying out some stuff with Data Factory, manually copying files and considering building a small C# tool, this was the quickest option. It works out of the box, and you don’t have to be a rocket scientist to get this to work. So: The perfect tool for the job!

Cleanup sysobjects after disabling replication

In my previous blog post I explained how we leverage the power of transactional replication to move our on-premises databases to Azure SQL databases (PaaS). This works like a charm, but you do end up with a bunch of system-generated objects in your database. So how do you get rid of these?

 
What “mess” is left behind after disabling replication
After you’ve completed your migration to Azure SQL DB, and cut the strings on your transactional replication, you end up with a bunch of system-generated objects. These are used by SQL Server to “replay” the changes on the subscriber (inserts, updates and deletes) that were made on the publisher. This is just an example of one of our smaller databases:

 
As you can see, there are 3 stored procedures per table (ins, upd, del), and some other objects used by replication.

Maybe it’s my IT-OCD talking, but it feels nice to clean this up, doesn’t it?!

 
Cleanup
Fortunately the SQL Server team gave us a way to easily clean up these objects: sp_subscription_cleanup (which, even though MS Docs says it won’t, WILL work on Azure SQL DB).

This stored procedure accepts the following input:

@publisher: The name of the SQL Server instance of the publisher
@publisher_db: The name of the publishing database
@publication: The name of the publication (this was configured in the setup/configuration of replication)

Running this stored procedure on your subscriber cleans up all remaining transactional replication components. But where can you find the input for this procedure?

First I was looking at MSreplication_objects, but this doesn’t work on Azure SQL DB (MS Docs says so, and I’ve tested it). So my guess was that the other system objects mentioned in related articles wouldn’t work either.

So what do you do when you can’t find the answer yourself after extensive Googling/Bing-ing? Yes, you call in reinforcements (in this case William Durkin (Website | @sql_williamd), who knows a lot about replication):

 
But as you can tell from my reply to William’s response, I quickly found the answer by just trying something out: MSreplication_subscriptions. You can query this table on your subscriber to retrieve the publisher and publication properties. Again: even though MS Docs says it won’t work, it does work on Azure SQL DB.

So thanks to some “Rubber Ducking” I found the solution:

 
Thanks again for your mental support William!

So by using the output of MSreplication_subscriptions on your subscriber, you can quickly find the information needed to run sp_subscription_cleanup.

 
Automating the script with C#
Running the script on one or two databases can be done manually. But because I migrated an entire data platform with replication, we’re talking about A LOT of databases. So instead of doing this all manually (you need to connect to every single database to check this), I decided to create a small tool to help me with this.

In all honesty, my C#-skills aren’t that great, but my response to that is: “If it looks stupid but works it ain’t stupid”.

The tool connects to the instance you provide, retrieves a list of database names, and uses this list to connect to every database individually to check for replication components:

 
The tool can be used to connect to any Azure SQL DB instance, as long as you have one login/user that can connect to all databases on that server. If your server supports that (it depends on the configuration you use, of course), you’ll see a list that provides you with the following information:

– Databasename
– Publisher
– Publisher_db
– Publication
– Distribution_agent
– CleanupScript

The last column is the result of the following statement:

SELECT
	DB_NAME() AS Databasename,
	Publisher,
	Publisher_db,
	Publication,
	Distribution_agent,
	'EXEC sp_subscription_cleanup @publisher = ''' + Publisher + ''', @publisher_db = ''' + Publisher_db + ''', @publication = ''' + Publication + ''';' AS CleanupScript
FROM MSreplication_subscriptions

 
You can also use this script without downloading the tool.
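
If you’d rather not run an unknown .exe at all, the same check can be scripted with a few lines of PowerShell around that query (assuming the Invoke-Sqlcmd cmdlet is available on your machine). A sketch with placeholder server and credential values:

# Placeholder connection details for the Azure SQL DB logical server
$server   = "yourserver.database.windows.net"
$username = "youruser"
$password = "yourpassword"

# The same query as above, stored in a variable
$query = "SELECT DB_NAME() AS Databasename, Publisher, Publisher_db, Publication, Distribution_agent,
          'EXEC sp_subscription_cleanup @publisher = ''' + Publisher + ''', @publisher_db = ''' + Publisher_db + ''', @publication = ''' + Publication + ''';' AS CleanupScript
          FROM MSreplication_subscriptions;"

# Get the list of user databases from the logical server
$databases = Invoke-Sqlcmd -ServerInstance $server -Database "master" `
    -Username $username -Password $password `
    -Query "SELECT name FROM sys.databases WHERE name <> 'master';"

# Check every database for leftover replication components
foreach ($db in $databases)
{
    Invoke-Sqlcmd -ServerInstance $server -Database $db.name `
        -Username $username -Password $password -Query $query
}

The CleanupScript column in the output is the statement you can then run on each database that still has replication components.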

But be careful: these components could still be in use, so check that before you start cleaning up!

You can download the tool here (be aware, this is a .exe file!), or download the source code here, and contact me if you would like me to share it via GitHub or something like that.

Send email on Azure Automation runbook failure

One of the downsides of using Azure Automation runbooks is the limited visibility of the runbook outcomes. That’s one of the few downsides I could actually find when I wrote my “Running database maintenance on Azure SQL DB with Azure Automation” post. But because we execute the runbooks against our production environment, visibility is of high importance to us.

To get some more insight into this, I’ve implemented a try-catch block in every runbook we have, which sends an email to our IT Ops team when a runbook fails. So how did we do this?

 
Create an email runbook
To keep it a bit modular (and I know it could be set up even better), I’ve made a runbook that is only used to send an email to our IT Operations team. This email includes the runbook name and the error that occurred.

Because we are going to include sensitive account information in an Azure Automation runbook, it’s a wise idea to store the username and password of your Office 365 account in a “Credential” asset in Azure Automation. This safely stores your account information, and allows you to use it in your runbook later. This option can be found in the Automation Account’s left-hand menu.
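
As a side note: you can also create the credential asset from PowerShell instead of the portal. A sketch, with placeholder resource group, Automation account and credential names:

# Prompt for the Office 365 username and password
$o365 = Get-Credential

# Store it as a credential asset in the Automation account (placeholder names)
New-AzureRmAutomationCredential -ResourceGroupName "MyResourceGroup" `
    -AutomationAccountName "MyAutomationAccount" `
    -Name "CredentialName" `
    -Value $o365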

Now that we’ve done that, let’s create a runbook that sends an email:

workflow EmailError
{
  Param
  (
    # Input parameters
    [Parameter (Mandatory = $true)]
    [string] $RunbookName,

    [Parameter (Mandatory = $true)]
    [string] $MessageBody
  )

    # Retrieve Office 365 credential from Azure Automation Credentials
    $O365Credential = Get-AutomationPSCredential -Name "CredentialName"
    
    InlineScript
    {
        # Create new MailMessage
        $Message = New-Object System.Net.Mail.MailMessage
        
        # Set address-properties
        $Message.From = "no-reply@domain.com"
        $Message.replyTo = "no-reply@domain.com"
        $Message.To.Add("ITOPS@domain.com, support@domain.com")
  
        # Set email subject
        $Message.SubjectEncoding = ([System.Text.Encoding]::UTF8)
        $Message.Subject = "Failed job: $Using:RunbookName"
        
        # Set email body
        $Message.Body = "Error message: <br /><br /><br /><br /><br /> $Using:MessageBody"
        $Message.BodyEncoding = ([System.Text.Encoding]::UTF8)
        $Message.IsBodyHtml = $true
        
        # Create and set SMTP
        $SmtpClient = New-Object System.Net.Mail.SmtpClient 'smtp.office365.com', 587
        $SmtpClient.Credentials = $Using:O365Credential
        $SmtpClient.EnableSsl   = $true
  
        # Send email message
        $SmtpClient.Send($Message)
    }
}

 
This runbook sends an email from “no-reply@domain.com” to “ITOPS@domain.com”, and includes the input parameters (RunbookName and MessageBody) in the subject and body of the email. You can customize this based on your preferences.
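
Before wiring it into other runbooks, you can test the email runbook on its own by starting it with test parameters. A sketch, with placeholder resource group and Automation account names:

# Start the EmailError runbook with test parameters (placeholder names)
Start-AzureRmAutomationRunbook -ResourceGroupName "MyResourceGroup" `
    -AutomationAccountName "MyAutomationAccount" `
    -Name "EmailError" `
    -Parameters @{ RunbookName = "TestRunbook"; MessageBody = "This is a test message" }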

 
Call the runbook
The way we use this runbook is by calling it from another runbook that runs our database maintenance. That runbook is copied from this post, except for the try-catch block.

By adding $ErrorActionPreference and a try-catch block, it’s possible to receive an email when a runbook fails:

workflow DBM_Servername_Databasename
{
    try
    {
        # Set error action preference
        $ErrorActionPreference = 'Stop'

        inlinescript
        {
                # Output status to console
                Write-Output "Creating connection..."

                # Create connection to database
                $DatabaseConnection = New-Object System.Data.SqlClient.SqlConnection
                $DatabaseConnection.ConnectionString = "Data Source=Servername.database.windows.net;Initial Catalog=DatabaseName;Integrated Security=False;User ID=UserName;Password=Password"
                $DatabaseConnection.Open()

                # Output status to console
                Write-Output "Creating command..."

                # Create command to execute
                $DatabaseCommand = New-Object System.Data.SqlClient.SqlCommand
                $DatabaseCommand.Connection = $DatabaseConnection
                $DatabaseCommand.CommandText = "EXEC dba.ExecuteMaintenance"
                $DatabaseCommand.CommandTimeout = 0;

                # Output status to console
                Write-Output "Executing payload..."

                # Execute the query
                $DatabaseCommand.ExecuteNonQuery()

                # Output status to console
                Write-Output "Closing connection..."

                # Close connection to database
                $DatabaseConnection.Close()
        }
    }
    catch
    {
        # Output status to console
        Write-Output "Catchie catchie..."
    
        # Set $errorMessage
        $errorMessage = $_
        
        # Call EmailError-runbook in order to send a failure-email
        EmailError `
            -RunbookName "DBM_Servername_Databasename" `
            -MessageBody $errorMessage

        # Output status to console
        Write-Output "Mail sent..."
    }
}

 
I’ve included a bunch of “Write-Output” lines in there, so that if I test it through the Azure portal I can see the status of the runbook. You can remove those if you want to have a clean runbook of course.

 
Conclusion
By using PowerShell to automate this type of work, it’s easy to extend your jobs like we did. Adding an email component is just as easy as altering the payload in your runbook. This extensibility and modular approach is the way to go if you want to deliver a solution that is easy to maintain and scalable.

And let me tell you something, that PowerShell is going to be big someday, mark my words! 😉

Schedule and run Azure Automation runbooks via Webhook

In one of my previous posts I wrote about using Azure Automation jobs to run your Azure SQL DB maintenance processes. This is a very easy way to schedule your maintenance, but there are some shortcomings when you look at the scheduler options and the visibility of job outcomes.

In this blog post I’ll show you how you can schedule your runbooks through webhooks. Then you’re not bound to the minimum schedule of once per hour, which is a limit set for Azure Automation schedules.

 
Start with a runbook and webhook
In this post I’m using the runbook that I previously created for running database maintenance. Before I created a webhook, I removed the schedule on the runbook itself, to prevent confusion about which schedule started the runbook.

Now that the basics are set, click on the “Webhooks” button, and click “Add Webhook” in the blade that opens:

 
A new blade opens up in the portal, which allows you to create a new webhook. Give the webhook a name and an expiration date (which is mandatory, even though you don’t want it to expire), and copy the URL. You need to store this in a secure place, because it will only be visible in this window while creating the webhook. It can’t be retrieved anywhere else if you lose this URL:

 
The webhook URL will look like this (this specific URL is changed, and won’t work): “https://s2events.azure-automation.net/webhooks?token=eT0%0bKvfBcMeusbZw00RAwP0ZKGqxEUnThBIHAEB0Eq0%0d”
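
You can test the webhook right away by posting to that URL from PowerShell. A quick sketch; the URL below is of course a placeholder for the one you just copied:

# Placeholder webhook URL; use the one you saved when creating the webhook
$webhookUrl = "https://s2events.azure-automation.net/webhooks?token=<YOUR TOKEN>"

# A plain POST is enough to start the runbook; the response should contain the id of the job that was queued
Invoke-RestMethod -Method Post -Uri $webhookUrl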

 
The webhook is ready to use, so now we can move on to creating a schedule.

 
Create a webhook scheduler
Just creating a webhook alone won’t make the job run. In order to do that, we need to create a “Scheduler Job Collection”. In the Azure portal, go to “Scheduler Job Collection”, click on “Add”, and give your Scheduler Job Collection a name. If you already have a job collection you can add this one to it, or create a new one. Now go to “Action settings” to configure the webhook URL for this scheduler:

 
Make sure you set the action to “HTTPS”, and the method to “Post”.

 
Under “Schedule” you can configure the actual schedule. As mentioned earlier, the schedule of an Azure Automation runbook can only be configured in hours. The schedule of a Job Scheduler can be configured in minutes, so this allows you to run a job multiple times an hour in case you need it:

 
Conclusion
In some cases it might be necessary to run an Azure Automation runbook multiple times an hour, but that’s (currently) not supported for runbook schedules. By using a webhook and the job scheduler as a work-around, you’re able to make the runbooks just a bit more useful. Before using the job scheduler, we were required to duplicate runbooks and configure the schedules to run at specific times, but this only made things more unclear. This works out for us, and hopefully for you too!

Easy data encryption in Azure

This article was recently also published on dev.getroadmap.com:

For those of you who use Azure today, the security discussion must have come up on some occasion: explaining to managers (and possibly colleagues) that Azure is a lot more secure than an on-premises data center, and that Azure is easier to maintain and more scalable. Trust me, we’ve all been there!

But besides the physical security, there’s also the digital security. In the world of today it’s easier to find a data breach in the news than it is to find an item about a bank robbery. So how can you secure your data in Azure in an easy but solid way, without the hassle of changing your applications?

Encryption could be one of your tools to achieve a secure infrastructure and/or applications. But encryption is a challenge for pretty much everyone. Almost every day we hear about companies not doing it right, or not doing it at all. Luckily, Azure helps us set this up with just the click of a button.

Okay, okay, you got me. Maybe a few button clicks…

 
Databases
For your Azure SQL databases, there’s a feature called “Transparent Data Encryption”, or TDE for short. This encrypts your data at rest with “FIPS 140-2 validated 256 bit AES encryption”. Or, in normal words: you encrypt your data with an AES-256 encryption key.

So how do you enable it? There are two ways to do so, but I’ll only show you the route via the Azure portal. Information on how to do this via T-SQL can be found here.
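
Besides the portal and T-SQL, the AzureRM PowerShell module can also flip this switch, in case you want to script it. A sketch, with placeholder resource group, server and database names:

# Enable TDE on the database (placeholder names)
Set-AzureRmSqlDatabaseTransparentDataEncryption -ResourceGroupName "MyResourceGroup" `
    -ServerName "myserver" `
    -DatabaseName "MyDatabase" `
    -State Enabled

# Check the current encryption state afterwards
Get-AzureRmSqlDatabaseTransparentDataEncryption -ResourceGroupName "MyResourceGroup" `
    -ServerName "myserver" `
    -DatabaseName "MyDatabase"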

First, log in to the Azure portal and navigate to the database you want to encrypt. Click on “Transparent Data Encryption”, and with just the click of a button you can encrypt your data:

 
This will start the encryption process and, depending on the size of the database, after a while you’ll see that the data is encrypted:

 
This feature allows you to encrypt your database without any application changes. This is because the encryption and decryption are handled in an “intermediate layer” by Azure. The data is decrypted before it’s returned to the client, and, the other way around, encrypted before it’s stored. So your applications will continue to work without any changes to the application code or the connection string(s) to the database(s).

 
Storage Accounts encryption
There is also an option to encrypt your Storage Accounts, which works in the same way as TDE does for Azure SQL databases (without any application changes). When you enable this on your Storage Account, please remember that only new data will be encrypted, and that existing data won’t be encrypted until it changes. For more information on this, please read this article, and this MSDN thread.

When you’re creating a new Storage Account, you can choose to encrypt it right away:

 
But when you want to encrypt an existing Storage Account with data in it, you need to do it on two different levels (it’s a separate setting for Blob and File storage):

 
This will encrypt your data with the same algorithm that TDE for SQL Server uses: “All data is encrypted using 256-bit AES encryption, one of the strongest block ciphers available.” (source).
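
If you want to script this instead of using the portal, the AzureRM.Storage module exposes an -EnableEncryptionService parameter for this (at least at the time of writing; double-check against your module version). A sketch with placeholder resource group and account names:

# Enable Storage Service Encryption for both the Blob and File service (placeholder names)
Set-AzureRmStorageAccount -ResourceGroupName "MyResourceGroup" `
    -Name "mystorageaccount" `
    -EnableEncryptionService "Blob,File"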

 
Conclusion
For us as a company, enabling these features means that all of our data is encrypted. We’re only sending and receiving data from within Azure, so the communication is also secure. And even though the majority of our data is public data (publicly available information such as flight information, etc.), it’s a safe feeling to know that all our data is encrypted when stored.