Azure Data Lake – Register JSON Assemblies

The power of Azure Data Lake is that you can process a variety of different file types from Azure Data Lake Analytics. But in order to use JSON, you need to register some assemblies first.

Downloading assemblies
The assemblies are available on GitHub for download. Unfortunately you need to download the solution and compile it on your machine, so I’ve also made the two DLLs you need available via direct download:

Microsoft.Analytics.Samples.Formats.dll
Newtonsoft.Json.dll

 
Upload to ADL
Before we register the assemblies, we need to upload the files to Azure Data Lake storage. In my case, I created a folder called “Assemblies”, and inside that a subfolder called “JSON”:

 
Now upload the two DLLs that you downloaded into that folder.

 
Register the assemblies
I’m running the registration U-SQL job from Visual Studio, but you can also do this from the Azure portal, by running a U-SQL job in the Azure Data Lake Analytics window.

By running the statements below, you register both DLLs in your Azure Data Lake Analytics account, and you can start using JSON:

CREATE ASSEMBLY [Newtonsoft.Json] FROM "Assemblies/JSON/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM "Assemblies/JSON/Microsoft.Analytics.Samples.Formats.dll";
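With the assemblies registered, your U-SQL scripts can reference them and read JSON directly. The file paths and column names below are just placeholders, but a minimal script could look something like this:

```sql
// Reference the registered assemblies and import the JSON namespace
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Extract fields from a JSON file (path and columns are placeholders)
@input =
    EXTRACT Id string,
            Name string
    FROM "/Data/Sample.json"
    USING new JsonExtractor();

// Write the result out as CSV
OUTPUT @input
TO "/Output/Sample.csv"
USING Outputters.Csv();
```

Note that the REFERENCE ASSEMBLY names have to match the names you used in the CREATE ASSEMBLY statements.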

 
Conclusion
Because we use JSON as a primary way of creating, sending and storing data, being able to use this file type in ADL is a must. This saves us time, because otherwise we would need to convert the files to an intermediate format (like CSV or plain text) before we could process the data.

Hopefully this short tutorial helps you out as well.

Copy data from Azure Data Lake to another Data Lake with AdlCopy

The last few months it’s been a bit quiet on my blog. I started working on some new stuff, and couldn’t find the inspiration when it came to finding new subjects to blog about. I started working with Azure Data Lake a few months back, and I decided to share my (limited) knowledge here again, hoping it saves you time somewhere down the line.

Migrating data from one Data Lake to the other
We started out with a test version of a Data Lake, and this week I needed to migrate data to the production version of our Data Lake. After a lot of trial and error I couldn’t find a good way to migrate data. In the end I found a tool called AdlCopy. This is a command-line tool that copies files for you. Let me show you how easy it is.

 
Download & Install
AdlCopy needs to be installed on your machine. You can find the download here. By default the tool will install the files in “C:\Users\\Documents\AdlCopy\”, but this can be changed in the setup wizard.

Once you’ve installed the tool, you can open a command prompt to use it:

 
Now you need to find the file or directory you want to copy. You can do this by opening the file location in the Azure portal and clicking on “Folder properties”:

 
This URL will be the input for AdlCopy:

 
You should also find the destination URL for the other data lake, since this will be the target.

 
Linking it to your Azure subscription
With AdlCopy there’s no need to link anything to your subscription or configure anything up front. The first time you run a copy command, a login box will pop up. If you log in with the account you use for the Azure portal, the tool will be able to access your resources.

 
Copying data
The inputs for AdlCopy are “/Source” and “/Dest”. These represent the source data and the destination to copy the data to.

There are two options when you want to copy files: a single file or an entire directory.

Copy a single file:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json

 
Copy an entire directory:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/ /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/

 
When you want to copy an entire directory, make sure you add the trailing “/” (slash) to the path. If you don’t do that, the copy will fail (you can’t copy a directory into a file).

 
Conclusion
After trying out some stuff with Data Factory, manually copying files and considering building a small C# tool, this was the quickest option. It works out of the box, and you don’t have to be a rocket scientist to get this to work. So: The perfect tool for the job!

Creating an online Power BI Dashboard, using an Excel file hosted on SharePoint as source

Power BI gives you a great advantage, for example when you use it to get quick insights into your data. That’s one of the reasons our business analyst fell in love with this tool pretty quickly after she started using it. It almost completely replaced Tableau and Excel-based reporting in her daily routine.

Another great feature of Power BI is that there are so many data sources and connectors (adapters) to get data from all kinds of sources. But some of these connectors almost made me pull my hair out yesterday, so I’d like to share this learning with you, and hopefully it will save you some time and frustration 🙂

Excel as source
The dashboard I wanted to create would show the technical debt we have within our platform. For now this is just a test dataset, but in the future it will be used to measure technical debt over time, to get more insights into that.

So the first step is to create an Excel file, and prepare that for use in Power BI. You start out by creating an Excel file with some data in it:



But to allow Power BI to read this file, you need to use the “Format as table” feature in Excel. You do this by selecting your dataset, clicking on the “Format as table” button, and selecting a style:



When you’ve done that, your sheet will look like this (depending on the style you picked of course):



The file is now ready to be stored on SharePoint. In my case, I’ve created a directory called “Technical Debt Dashboard” in the root of our shared documents:



Adding datasource to Power BI Portal
If you go to the Power BI Portal (app.powerbi.com) and log in, you can click on the “Get Data” button, and then on the button to attach your Excel file:



The next screen shows you the sources that you can use to host your file on. Of course, we choose SharePoint:



In the next popup window, you can enter the URL of your SharePoint:



If you paste in the base-URL (like I did), you can drill down in the next window you’ll see. It will also prompt for credentials to log in to SharePoint if needed.

When you’ve selected the file you want to use, click on “Connect”:



The next question is whether you want to import the data, or just connect to the data source. In this case, we want to click on “Import”:



Now the Power BI portal will process the contents of the Excel file, and a dataset will be created:



Okay, you can relax now, because the annoying work is over. We will now move on to getting the data into Power BI, so: the fun stuff!



Using the dataset in Power BI
In Power BI Desktop you can add the data by clicking on “Get Data”, and pick the option “Power BI Service”:



In this window you will see all of the available datasets (in my case of my own workspace). Click on the datasource you want to use (in this case “TechnicalDebt”):



The data is now imported and ready for use in Power BI Desktop:



For this example, I’ve added a simple gauge, with “Backend Technical Debt” as its value:



Now that my dashboard is done, I can upload it to the online portal by pressing the “Publish” button. But this probably doesn’t come as a surprise to you.



Refreshing the data
One of the pitfalls I encountered during my quest to get this to work was the refresh time. On several occasions the data wasn’t refreshed, even when I tried to do so manually in the portal. But after some digging, a colleague of mine DID actually RTFM.

The Microsoft docs state the following:

“Because both Power BI and OneDrive are in the cloud, Power BI connects to your workbook file on OneDrive about every hour. If any changes are found, your dataset, reports, and dashboards are automatically updated in Power BI.”

In my first test it took more than 30 minutes, so I assumed it didn’t work. In later tests it took less than 10 minutes for Power BI to refresh the data. So don’t panic when you don’t see your changes right away, like I did. More info on that can be found in the Microsoft Docs article “Get data from Excel workbook files”.



Conclusion
TLDR; Maybe I should read the (f******) manual more often 🙂

Some of the Power BI Desktop functions could be clearer, for example the URL box. You can provide a URL to your SharePoint, but I couldn’t point it directly to a directory. Maybe that’s because “Shared Documents” is not a directory but a site within a site. After a few minutes I used the SharePoint base-URL instead, and that was a perfectly fine work-around.

Another issue I would like to raise (especially to you Microsoft, in case you ever read this): PLEASE GIVE US A POWER BI DESKTOP FOR MAC!!!

One of the reasons I used the shared Excel file as a source, was to work around the whole issue that there’s no version of Power BI Desktop available for my colleagues that work on a Macbook. And I think there are a lot of other users out there who would really like to get their hands on Power BI Desktop for Mac, not just our company…

And thanks again Koen Verbeeck (Blog | @Ko_Ver), for the mental support and for your ideas that pointed me in the right direction! 🙂

Year number 2, working for a start-up

Just like last year, I wanted to write a blog post about how the past 12 months went. But this year all of a sudden everything changed for me. But let’s start at the beginning…

 
Beginning of the year
The beginning of 2017 was pretty relaxed. After the really busy year I had (2016), I decided for myself that I needed to turn it down a notch. The immense number of hours I spent in the office and working from home in the evenings and weekends, the stress and workload: it all became too much in December of 2016. For 2017 I was planning to work a bit smarter instead of harder, and that should give me the opportunity to take a step back and see the rainbow through the clouds.

In the first few months this actually worked for me. It was quite surprising to be honest, but my stress started to decrease, and I could focus on the stuff that needed my attention. For example, just a few of the things I did in Q1 of 2017: Worked on our ISO 27001 certification process with a team of people, had some security calls with HUGE corporates (which were really nice), made preparations for knowledge sessions, etc.

There are a few things that do deserve an honorary mention. One of the things that started last year was a quarterly hacklab with Deloitte. In this hacklab they tested our digital and physical security with different approaches. Just a few of the things they tried: pen-tests (penetration tests of our software and infrastructure), phishing, social engineering, media-baiting, etc. It was really fun to make this happen together with them.

Another big project that started was our migration from a hybrid platform to Azure. This started in February, and this was a really cool project to discover new things in Azure, and a challenge to do this with the least possible amount of downtime.

We also started with knowledge sessions within the company. Together with a colleague (Pascal) we hosted the first session in March.

 
Q2: The dark times began
The second quarter of the year started out really well. Together with Pascal (not only my colleague, but also my best friend and partner-in-crime in most cases), his wife Aga and their beautiful baby girl Emily we went to Poland for a holiday, and the baptism of the little princess. Even though some things didn’t work out the way we planned, I did enjoy my stay there. Great people, great food, beautiful country. Oh yeah, and some drinks here and there…

But without me knowing it, my life would change all of a sudden right after I came back from holiday. Before my flight left Warsaw, I called home. My parents asked me to stop by on the way home, because they had something they wanted to tell me. This conversation over the phone felt somewhat uncomfortable, and I was soon to discover why. So as soon as I landed at Amsterdam Schiphol, I drove to my parents’ house.

The message wasn’t sugarcoated in any way. They told me my mom received the results of some tests from the hospital: terminal lung cancer. Even when I write this now, I immediately feel that sinking feeling again like I felt on the 19th of April. Just like it was yesterday.

In the weeks that followed she needed to go to the hospital over and over again. And every talk we had with doctors changed the situation. It went from inoperable to “we can slow it down with chemo and medicine”, to “we think it’s curable”, to terminal. The final diagnosis was somewhere between 8 weeks and 8 months to live. I was in the room when the doctor said that. And even though my world collapsed, I didn’t show that. I remained strong, for her.

I’ll skip some months, because it’s hard to talk about all the things I’ve seen and experienced. All I can say is that it left its marks on my soul forever.

Eventually she went to a hospice in the village we live in (which was really nice, since I could spend all my time there without losing any time on traveling somewhere). The reason for that was that she needed more care than we could give her at home, mainly because the care consisted of medical treatment and giving her shots for pain medication. But looking back now, it was only for a short period of time, not even a week…

Just 2 months and 20 days (or 80 days in total) after the first diagnosis, she passed away. That’s when the darkness began for me. It’s hard for me to describe how this feels, but I know the lyrics of a song come fairly close to that feeling:

 

There’s an emptiness tonight,
A hole that wasn’t there before.

And I keep reaching for the light,
But I can’t find it anymore.

There’s an emptiness tonight,
A heavy hand that pulls me down.

They say it’s gonna be alright,
But can’t begin to tell me how.

‘Cause I’m just sitting in the dark,
In disbelief that this is real.

 
It’s hard to express the feelings to someone who hasn’t experienced it themselves. That’s what I found out by talking to friends who experienced similar situations. Even after 4 months and 26 days, I still can’t find the words to describe it.

Another description I can relate to is: it’s “survival mode”. Nothing mattered anymore; only the care for my mom mattered. And even now, it’s hard to see the true value of things, because sometimes I’m still on auto-pilot I think. It’s so much easier to see the negative side of things, when you just feel pain. I didn’t see the things that happened around me. All I could feel was emptiness, anger, feeling lost, feeling left alone, frustration, powerlessness, and I can go on. My heart was broken (and still is).

 
There were fun times though
But even though this was happening at home, I did have nice moments as well. In May I traveled with two friends to Seattle (my first time in the US!) and attended BUILD 2017. It was a really nice event: we saw a lot of great (new) things, and talked to a lot of interesting and smart people.

After BUILD, we stayed for a couple of days because we had some appointments at the Microsoft HQ in Redmond. Most of the appointments were arranged by Julie Koesmarno (Website | @mssqlgirl), and we talked to a number of people in the SQL Server building. We discussed some issues and ongoing development within Azure (for both Roadmap and Microsoft), we talked to the tools team (and I can now FINALLY talk about Project Carbon!), etc. Great times!

 
From cloud-first to cloud-only company
Over the last few weeks, we spent time migrating the last on-premises (or private-cloud) assets into Azure. This concluded a long-awaited change that we’ve been planning since I joined Roadmap (which is 2 years ago already tomorrow). This means we’ve changed from a cloud-first company in 2016 to a cloud-only, Azure-only company in 2017.

This was a long-term goal we set ourselves around 2 years ago, and I can’t express how happy I am that we made this happen this year. By doing this we went from a very busy on-call rotation (where you were called about every single night), to an on-call rotation where you get incidental calls (my last on-call was 1 phone call on a Saturday morning).

I can honestly state that this brought us and the company forward in an incredible way. So also kudos to you Julie, for helping us out when we needed it!

 
On to 2018
So I don’t know what 2018 will bring me, but hopefully it’s going to be a great year. A few goals are on the list again, and hopefully I can tick some of those boxes next year (presenting at SQL Saturdays and other events, so public speaking; attending PASS Summit 2018; etc.).

Hopefully you had a wonderful year, and together we can make 2018 even better.

Cheers!

Cleanup sysobjects after disabling replication

In my previous blog post I explained how we leveraged the power of transactional replication to move our on-premises databases to Azure SQL databases (PaaS). This works like a charm, but you do end up with a bunch of system-generated objects in your database. So how do you get rid of these?

 
What “mess” is left behind after disabling replication
After you’ve completed your migration to Azure SQL DB, and cut the strings on your transactional replication, you end up with a bunch of system-generated objects. These are used by SQL Server to “replay” the changes on the subscriber (inserts, updates and deletes) that were made on the publisher. This is just an example of one of our smaller databases:

 
As you can see, there are 3 stored procedures per table (ins, upd, del), and some other objects used by replication.

Maybe it’s my IT-OCD talking, but it feels nice to clean this up, doesn’t it?!

 
Cleanup
Fortunately the SQL Server team gave us a way to easily clean up these objects: sp_subscription_cleanup (that, even though MS Docs says it won’t, WILL work on Azure SQL DB).

This stored procedure accepts the following input:

@publisher: The name of the SQL Server instance of the publisher
@publisher_db: The name of the publishing database
@publication: The name of the publication (this was configured in the setup/configuration of replication)
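Putting those parameters together, the cleanup call you run on the subscriber database looks like this (the server, database and publication names below are placeholders for your own values):

```sql
-- Run this on the subscriber database; names are placeholders
EXEC sp_subscription_cleanup
    @publisher    = N'MyPublisherServer',
    @publisher_db = N'MyPublishedDatabase',
    @publication  = N'MyPublication';
```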

By running this stored procedure on your subscriber, it cleans up all remaining transactional replication components. But where can you find the input for this procedure?

First I looked at MSreplication_objects, but this doesn’t work on Azure SQL DB (MS Docs says so, and I’ve tested it). So my guess was that all the other system objects mentioned in related articles wouldn’t work either.

So what do you do when you can’t find the answer yourself after extensive Googling/Bing-ing? Yes, you call in reinforcements (in this case William Durkin (Website | @sql_williamd), who knows a lot about replication):

 
But as you can tell by the reply on William’s response, I quickly found the answer by just trying something out: MSreplication_subscriptions. You can query this table on your subscriber to retrieve the publisher and publication properties. Again: even though MS Docs says it won’t work, it does work on Azure SQL DB.

So thanks to some “Rubber Ducking” I found the solution:

 
Thanks again for your mental support William!

So by using the output of MSreplication_subscriptions on your subscriber, you can quickly find the information needed to run sp_subscription_cleanup.

 
Automating the script with C#
Running the script on one or two databases can be done manually. But because I migrated an entire data platform with replication, we’re talking about A LOT of databases. So instead of doing it all manually (you would need to connect to every single database to check this), I decided to create a small tool to help me with this.

In all honesty, my C#-skills aren’t that great, but my response to that is: “If it looks stupid but works it ain’t stupid”.

The tool connects to the instance you provided, returns a list of database names, and uses this list to connect to every database individually to check for replication components:

 
The tool can be used to connect to any Azure SQL DB instance, as long as you have one login/user that can connect to all databases on that server. If your server supports that (it depends on the configuration you use of course), you’ll see a list that provides you with the following information:

– Databasename
– Publisher
– Publisher_db
– Publication
– Distribution_agent
– CleanupScript

The last column is the result of the following statement:

SELECT
	DB_NAME() AS Databasename,
	Publisher,
	Publisher_db,
	Publication,
	Distribution_agent,
	'EXEC sp_subscription_cleanup @publisher = ''' + Publisher + ''', @publisher_db = ''' + Publisher_db + ''', @publication = ''' + Publication + ''';' AS CleanupScript
FROM MSreplication_subscriptions

 
You can also use this script without downloading the tool.

But be careful: these components could still be in use, so check that before you start cleaning up!

You can download the tool here (be aware, this is a .exe file!), or download the Source code here, and contact me if you would like me to share it via GitHub or something like that.

Send email on Azure Automation runbook failure

One of the downsides of using Azure Automation runbooks is the visibility of the runbook outcomes. That’s one of the few downsides I could actually find when I wrote my “Running database maintenance on Azure SQL DB with Azure Automation” post. But because we execute the runbooks against our production environment, visibility is of high importance to us.

To get some more insights into this, I’ve implemented a try-catch block in every runbook we have, and send an email to our ITOps team when a runbook fails. So how did we do this?

 
Create an email runbook
To keep it a bit modular (and I know it could be set up even better), I’ve made a runbook that is only used to send an email to our IT Operations team. This email includes the runbook name and the error that occurred.

Because we are going to include sensitive account information in an Azure Automation runbook, it’s wise to store the username and password of your Office 365 account in a “Credential” in Azure Automation. This safely stores your account information, and allows you to use it in your runbook later. This option can be found in the Automation Account’s left menu.

Now that we’ve done that, let’s create a runbook that sends an email:

workflow EmailError
{
  Param
  (
    # Input parameters
    [Parameter (Mandatory = $true)]
    [string] $RunbookName,

    [Parameter (Mandatory = $true)]
    [string] $MessageBody
  )

    # Retrieve Office 365 credential from Azure Automation Credentials
    $O365Credential = Get-AutomationPSCredential -Name "CredentialName"
    
    InlineScript
    {
        # Create new MailMessage
        $Message = New-Object System.Net.Mail.MailMessage
        
        # Set address-properties
        $Message.From = "no-reply@domain.com"
        $Message.replyTo = "no-reply@domain.com"
        $Message.To.Add("ITOPS@domain.com, support@domain.com")
  
        # Set email subject
        $Message.SubjectEncoding = ([System.Text.Encoding]::UTF8)
        $Message.Subject = "Failed job: $Using:RunbookName"
        
        # Set email body
        $Message.Body = "Error message: <br /><br /><br /><br /><br /> $Using:MessageBody"
        $Message.BodyEncoding = ([System.Text.Encoding]::UTF8)
        $Message.IsBodyHtml = $true
        
        # Create and set SMTP
        $SmtpClient = New-Object System.Net.Mail.SmtpClient 'smtp.office365.com', 587
        $SmtpClient.Credentials = $Using:O365Credential
        $SmtpClient.EnableSsl   = $true
  
        # Send email message
        $SmtpClient.Send($Message)
    }
}

 
This runbook sends an email from “no-reply@domain.com” to “ITOPS@domain.com”, and includes the input parameters (RunbookName and MessageBody) in the subject and body of the email. You can customize this based on your preferences.

 
Call the runbook
The way we use this runbook is by calling this from another runbook that runs our database maintenance. This runbook is copied from this post, except for the try-catch block.

By adding the $ErrorActionPreference and try-catch block, it’s possible to receive an email if a runbook fails:

workflow DBM_Servername_Databasename
{
    try
    {
        # Set error action preference
        $ErrorActionPreference = 'Stop'

        inlinescript
        {
                # Output status to console
                Write-Output "Creating connection..."

                # Create connection to database
                $DatabaseConnection = New-Object System.Data.SqlClient.SqlConnection
                $DatabaseConnection.ConnectionString = "Data Source=Servername.database.windows.net;Initial Catalog=DatabaseName;Integrated Security=False;User ID=UserName;Password=Password"
                $DatabaseConnection.Open()

                # Output status to console
                Write-Output "Creating command..."

                # Create command to execute
                $DatabaseCommand = New-Object System.Data.SqlClient.SqlCommand
                $DatabaseCommand.Connection = $DatabaseConnection
                $DatabaseCommand.CommandText = "EXEC dba.ExecuteMaintenance"
                $DatabaseCommand.CommandTimeout = 0;

                # Output status to console
                Write-Output "Executing payload..."

                # Execute the query
                $DatabaseCommand.ExecuteNonQuery()

                # Output status to console
                Write-Output "Closing connection..."

                # Close connection to database
                $DatabaseConnection.Close()
        }
    }
    catch
    {
        # Output status to console
        Write-Output "Catchie catchie..."
    
        # Set $errorMessage
        $errorMessage = $_
        
        # Call EmailError-runbook in order to send a failure-email
        EmailError `
            -RunbookName "DBM_Servername_Databasename" `
            -MessageBody $errorMessage

        # Output status to console
        Write-Output "Mail sent..."
    }
}

 
I’ve included a bunch of “Write-Output” lines in there, so that if I test it through the Azure portal I can see the status of the runbook. You can remove those if you want to have a clean runbook of course.

 
Conclusion
By using PowerShell to automate this type of work, it’s easy to extend your jobs like we did. Adding an email component is just as easy as altering the payload in your runbook. This extensibility and modular approach is the way to go if you want to deliver a solution that is easy to maintain and scalable.

And let me tell you something, that PowerShell is going to be big someday, mark my words! 😉

Schedule and run Azure Automation runbooks via Webhook

In one of my previous posts I wrote about Azure Automation jobs to run your Azure SQL DB maintenance processes. This is a very easy way to schedule your maintenance, but there are some shortcomings when you look at the scheduler options and job outcome visibility.

In this blog post I’ll show you how you can trigger your runbooks through webhooks. Then you’re not bound to the minimum schedule interval of once per hour, which is a limit set for Azure Automation schedules.

 
Start with a runbook and webhook
In this post I’m using the runbook that I previously created for running database maintenance. Before I created a webhook, I removed the schedule on the runbook itself. This is to prevent confusion about which schedule started the runbook.

Now that the basics are set, click on the “Webhooks” button, and click “Add Webhook” in the blade that opened:

 
A new blade opens up in the portal, which allows you to create a new webhook. Give the webhook a name and an expiration date (which is mandatory, even though you don’t want it to expire), and copy the URL. You need to store this in a secure place, because it will only be visible in this window while creating the webhook. If you lose this URL, it can’t be retrieved anywhere else:

 
The webhook URL will look like this (this specific URL is changed, and won’t work): “https:// s2events.azure-automation.net/webhooks?token=eT0%0bKvfBcMeusbZw00RAwP0ZKGqxEUnThBIHAEB0Eq0%0d”

 
The webhook is ready to use, so now we can move on to creating a schedule.
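If you want to verify the webhook works before attaching a schedule, you can trigger it manually with a POST request. Here’s a quick sketch in PowerShell (the URL is a placeholder for the one you copied when creating the webhook):

```powershell
# Trigger the Azure Automation webhook manually (URL is a placeholder)
$webhookUrl = "https://s2events.azure-automation.net/webhooks?token=<YOUR-TOKEN>"

# A POST with an empty body is enough to start the runbook;
# the response contains the ID of the job that was queued
$response = Invoke-RestMethod -Uri $webhookUrl -Method Post
$response.JobIds
```

If the runbook started, you’ll see the queued job ID in the output, and the job itself shows up under the runbook’s “Jobs” in the portal.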

 
Create a webhook scheduler
Just by creating a webhook alone, the job will never run. In order to make it run, we need to create a “Scheduler Job Collection”. In the Azure portal, go to “Scheduler Job Collection”, click on “Add”, and give your Scheduler Job Collection a name. If you already have a job collection you can add this one to it, or create a new one. Now go to “Action settings” to configure the webhook URL for this scheduler:

 
Make sure you set the action to “HTTPS”, and the method to “Post”.

 
Under “Schedule” you can configure the actual schedule. As mentioned earlier, the schedule of an Azure Automation runbook can only be configured in hours. The schedule of a Job Scheduler can be configured in minutes, so this allows you to run a job multiple times an hour in case you need it:

 
Conclusion
In some cases, it might be needed to run an Azure Automation runbook multiple times an hour, but that’s (currently) not supported for runbooks. By using the webhook and job scheduler as a work-around, you’re able to make the runbooks just a bit more useful. Before using the job scheduler, we were required to duplicate runbooks and configure the schedules to run at specific times, but this only made things more unclear. This works out for us, and hopefully for you too!