Copy data from Azure Data Lake to another Data Lake with AdlCopy

It’s been a bit quiet on my blog over the last few months. I started working on some new stuff, and couldn’t find the inspiration for new subjects to blog about. I started working with Azure Data Lake a few months back, and I decided to share my (limited) knowledge here again, hoping it saves you time somewhere down the line.

Migrating data from one Data Lake to the other
We started out with a test version of a Data Lake, and this week I needed to migrate data to the production version of our Data Lake. After a lot of trial and error I still hadn’t found a good way to migrate the data, until I came across a tool called AdlCopy. This is a command-line tool that copies files for you. Let me show you how easy it is.

 
Download & Install
AdlCopy needs to be installed on your machine. You can find the download here. By default the tool will install the files in “C:\Users\\Documents\AdlCopy\”, but this can be changed in the setup wizard.

Once you’ve installed the tool, you can open a command prompt to use it:

 
Now you need to find the file or directory you want to copy. You can do this by opening the file location in the Azure portal and clicking “Folder properties”:

 
This URL will be the input for AdlCopy:

 
You should also find the destination URL for the other data lake, since this will be the target.

 
Linking it to your Azure subscription
With AdlCopy you don’t need to link anything directly to your subscription, or configure anything. The first time you run a copy command, a login box will pop up. If you log in with the account you use for the Azure portal, the tool will be able to access your resources.

 
Copying data
The inputs for AdlCopy are “/Source” and “/Dest”. These represent the source data and the destination to copy the data to.

There are two options when you want to copy files: a single file or an entire directory.

Copy a single file:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json

 
Copy an entire directory:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/ /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/

 
When you want to copy an entire directory, make sure you add the trailing “/” (slash) to the path. If you don’t do that, the copy will fail (you can’t copy a directory into a file).
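
If you script your copies instead of typing them, the trailing-slash rule is easy to enforce in code. Below is a minimal Python sketch, not something that ships with AdlCopy: the helper names `adl_url` and `build_adlcopy_args` and the account names `testlake` and `prodlake` are made up for illustration, and it only relies on the “/Source” and “/Dest” switches shown above.

```python
from typing import List


def adl_url(account: str, path: str, is_directory: bool) -> str:
    """Build an adl:// URL for a file or directory in a Data Lake account."""
    clean = path.strip("/")
    url = f"adl://{account}.azuredatalakestore.net/{clean}"
    # Directories must end with a slash, otherwise the copy fails.
    if is_directory and not url.endswith("/"):
        url += "/"
    return url


def build_adlcopy_args(source: str, dest: str) -> List[str]:
    """Return the AdlCopy command line as an argument list."""
    # A directory source needs a directory destination
    # (you can't copy a directory into a file).
    if source.endswith("/") and not dest.endswith("/"):
        raise ValueError("directory source needs a destination ending in '/'")
    return ["AdlCopy", "/Source", source, "/Dest", dest]


if __name__ == "__main__":
    # Hypothetical account and directory names, for illustration only.
    src = adl_url("testlake", "raw/flights", is_directory=True)
    dst = adl_url("prodlake", "raw/flights", is_directory=True)
    print(" ".join(build_adlcopy_args(src, dst)))
```

On a machine where AdlCopy is installed and on the PATH, the returned list could be handed to `subprocess.run(args, check=True)` to start the copy.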

 
Conclusion
After trying out some stuff with Data Factory, manually copying files, and considering building a small C# tool, this was the quickest option. It works out of the box, and you don’t have to be a rocket scientist to get it to work. So: the perfect tool for the job!

Easy data encryption in Azure

This article was also recently published on dev.getroadmap.com.

 
 
 
 

If you use Azure today, the security discussion has probably come up at some point: explaining to managers (and possibly colleagues) that Azure is a lot more secure than an on-premises data center, and that it is easier to maintain and more scalable. Trust me, we’ve all been there!

But besides the physical security, there’s also the digital security. In today’s world it’s easier to find a data breach in the news than it is to find an item about a bank robbery. So how can you secure your data in Azure in an easy but solid way, without the hassle of changing your applications?

Encryption could be one of your tools to achieve a secure infrastructure and/or applications. But encryption is a challenge for pretty much everyone. Almost every day we hear about companies not doing it right, or not doing it at all. But luckily, Azure helps us with setting this up with just the click of a button.

Okay, okay, you got me. Maybe a few button clicks…

 
Databases
For your Azure SQL databases, there’s a feature called “Transparent Data Encryption”, or TDE for short. This encrypts your data at rest with “FIPS 140-2 validated 256 bit AES encryption”. Or, in normal words: you encrypt your data with an AES-256 encryption key.

So how do you enable it? There are two ways to do so, but I’ll only show you the route via the Azure portal. Information on how to do this via T-SQL can be found here.

First, log in to the Azure portal, and navigate to the database you want to encrypt. Click on “Transparent Data Encryption”, and with just the click of a button you can encrypt your data:

 
This will start the encryption process and, depending on the size of the database, after a while you’ll see that the data is encrypted:

 
This feature allows you to encrypt your database without any application changes. This is because encryption and decryption are handled in an “intermediate layer” by Azure. The data is decrypted before it’s returned to the client, and encrypted before it’s stored. So your applications will continue to work without any changes to the application code or the connection string(s) to the database(s).
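
If you’d rather script this than click through the portal, the T-SQL route comes down to an `ALTER DATABASE ... SET ENCRYPTION ON` statement, plus a query on the `is_encrypted` column of `sys.databases` to check the result. Here is a minimal Python sketch that only builds those statements as strings. The helper names and the database name `FlightData` are hypothetical, and actually running the statements (from a SQL client or a database driver of your choice) is left out on purpose.

```python
def quote_identifier(name: str) -> str:
    """Wrap a SQL Server identifier in brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def enable_tde_statement(database: str) -> str:
    """Build the T-SQL statement that switches TDE on for one database."""
    return f"ALTER DATABASE {quote_identifier(database)} SET ENCRYPTION ON;"


def tde_status_query() -> str:
    """Build a query that lists each visible database with its encryption flag."""
    return "SELECT name, is_encrypted FROM sys.databases;"


if __name__ == "__main__":
    # "FlightData" is a made-up database name, for illustration only.
    print(enable_tde_statement("FlightData"))
    print(tde_status_query())
```

Quoting the database name with brackets keeps the statement valid when the name contains spaces or other special characters.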

 
Storage Accounts encryption
There is also an option to encrypt your Storage Accounts in the same way as TDE works for Azure SQL databases (without any application changes). When you enable this on your Storage Account, please remember that only the new data will be encrypted, and that the existing data won’t be encrypted until it changes. For more information on this, please read this article, and this MSDN thread.

When you’re creating a new Storage Account, you can choose to encrypt it right away:

 
But when you want to encrypt an existing Storage Account with data in it, you need to do it on two different levels (it’s a separate setting for blobs and files):

 
This will encrypt your data with the same algorithm that TDE uses for Azure SQL databases: “All data is encrypted using 256-bit AES encryption, one of the strongest block ciphers available.” (source).

 
Conclusion
For us as a company, enabling these features means that all of our data is encrypted. We’re only sending and receiving data from within Azure, so the communication is also secure. And even though the majority of our data is public data (publicly available, such as flight information, etc.), it’s a safe feeling to know that all our data is encrypted when stored.