Azure Data Lake – Register JSON Assemblies

The power of Azure Data Lake is that you can process a variety of different file types from Azure Data Lake Analytics. But in order to use JSON, you need to register some assemblies first.

Downloading assemblies
The assemblies are available on GitHub for download. Unfortunately you need to download the solution and compile it on your machine, so I’ve also made the two DLLs you need available via direct download:

Microsoft.Analytics.Samples.Formats.dll
Newtonsoft.Json.dll

 
Upload to ADL
Before we register the assemblies, we need to upload the files to Azure Data Lake Store. In my case, I created a folder called “Assemblies”, and in that folder a subfolder called “JSON”:

 
Now upload the two DLLs that you downloaded into that folder.

 
Register the assemblies
I’m running the registration U-SQL job from Visual Studio, but you can also do this from the Azure portal, by running a U-SQL job in the Azure Data Lake Analytics window.

By running the statements below, you register both DLLs in your Azure Data Lake Analytics account, and you can start using JSON:

CREATE ASSEMBLY [Newtonsoft.Json] FROM "Assemblies/JSON/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM "Assemblies/JSON/Microsoft.Analytics.Samples.Formats.dll";
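
With both assemblies registered (in the default database), you can reference them from a U-SQL script and read JSON files with the JsonExtractor that ships with the samples library. The sketch below is a minimal example; the input path, output path and column names are placeholders you’ll need to replace with your own:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Extract a couple of example columns from a JSON file (placeholder path and columns)
@json =
    EXTRACT id string,
            name string
    FROM "/Data/Sample.json"
    USING new JsonExtractor();

// Write the rows to CSV so you can check the result
OUTPUT @json
TO "/Output/Sample.csv"
USING Outputters.Csv();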

 
Conclusion
Because we use JSON as a primary way of creating, sending and storing data, being able to use this file type in ADL is a must. It saves us time, because otherwise we would need to convert the files to an intermediate format (like CSV or text) before we could process the data.

Hopefully this short tutorial helps you out as well.


Copy data from one Azure Data Lake to another with AdlCopy

The last few months it’s been a bit quiet on my blog. I started working on some new stuff, and couldn’t find the inspiration to come up with new subjects to blog about. A few months back I started working with Azure Data Lake, and I decided to share my (limited) knowledge here again, hoping it saves you time somewhere down the line.

Migrating data from one Data Lake to the other
We started out with a test version of a Data Lake, and this week I needed to migrate data to the production version of our Data Lake. After a lot of trial and error I couldn’t find a good way to migrate the data, but in the end I found a tool called AdlCopy. This is a command-line tool that copies files between Data Lake accounts for you. Let me show you how easy it is.

 
Download & Install
AdlCopy needs to be installed on your machine. You can find the download here. By default the tool installs its files in “C:\Users\\Documents\AdlCopy\”, but this can be changed in the setup wizard.

Once you’ve installed the tool, you can open a command prompt to use it.
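For example, assuming the default install location (with a placeholder user name), you can change to the install folder and run AdlCopy.exe without any arguments to check that it works:

cd "C:\Users\<your user name>\Documents\AdlCopy"
AdlCopy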

 
Now you need to find the file or directory you want to copy. You can do this by opening the file location in the Azure portal and clicking “Folder properties”.

 
The URL shown under “Folder properties” will be the input for AdlCopy.

 
You should also find the destination URL for the other data lake, since this will be the target.

 
Linking it to your Azure subscription
With AdlCopy you don’t need to link anything directly to your subscription, or configure anything up front. The first time you run a copy command, a login box will pop up. If you log in with the account you use for the Azure portal, the tool will be able to access your resources.

 
Copying data
The inputs for AdlCopy are “/Source” and “/Dest”. These represent the source data and the destination to copy the data to.

There are two options when you want to copy files: a single file or an entire directory.

Copy a single file:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/<FILENAME>.json

 
Copy an entire directory:

AdlCopy /Source adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/ /Dest adl://<DATA LAKE NAME>.azuredatalakestore.net/<DIRECTORY>/

 
When you want to copy an entire directory, make sure you add the trailing “/” (slash) to the path. If you don’t do that, the copy will fail (you can’t copy a directory into a file).
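For example, with a hypothetical test lake called “mydatalaketest” and a production lake called “mydatalakeprod” (both names and the “Sales” directory are placeholders), a directory copy would look like this:

AdlCopy /Source adl://mydatalaketest.azuredatalakestore.net/Sales/ /Dest adl://mydatalakeprod.azuredatalakestore.net/Sales/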

 
Conclusion
After trying out some stuff with Data Factory, manually copying files and considering building a small C# tool, this was the quickest option. It works out of the box, and you don’t have to be a rocket scientist to get this to work. So: The perfect tool for the job!