February 5, 2014
From the first time I used SSIS, I loved it. In most cases it’s easy to create a package, easy to understand, and even readable for people who don’t “speak fluent SQL”. But what if you want to perform an easy task, and the result isn’t what you expect?
Formatting an Excel sheet
One of the most basic tasks you can create in SSIS is importing an Excel sheet. Most of the time this works like a charm. But in my case, I wanted to filter out some rows from the workbook.
The business delivers an Excel sheet that needs to be imported into the database. But because they don’t have the technical knowledge we have, they don’t know how important the format of the file is. They sent us this file (I’ve created a smaller sample, so it’s easier to read and understand):
The first thing you’ll notice as a data professional is the 2 empty rows in the sheet. Besides that, we have an employee without a name. You know this is going to cause problems when you see it. These errors are easy to spot in the example, but imagine these 2 rows hidden in a dataset with 50,000 or more rows. So even though they might have ended up there accidentally, your process is going to fail.
When you add an “Excel Source” to your package, and you look at the preview of that import, you immediately see the problem:
In order to determine what columns can be left blank, and what columns can’t be NULL, I looked at the table structure:
CREATE TABLE ResultSSIS
  (ID INT IDENTITY(1, 1),
   FullName VARCHAR(50) NOT NULL,
   Department VARCHAR(50) NULL,
   EmployeeNumber INT NOT NULL)
So in the dataset, FullName and EmployeeNumber are mandatory, and Department is optional. With this in mind, I started to work on a way to exclude those rows.
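If you don’t have the CREATE TABLE script at hand, the same information can be read from the catalog. This is a minimal sketch, assuming the table lives in the database you’re currently connected to:

```sql
-- List each column of the destination table with its nullability.
-- IS_NULLABLE is 'NO' for columns that reject NULL, 'YES' otherwise.
SELECT COLUMN_NAME,
       DATA_TYPE,
       IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'ResultSSIS'
ORDER BY ORDINAL_POSITION;
```

Every column reported as not nullable (apart from the identity column, which SQL Server fills in itself) has to be present in each imported row.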
Import without excluding
The first thing I tried was to import the file and see what the results were. Because I knew the data wasn’t correct, I didn’t want to import the Excel sheet into a SQL Server database just yet. So as a destination, I used the “recordset destination” control in SSIS. Importing the data into this memory table also allowed me to use the “data viewer” to see the imported data, without the need to truncate a table after each run. You can enable the “data viewer” by right-clicking the import-connector (the arrow between controls), and clicking “Enable Data Viewer”:
If you run the SSIS package in debugging mode, you’ll see the data that is imported in a pop-up window:
As you can see in the screenshot above, the records with NULL values in them are included in this import. So which records do we want to exclude, based on our table structure?
So of the 6 records in the Excel sheet, we want to exclude 3 in our import because of NULL values. But how do we do that? The easiest way to solve it is to import the sheet into a temp table, delete the NULL records, and insert the other records into the destination table. But what if that isn’t possible, and you want to filter the records in your import? I chose to use the “Conditional Split”.
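For comparison, the temp-table route could look roughly like the sketch below. The staging table name StagingSSIS is hypothetical, and a regular table is used instead of a # temp table, on the assumption that the SSIS package and the cleanup statements may not share one connection:

```sql
-- Hypothetical staging table: every column is nullable,
-- so the raw Excel import never fails on empty cells.
CREATE TABLE StagingSSIS
  (FullName VARCHAR(50) NULL,
   Department VARCHAR(50) NULL,
   EmployeeNumber INT NULL);

-- (The SSIS package loads the Excel sheet into StagingSSIS at this point.)

-- Remove the rows that miss a mandatory value.
DELETE FROM StagingSSIS
WHERE FullName IS NULL
   OR EmployeeNumber IS NULL;

-- Copy the remaining rows into the destination table.
INSERT INTO ResultSSIS (FullName, Department, EmployeeNumber)
SELECT FullName, Department, EmployeeNumber
FROM StagingSSIS;

-- Empty the staging table for the next run.
TRUNCATE TABLE StagingSSIS;
```

This works, but it needs an extra table and extra statements around the package, which is the overhead the in-package filter avoids.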
You don’t have to rebuild your whole package when you want to exclude records with the “Conditional Split”. You can just add this control, at least in this case, between your source file and your destination. If you open the control, you can add an expression that is used to filter records. In my case, I wanted to exclude the rows with an empty “FullName” or “EmployeeNumber”:
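The exact expression from the screenshot isn’t reproduced here. A condition along the lines of ISNULL(FullName) || ISNULL(EmployeeNumber), written in the SSIS expression language, would send the incomplete rows to a named output; treat that as an assumption about what was entered. In T-SQL terms, the rows that remain on the default output correspond to a query like this one, where ExcelRows is a hypothetical table standing in for the sheet:

```sql
-- Rows that pass the filter: both mandatory columns have a value.
-- ExcelRows is a hypothetical stand-in for the imported Excel sheet.
SELECT FullName,
       Department,
       EmployeeNumber
FROM ExcelRows
WHERE FullName IS NOT NULL
  AND EmployeeNumber IS NOT NULL;
```

Department is deliberately left out of the condition, because the destination table allows it to be NULL.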
When connecting your “Conditional Split” to your destination, SSIS will ask you which output the “Conditional Split” needs to return. To output the entire set without the empty rows, choose the “Conditional Split Default Output”:
When you run your package with the extra “Conditional Split” (and you enable Data Viewer again), you’ll see the filtered output of the “Conditional Split”. The 3 NULL records are excluded as expected:
SSIS is easy to use, and yet a really powerful tool. When a process built in SSIS needs to change, it’s not always necessary to rebuild your whole package. Sometimes you can save the day with just a minor change. That’s the power of SSIS!