T-SQL Tuesday #37 – Join me in a Month of Joins
December 11, 2012 1 Comment
T-SQL Tuesday is a recurring blog party, that is started by Adam Machanic (Blog | @AdamMachanic). Each month a blog will host the party, and everyone that want’s to can write a blog about a specific subject.
When I read this months invitation by Sebastian, I thought about a situation last week. A developer asked me to review a bunch of T-SQL queries and procedures that one of the other colleagues wrote, because they didn’t work. The first thing that I noticed was the readability of the scripts. I’ve seen some bad ones in my life, but these were just monstrous!
Thinking about the situation I’m guessing it’s just a lack of knowledge about databases. They don’t see what the connection is between data sets. They don’t know what specific joins do, and so they used what they see everyone uses: just JOIN. But they don’t realize that this implicitly means INNER JOIN for the engine.
One of the biggest issues in the script was the fact that a MERGE was used instead of an UPDATE FROM. I’ll try to explain this by using an example of a car factory. The code is exactly the same as the code I was asked to debug, except the objects are renamed.
MERGE INTO Factory.dbo.Stock USING #TMP_NewDelivery D ON D.Brand = Factory.dbo.Stock.Brand WHEN MATCHED AND D.PartID = Factory.dbo.Stock.PartID THEN UPDATE SET DeliveredAmount = D.DeliveredAmount OUTPUT deleted.PartID $action, GETDATE(), inserted.PartID INTO @Logging
One of the issues with this code that I noticed first was the fact that it only updates values. So why didn’t they use an UPDATE FROM? This isn’t too bad, except the JOIN clause isn’t declared once, but twice: in the USING, and in the WHEN MATCHED part. The issue is that the ON clause is joined on Brand (let’s say Seat), and that the PartID is added at a later stage. In the end, the query started updating all rows with the same PartID (let’s say Engine). So instead of updating the stock of Seat Engines, it updated the stock for all Engine parts.
And unfortunately I couldn’t do anything with the logging data that is generated by the script. Instead of storing it in a table, the logging information wasn’t used at all. It was stored in a memory table (why they used a memory table, I don’t know), and this wasn’t saved into another object. So why use precious CPU cycles to “store” information you don’t use?
Looking at this reminded me of something I tend to forget: the database is often some side-track for developers. They quickly write a query that isn’t that good and isn’t that fast, but it does the trick. At least, that’s what they think!
Please start thinking in collections and sets if you work with SQL Server, and don’t use a cursor for everything. If you don’t know the difference between a LEFT, RIGHT and INNER JOIN , please ask for help. Your DBA or SQL developer won’t make fun of you for asking. And if you don’t ask for help, please don’t be mad if we use your code as an example! 😉