Thursday, June 7, 2012

Using OUTPUT with SQL Server 2005 updates

One thing I wish I had on my 2005 servers that I have on my 2008 server is the MERGE command.  Since I don't I've settled on OUTPUT to come close.


Here is an example:


Run this script to provide the proof.

USE [DBA] -- <supply your own database sandbox name here>

DECLARE @update_id TABLE(id INT)

IF EXISTS(SELECT * FROM sys.[tables] AS t where type = 'u' and name = 'tbl_sales')
DROP TABLE [tbl_sales];

-- populate a table with data
SELECT 1 [id], 250.00[total sale], 'north'[region], 'original'[row operation] INTO [tbl_sales];
INSERT INTO [tbl_sales] SELECT 2, 350.00, 'south', 'original'
INSERT INTO [tbl_sales] SELECT 3, 50.00, 'east', 'original'
INSERT INTO [tbl_sales] SELECT 4, 475.00, 'west', 'original'

-- have a look a the 'data table'
SELECT *, 'initial table population complete'[msg] FROM tbl_sales ORDER BY [id];

IF EXISTS(SELECT * FROM sys.[tables] AS t where type = 'u' and name = 'stg_sales')
DROP TABLE [stg_sales];

-- populate a staging table
SELECT 1 [id], 350.00[total sale], 'north'[region], 'original'[row operation] INTO [stg_sales];
INSERT INTO [stg_sales] SELECT 3, 750.00, 'east', 'original'
INSERT INTO [stg_sales] SELECT 5, 500.00, 'east', 'original'
INSERT INTO [stg_sales] SELECT 6, 650.00, 'west', 'original'

-- have a look at your staging data
SELECT *, 'initial staging population complete'[msg] FROM stg_sales ORDER BY [id];

-- now update the table with staged data rows; some are for inserts (new); 
-- some for updates (changes to table from staged data)
BEGIN TRANSACTION
UPDATE [tbl_sales] WITH (ROWLOCK)
SET [total sale]=s.[total sale], [row operation]='updated'
OUTPUT INSERTED.[id] INTO @update_id
FROM [tbl_sales] AS t  
INNER JOIN [stg_sales] AS s ON t.[id]=s.[id]
;

INSERT INTO [tbl_sales]([id],[total sale],[region],[row operation])
SELECT * FROM [stg_sales] WHERE id NOT IN (SELECT id FROM @update_id)
;
COMMIT TRANSACTION

-- check the results
SELECT *, 'table update complete'[msg] FROM tbl_sales ORDER BY [id];
SELECT *, 'staging table rows'[msg]FROM stg_sales ORDER BY [id];

So what happened?
Using the OUTPUT clause, we capture the id of the rows that were updated from the staging table by virture of the INNER JOIN.  We saved those in @update_id table.

Then using the NOT IN CLAUSE in the SELECT statement portion of the INSERT operation following the update, we grabbed any rows in the staging table that did not have a matching id from the UPDATE operation (because they are new and don't exist in the table) and performed the insertion of the new rows.

In practice 'id' would be anything that uniquely identifies the data row - a customer_id, an order_number, an invoice_number, anything.  The data rows would of course be longer from production systems, but the concept is simple once mastered and is very useful for ETL script loads to data marts/data warehouse tables; or, a less structured operational data store.


Back in my Teradata Master days, we had an operation called UPSERT.  This is as close as it gets to UPSERT in SQL Server 2005, but I find it vastly simpler than just about any other approach I've used.


No comments:

Post a Comment