Monday, June 11, 2012

SQL Middle Earth - putting that hobbit id to pragmatic use


What's a hobbit in SQL Server?


A 'hobbit' is a HOBT (Heap Or B-Tree) id.
There, doesn't that clear things up?


For more about what a HOBT is and why it was named thus, refer to 
http://thehobt.blogspot.com/2009/02/what-heck-is-sql-server-hobt-anyway.html.


Here is where understanding what a hobbit is turned out to be useful for me. If you use
a commercial monitoring tool for SQL Server (such as SQL Monitor from Red Gate Software), then when the monitor catches a deadlock and you start to drill in, you may notice lines such as the ones below:




SPID  Victim          Lock details                                                       Statement type  Line #
138                   dbid: 5, object id: 72057594727628800, index id: (0100e420fa5a)    UPDATE          1
169   Victim process  dbid: 5, object id: 72057594727628800, index id: (0100e420fa5a)    UPDATE          18


Now that object id, 72057594727628800, is not a real object id; you won't find it in the sysobjects table.  It is a HOBT id.  When I click on the line for SPID 138, I find that the HOBT id is associated with object id 440335512, and when I click on the other line, I get 1644479766.


But I don't want to spend a lot of time poking at all the links and buttons in SQL Monitor (okay, sometimes I do, because I need to know the details).  And if your company is like the one I currently work for, the applications running on SQL Server are most likely third-party software.  In other words: a) you don't have the source code; b) even if you did, the vendor would take a dim view of you rummaging around in it, and a dimmer one of you changing it; c) you are busy doing what your employer pays you to do, so why should you work on the vendor's code for them?


So the most urgent thing for me is simply to find out as much about the deadlock as I can, as quickly as I can, so that I can open an issue in the vendor's trouble-ticket system and let them go figure out why their code is prone to deadlocking.


Here's a nifty little bit of code that has helped me out in doing this:


USE [SERVICE] -- <-- your database name goes here

-- HOBT id and object id for the surviving process, copied from the deadlock report
DECLARE @hobbit BIGINT
DECLARE @hobbitobj INT
DECLARE @hobbitidx INT

-- HOBT id and object id for the victim process
DECLARE @victimhobbit BIGINT
DECLARE @victimhobbitobj INT
DECLARE @victimhobbitidx INT

DECLARE @victimobject BIGINT
DECLARE @object BIGINT

SET @victimhobbit = 72057594727628800
SET @victimobject = 1644479766
SET @hobbit = 72057594727628800
SET @object = 440335512

-- translate each HOBT id into its owning object and index
SET @hobbitobj = (SELECT [object_id] FROM sys.partitions WHERE hobt_id = @hobbit)
SET @hobbitidx = (SELECT index_id FROM sys.partitions WHERE hobt_id = @hobbit)
SET @victimhobbitobj = (SELECT [object_id] FROM sys.partitions WHERE hobt_id = @victimhobbit)
SET @victimhobbitidx = (SELECT index_id FROM sys.partitions WHERE hobt_id = @victimhobbit)

-- the object and index behind each HOBT id
SELECT 'victim' AS msg, hobt_id, OBJECT_NAME(p.[object_id]) AS [object], p.[object_id], index_id FROM sys.partitions AS p WHERE hobt_id = @victimhobbit
SELECT hobt_id, OBJECT_NAME(p.[object_id]) AS [object], p.[object_id], index_id FROM sys.partitions AS p WHERE hobt_id = @hobbit
SELECT 'victim' AS msg, [object_id], [name], [index_id], [type_desc] FROM sys.indexes WHERE [object_id] = @victimhobbitobj AND [index_id] = @victimhobbitidx
SELECT [object_id], [name], [index_id], [type_desc] FROM sys.indexes WHERE [object_id] = @hobbitobj AND [index_id] = @hobbitidx
-- the stored procedures named in the deadlock report
SELECT 'victim' AS msg, name, id FROM sysobjects WHERE [id] = @victimobject
SELECT name, id FROM sysobjects WHERE [id] = @object


The results give me the name of the object associated with both 'hobbits'.  In this instance, two SPIDs wanted access to the exact same object at the exact same time; moreover, they were attempting to access it through the same index.  Finally, I have the names of the two stored procedures that were running.


So I can quickly copy this information into a ticket; the most important and useful part is that I can provide the vendor with the names of the stored procedures that deadlocked.


This has helped me by equipping the vendor with the information they need to start figuring out what they can do to reduce or eliminate deadlocks in their code.
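

If you only need the object behind a single 'hobbit' and don't care about the rest, a one-off lookup against sys.partitions does the job.  This is just a quick sketch using the HOBT id from the example above; it has to be run in the database the deadlock report pointed at (dbid 5 in this case):


-- quick one-off translation of a HOBT id into its owning object and index
SELECT OBJECT_NAME(p.[object_id]) AS [object], p.[object_id], p.index_id
FROM sys.partitions AS p
WHERE p.hobt_id = 72057594727628800;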


Have fun!

Thursday, June 7, 2012

Use syscolumns to help build your INSERT, SELECT or other query

One thing I find tedious, especially when writing a long ETL script, is creating the INSERT, SELECT or UPDATE statements for production tables with long rows.


Try this to speed things up.


USE <your databasename here>
SELECT '['+[name]+'], ' FROM syscolumns WHERE id IN (SELECT id FROM sysobjects WHERE name = 'order') ORDER BY [colorder];

(Of course you'll need to supply your own tablename to the statement above in place of 'order').

The result set should look something like this:

[Add-charges], [CC-Exp], [CC-Name], [CC-no], [Charge-cust], [Check-no], [Comm-code], 
[Comm-code1], [Comm-rate], [Comm-rate1], [div-code], [Credit-rejet], [Cust-no], 
[Cust-po], [Deposit1], [Extra-Chargeo__1], [Extra-Chargeo__2], [Extra-Chargeo__3], 
[Extra-Chargeo__4], [Extra-Chargeo__5], [Invoice-date], [Invoice-no], [Job-no], 
[Gross-amt], [Printed], [Order-code], [Order-date], [Order-no], [Back-order], [Pay-method], 
[Price-level], [Qty-open-ord], [Qty-orig-ord], [Qty-shipped], [quote-no], [Request-date], 
[Ship-address__1], [Ship-address__2], [ship-charge], [Ship-city], [Ship-country], [Ship-name], 
[Ship-no], [Ship-st], [Ship-zip], [Slspn-code], [Slspn-code1], [Tax-code], [Term-code], 
[Update-stat], [User-id], [Via-code], [Whs-code], [Memo__1], [Memo__2], [Tax-amt__1], 
[Tax-amt__2], [Tax-amt__3], [Taxable-amt__1], [Taxable-amt__2], [Taxable-amt__3], 
[Consig-PO], [ohu-count], [Deposit__1], [Deposit__2], [Deposit__3], [note__1], [note__2], 
[note__3], [note__4], [note__5], [disc], [org-date], [org-time], [pick-hold], [pick-printed], 
[opick-status], [hold-order], [ship-complete], [override-flag], [Order-status], [pick-stage], 
[ops-date], [stage-complete], [nb-labels], [alloc-qty], [alloc-value], [trans-weight], [Qty-on-pps], 
[last-pick], [priority], [field-1], [field-10], [field-2], [field-3], [field-4], [field-5], [field-6], [field-7], 
[field-8], [field-9], [add-taxable__1], [add-taxable__2], [add-taxable__3], [charge-freight-in], 
[charge-handle], [charge-freight-out], [sat-del], [Incent-code], [item-stat], [web-order], 
[hold-reason], [over-reason], [over-userid], [Extra-Charge__1], [Extra-Charge__2], 
[Extra-Charge__3], [Extra-Charge__4], [Extra-Charge__5], [Extra-Charge__6], 
[Extra-Charge__7], [Extra-Charge__8], [Extra-Charge__9], [Extra-Charge__10], 
[bottom-memo], [top-memo], [curr-code], [curr-factor], [cc-address], [cc-cvv], [cc-zip], 
[cc-duty], [cc-freight], [cc-tax], [cc-disc-amt], [cc-ship-cntry], [cc-frzip], [cc-pay-amt], 
[cc-shipzip], [cc-transdate], [cc-acthist], [CheckDate], [EmailId], [Last-Cntct], [last-cntct-phone],
[request-by], [adate], [atime], [QueueSequenceNumber], [cc-city], [cc-state], [cc-approval], 
[cc-approval-date], [RMA], [ApprovedValue], [CCTransactionType], [IsOpened], [OpenedUser], 
[InvoiceEmailIds], [location-no], [Drop-Ship], [HandlingCode], [HoldExtAmt1], [HoldExtAmt2], 
[HoldExtAmt3], [HoldExtAmt4], [HoldExtAmt5], [HoldExtAmt6], [HoldExtAmt7], [HoldExtAmt8], 
[HoldExtAmt9], [HoldExtAmt10], [lead-code], [price-pick], [creditcardreference], 
[PreAuthorization], [cc-cust-num], [lockuserid], [pounlockdate], [userlock], [DropShipVendor], 
[OrderReference], [WebOrderNo], 


Now it is a simple matter to trim the last comma from the answer set and copy it.  Prefix one copy with 'INSERT INTO <target name> (' and close it with ')'; then on the next line type 'SELECT', paste the list again, add 'FROM <source name>;' and you're done!  A sketch of the finished statement follows.
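
For instance, if the column list above were used to copy the 'order' table into a hypothetical staging table (the table name [stg_order] is made up for illustration, and only the first few columns of the pasted list are shown), the finished statement would look something like this:


INSERT INTO [stg_order] ([Add-charges], [CC-Exp], [CC-Name], [CC-no], [Charge-cust]) -- ...rest of the pasted column list
SELECT [Add-charges], [CC-Exp], [CC-Name], [CC-no], [Charge-cust] -- ...rest of the pasted column list again
FROM [order];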

Using OUTPUT with SQL Server 2005 updates

One thing I wish I had on my 2005 servers that I have on my 2008 server is the MERGE command.  Since I don't, I've settled on OUTPUT to come close.


Here is an example:


Run this script to provide the proof.

USE [DBA] -- <supply your own database sandbox name here>

DECLARE @update_id TABLE(id INT)

IF EXISTS(SELECT * FROM sys.[tables] AS t where type = 'u' and name = 'tbl_sales')
DROP TABLE [tbl_sales];

-- populate a table with data
SELECT 1 [id], 250.00[total sale], 'north'[region], 'original'[row operation] INTO [tbl_sales];
INSERT INTO [tbl_sales] SELECT 2, 350.00, 'south', 'original'
INSERT INTO [tbl_sales] SELECT 3, 50.00, 'east', 'original'
INSERT INTO [tbl_sales] SELECT 4, 475.00, 'west', 'original'

-- have a look at the data table
SELECT *, 'initial table population complete'[msg] FROM tbl_sales ORDER BY [id];

IF EXISTS(SELECT * FROM sys.[tables] AS t where type = 'u' and name = 'stg_sales')
DROP TABLE [stg_sales];

-- populate a staging table
SELECT 1 [id], 350.00[total sale], 'north'[region], 'original'[row operation] INTO [stg_sales];
INSERT INTO [stg_sales] SELECT 3, 750.00, 'east', 'original'
INSERT INTO [stg_sales] SELECT 5, 500.00, 'east', 'original'
INSERT INTO [stg_sales] SELECT 6, 650.00, 'west', 'original'

-- have a look at your staging data
SELECT *, 'initial staging population complete'[msg] FROM stg_sales ORDER BY [id];

-- now update the table with staged data rows; some are for inserts (new); 
-- some for updates (changes to table from staged data)
BEGIN TRANSACTION
UPDATE t
SET [total sale]=s.[total sale], [row operation]='updated'
OUTPUT INSERTED.[id] INTO @update_id
FROM [tbl_sales] AS t WITH (ROWLOCK)
INNER JOIN [stg_sales] AS s ON t.[id]=s.[id]
;

INSERT INTO [tbl_sales]([id],[total sale],[region],[row operation])
SELECT * FROM [stg_sales] WHERE id NOT IN (SELECT id FROM @update_id)
;
COMMIT TRANSACTION

-- check the results
SELECT *, 'table update complete'[msg] FROM tbl_sales ORDER BY [id];
SELECT *, 'staging table rows'[msg] FROM stg_sales ORDER BY [id];

So what happened?
Using the OUTPUT clause, we capture the id of the rows that were updated from the staging table by virtue of the INNER JOIN.  We saved those in the @update_id table variable.

Then, using the NOT IN clause in the SELECT portion of the INSERT that follows the update, we grabbed any rows in the staging table whose id did not come back from the UPDATE operation (because they are new and don't yet exist in the table) and inserted them as new rows.

In practice 'id' would be anything that uniquely identifies the data row - a customer_id, an order_number, an invoice_number, anything.  The data rows from production systems would of course be wider, but the concept is simple once mastered and is very useful for ETL loads into data mart/data warehouse tables, or into a less structured operational data store.


Back in my Teradata Master days, we had an operation called UPSERT.  This is as close as it gets to UPSERT in SQL Server 2005, but I find it vastly simpler than just about any other approach I've used.
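
For comparison, on a 2008 server the whole update-then-insert pair above collapses into a single MERGE statement.  Here's a minimal sketch against the same demo tables from the script (it won't run on the 2005 boxes, which is the whole point of the OUTPUT trick):


-- SQL Server 2008 and later: the same upsert as one statement
MERGE [tbl_sales] AS t
USING [stg_sales] AS s ON t.[id] = s.[id]
WHEN MATCHED THEN
UPDATE SET t.[total sale] = s.[total sale], t.[row operation] = 'updated'
WHEN NOT MATCHED BY TARGET THEN
INSERT ([id], [total sale], [region], [row operation])
VALUES (s.[id], s.[total sale], s.[region], s.[row operation]);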