
Tuesday, November 20, 2012

LOOPING VS. CURSORS

I just cannot say this enough - "Cursors are eeeeeeeeeeeeeeeviiiiilllllllllllllllllllllllll!!!!!!!!!!!!!!!!!!!"

I recently encountered this post in Spiceworks and joined in the conversation -

POST:
I believe I already know the answer to this, but I'd love to hear from some people in the community.

Let's say that we're writing a procedure that needs to perform a calculation based on each row of a table - the example I have below is impractical but shows my point. Just how bad is the performance of using a cursor to loop through the rows, versus a method such as the one below? Is the way I have outlined below generally the best way to do a loop like this?

3 COLUMN EXAMPLE TABLE: rownumber identity PK, intData1, intData2

DECLARE @currentRowNumber INT, @maxRowNumber INT -- declarations implied by the post
SET @currentRowNumber = 1
SELECT @maxRowNumber = MAX(PK) FROM myTable

WHILE (@currentRowNumber <= @maxRowNumber)
BEGIN

INSERT INTO #SomeTable(myCalculation)
SELECT SUM(intData1 + intData2) FROM myTable WHERE PK = @currentRowNumber

SET @currentRowNumber = @currentRowNumber + 1

END

Interestingly, the first 'responder' posted this...
"I have recently been asked to remove cursors from existing stored procedures"

Now why would someone have instructed this person to remove cursors from all their existing stored procedures?

As I've previously posted, there are absolutely tons of opinions - some for, some against - on the use of cursors; however, nothing speaks better than quantitative data.  And to resolve this quantitatively, just google 'Itzik Ben-Gan' and 'cursor'.  Once you've reviewed his articles, you should be clear on just why this responder would have been told to take the cursors out of their stored procedure code.

To net it all out: there is hardly any situation you can craft where a cursor will out-perform well-crafted T-SQL.
(There is an interesting exception: declaring a STATIC cursor gets you considerably better performance than a plain old vanilla cursor.)
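
To make it concrete, the whole loop from the POST collapses into one set-based statement. A minimal sketch, assuming the 3-column example table from the POST (the SUM in the loop only ever summed a single row, so it reduces to plain addition):

-- set-based equivalent of the POSTed loop: one insert, no row-at-a-time fetching
INSERT INTO #SomeTable(myCalculation)
SELECT intData1 + intData2
FROM myTable
;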

That said, assuming you are operating on a table of, say, 2,000,000 rows, or even a more modest 200,000 rows, figuring out a relational answer to the processing objective will almost always result in faster processing than 'fetching and operating' on one row at a time until you reach the end of the table.

On more than one occasion I have found myself starting to write a loop similar to the one above and, while writing the various operations to perform inside it, suddenly seen that I could just join tables and do a 'mass upsert' type of operation.  In short order, I've taken my 'input batch table' and written two different queries.  The first joins the batch table to the target and inserts every row from the batch whose key is not in the target; then I delete those rows from the batch.  What is left are batch rows whose keys already exist in the target - therefore there must be something different about the data (or maybe not, depending on who wrote the code that generated the 'input batch' table).  So I write an update, joining on the key column(s), that changes every non-key value in the target to what is in the batch's corresponding row.
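
Here is a minimal sketch of that pattern; the table and column names (target, batch, pk, col1, col2) are hypothetical stand-ins:

DECLARE @new TABLE(pk INT)

-- 1) insert every batch row whose key is not yet in the target, remembering the keys
INSERT INTO target(pk, col1, col2)
OUTPUT INSERTED.pk INTO @new
SELECT b.pk, b.col1, b.col2
FROM batch AS b
WHERE NOT EXISTS (SELECT * FROM target AS t WHERE t.pk = b.pk)
;

-- 2) delete those rows from the batch
DELETE FROM batch WHERE pk IN (SELECT pk FROM @new)
;

-- 3) whatever is left in the batch already exists in the target, so update in place
UPDATE t
SET t.col1 = b.col1, t.col2 = b.col2
FROM target AS t
INNER JOIN batch AS b ON b.pk = t.pk
;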



And this is the point of my statement that cursors are evil.  It isn't just the endless arguing about the performance of the cursor itself; it is the nature of the beast.

The first time I ever saw a cursor, I thought to myself - "this is just a crutch that has been provided to people who are not accustomed to thinking relationally about data in an RDBMS."  My reason for reaching that conclusion is the 'loop' model in the POST.  Just about every cursor operation I've ever seen is the equivalent of some kind of loop; the loop is there to process each row (usually in a batch construct) and it just reeks of sequential thinking - 'first I'll do this to this piece of data in the row, then I'll do that to that piece of data, then blah, blah, blah ... and when I'm done I'll commit the change to this row and go get the next row, until I've reached the end of the table.'

If you must do things this way, then by all means use a loop or a cursor. However, be prepared for the day when someone you report to looks at your code, considers all the overhead you're stuffing onto the servers and the implications for annual budgets, and tells you to strip all those cursors out (hint: start writing the code now that will find all the places where you've used a cursor in a stored procedure or job).
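
A minimal sketch of such a finder, using the INFORMATION_SCHEMA.ROUTINES view. One caveat: ROUTINE_DEFINITION only holds the first 4000 characters of a procedure's text, so a cursor buried deep in a very long procedure can slip past this rough filter:

-- list stored procedures whose definition mentions a cursor
SELECT ROUTINE_SCHEMA, ROUTINE_NAME
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
AND ROUTINE_DEFINITION LIKE '%CURSOR%'
;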

Why?  Because crutches become the preferred way of doing things, and pretty soon you are missing the whole point of a relational database construct.  In effect you have taken a super-charged race car, tied 30 tons of dead weight to it, and then you're left pondering why things take so long.  "Where are the performance bottlenecks?"  If the person you report to is any good at relational thinking, they'll know where the bottlenecks are - and they'll tell you to dump the cursors.


Lesson here - think hard about why you think you need that loop... because you probably don't.

Monday, November 19, 2012

when pivot isn't the answer...

Recently I helped someone out on Spiceworks, so I thought I might as well post the problem and the resolution here as well.

The requestor wrote: "Right now I get a separate row for each Diagnosis_Code with the same service_ID, I need them to appear as separate columns in one row for each service_ID. There are other columns in the output not listed here, so it cannot be a crosstab, unless that can be nested, I don't know."

The problem description went something like this:

I have a table Diagnosis_Codes with a list of codes as Diagnosis_Code, PK Diagnosis_Code_ID
I have a table Service_Diagnoses which has FK Service_IDs and FK Diagnosis_Code_IDs
I have a table Services with Service_IDs
For one Service_ID, there can be multiple entries in Service_Diagnoses pointing to that Service_ID. Service_Diagnoses.Diagnosis_Code_ID is used to get a human-usable definition for each entry in Service_Diagnoses. It is possible for a single Service_ID to have multiple (not more than 4) entries in Service_Diagnoses.
(Hope that is understandable)
Here is what I need to do:
For each service ID I must produce 1 row. In this row, I need the Service_ID, and the (up to 4) Diagnosis_Code's.
Service_ID | Diagnosis_Code1 | Diagnosis_Code2 | Diagnosis_Code3 | Diagnosis_Code4
Or, it can even be the Service_ID and the (up to 4) Service_Diagnoses.Diagnosis_Code_ID 
------------------------------------------------------------------ end ---------------------------------------------------------------------------------

Now I admit, usually when someone posts something like this, I try to get a little more information.  But after I looked at this one for a bit, I could see clearly enough what the requestor wanted; the problem is that a PIVOT isn't well-suited to this 'free-form' kind of result.

So I returned to a trusted approach that had served me well before PIVOT came along, and it turned out to be the solution to the requestor's needs (and got me the 'best answer' cookie)!

Here is most of that response (and all that you need to understand how to craft the solution to this problem!)

If I understand your tables, they would look something like these:

SELECT * FROM serviceid
serviceid
456
789
123

SELECT * FROM diagnosis
diagid diagcode
78 froze
12 burnt
23 dead
56 broke

SELECT * FROM servicediag
serviceid diagid
123 78
123 12
123 23
456 78
456 12
456 23
789 56
789 23
123 56


The first thing I like to do is write a query that joins all the tables together and lets me see how the data looks when combined.
You can put it into a real table or a temp table, but the query below does the joins.

SELECT s.serviceid, sd.diagid, d.diagcode
INTO serviceidlist
FROM serviceid AS s
LEFT OUTER JOIN servicediag AS sd ON s.serviceid=sd.serviceid
LEFT OUTER JOIN diagnosis AS d ON sd.diagid=d.diagid
;

The resulting table contents can be seen using the following:
SELECT * FROM serviceidlist;

serviceid diagid diagcode
456 78 froze
456 12 burnt
456 23 dead
789 56 broke
789 23 dead
123 78 froze
123 12 burnt
123 23 dead
123 56 broke


But you want one line per service id, with a maximum of four columns of diagnostic code ids, so we need to apply a running sequence
(1 - 4) within each serviceid.

SELECT * ,
rn=ROW_NUMBER() OVER (PARTITION BY serviceid ORDER BY diagid)
INTO serviceidorderedlist
FROM serviceidlist
;

Now the list looks like the one from the following query on the resulting table, "serviceidorderedlist".

SELECT * FROM serviceidorderedlist
;


serviceid diagid diagcode rn
123 12 burnt 1
123 23 dead 2
123 56 broke 3
123 78 froze 4
456 12 burnt 1
456 23 dead 2
456 78 froze 3
789 23 dead 1
789 56 broke 2


Now, all we need is a technique to consolidate all the diag code ids for the service id.

SELECT serviceid, SUM([diagid1])[diagid1], SUM([diagid2])[diagid2], SUM([diagid3])[diagid3], SUM([diagid4])[diagid4]
FROM
(
SELECT serviceid, diagid[diagid1], 0[diagid2], 0[diagid3], 0[diagid4] FROM serviceidorderedlist WHERE rn=1
UNION
SELECT serviceid, 0[diagid1], diagid[diagid2], 0[diagid3], 0[diagid4] FROM serviceidorderedlist WHERE rn=2
UNION
SELECT serviceid, 0[diagid1], 0[diagid2], diagid[diagid3], 0[diagid4] FROM serviceidorderedlist WHERE rn=3
UNION
SELECT serviceid, 0[diagid1], 0[diagid2], 0[diagid3], diagid[diagid4] FROM serviceidorderedlist WHERE rn=4
)dt
GROUP BY serviceid
;

The derived-table query (dt) generates a row for each instance of a service id, up to 4, and sticks the appropriate diag code id into the correct column and 0 into the other columns. The UNION of all the queries in the derived table gives you up to 4 rows per service id, where each row has a diag code in exactly one column and 0 in the other three: one row for each intersection of a service id and a diag code id. But there can still be up to 4 rows per service id, so we 'wrap' the derived-table query with another SELECT that groups on the service id and SUMs the diag code columns. Since only one row in the set has a diag code in any given column, adding the 0s in the SUM function does not alter the data, and you get the result below.

serviceid diagid1 diagid2 diagid3 diagid4
123 12 23 56 78
456 12 23 78 0
789 23 56 0 0
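
As an aside, the same consolidation can also be written with conditional aggregation instead of the UNION of four selects; it produces the same result as above, so use whichever reads better to you:

-- pick each rn's diagid into its own column, one pass over the table
SELECT serviceid,
MAX(CASE WHEN rn=1 THEN diagid ELSE 0 END)[diagid1],
MAX(CASE WHEN rn=2 THEN diagid ELSE 0 END)[diagid2],
MAX(CASE WHEN rn=3 THEN diagid ELSE 0 END)[diagid3],
MAX(CASE WHEN rn=4 THEN diagid ELSE 0 END)[diagid4]
FROM serviceidorderedlist
GROUP BY serviceid
;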

Monday, July 30, 2012

Those pesky aggregate reports - USING CUBE

Over the years of using SQL Server, I confess I have come up with myriad ways of producing those pesky aggregate reports that management so loves; one such report being the YTD sales of customers.

Some of the methods I have used have been to UNION multiple tables holding the data for different groups, building derived tables, doing the equivalent with CTEs, and probably many others.  I've tried stored procedures as well, but invariably each request is an ad-hoc one, and the stored procedure approach led nowhere because of it.

Then came 'CUBE'.  I have settled on using it because it very easily and readily creates the information I need.  (Now if Microsoft would only improve the PIVOT operation so that multiple aggregates could be handled... but I digress).

Here is an example of how to use a CUBE operation to simplify the summation of sales data for a report of customers' sales YTD.

First we need to build a customer table....

-- create some customers
if exists(select * from sysobjects where type = 'U' and name = 'customer')
drop table customer
;

SELECT '28549'[billing #], '28549'[customer #], cast('ABC Corporation' as nvarchar(255))[name], cast('123 Anywhere Place' as nvarchar(255))[Address],
cast('Irmo' as nvarchar(255))[City], 'SC'[St]
into customer;

insert into customer SELECT '32568'[billing #], '32568'[customer #], 'ABC Corporation'[name], '123 Anywhere Place'[Address], 'Irmo'[City], 'SC'[St];
insert into customer SELECT '46598'[billing #], '46598'[customer #], 'Hannibal Cartons'[name], '1235 Courtney Blvd.'[Address], 'Walterboro'[City], 'SC'[St];
insert into customer SELECT '84956'[billing #], '84956'[customer #], 'Quattle Container'[name], '3857 Cedar Road'[Address], 'Lexington'[City], 'SC'[St];
insert into customer SELECT '51324'[billing #], '51324'[customer #], 'Simmons Freightlines'[name], '456 Anywhere Place'[Address], 'Irmo'[City], 'SC'[St];

select * from customer
;


billing # customer # name Address City St
28549 28549 ABC Corporation 123 Anywhere Place Irmo SC
32568 32568 ABC Corporation 123 Anywhere Place Irmo SC
46598 46598 Hannibal Cartons 1235 Courtney Blvd. Walterboro SC
84956 84956 Quattle Container 3857 Cedar Road Lexington SC
51324 51324 Simmons Freightlines 456 Anywhere Place Irmo SC


Next we need to build some invoice data.  Being the pragmatic sort, since I don't need to provide breakdowns by items sold, I don't need the line data, so I'll be using data from the invoice header table only.


-- create some invoice headers
if exists(select * from sysobjects where type = 'U' and name = 'invoicehdr')
drop table invoicehdr
;

select '46598'[billing #], '46598'[customer #], cast('05-11-2012' as datetime)[invoice-date], 1256.55[gross], cast(1256.55*.05 as decimal(10,2))[tax] into invoicehdr
insert into invoicehdr select '28549'[billing #], '28549'[customer #], cast('06-30-2012' as datetime)[invoice-date], 125.36[gross], cast(125.36*.05 as decimal(10,2))[tax]
insert into invoicehdr select '28549'[billing #], '28549'[customer #], cast('07-12-2012' as datetime)[invoice-date], 326.10[gross], cast(326.10*.05 as decimal(10,2))[tax]
insert into invoicehdr select '28549'[billing #], '28549'[customer #], cast('07-28-2012' as datetime)[invoice-date], 42.10[gross], cast(42.10*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('01-21-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('03-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('04-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('05-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('06-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('07-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '46598'[billing #], '46598'[customer #], cast('01-19-2012' as datetime)[invoice-date], 546.30[gross], cast(546.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '46598'[billing #], '46598'[customer #], cast('03-10-2012' as datetime)[invoice-date], 342.20[gross], cast(342.20*.05 as decimal(10,2))[tax]
insert into invoicehdr select '51324'[billing #], '51324'[customer #], cast('03-14-2012' as datetime)[invoice-date], 156.89[gross], cast(156.89*.05 as decimal(10,2))[tax]
insert into invoicehdr select '51324'[billing #], '51324'[customer #], cast('04-23-2012' as datetime)[invoice-date], 423.65[gross], cast(423.65*.05 as decimal(10,2))[tax]

select * from invoicehdr
;


billing # customer # invoice-date gross tax
46598 46598 5/11/2012 1256.55 62.83
28549 28549 6/30/2012 125.36 6.27
28549 28549 7/12/2012 326.1 16.31
28549 28549 7/28/2012 42.1 2.11
32568 32568 1/21/2012 442.3 22.12
32568 32568 3/19/2012 442.3 22.12
32568 32568 4/19/2012 442.3 22.12
32568 32568 5/19/2012 442.3 22.12
32568 32568 6/19/2012 442.3 22.12
32568 32568 7/19/2012 442.3 22.12
46598 46598 1/19/2012 546.3 27.32
46598 46598 3/10/2012 342.2 17.11
51324 51324 3/14/2012 156.89 7.84
51324 51324 4/23/2012 423.65 21.18



Now we have a customer table and we have some invoice data for those customers; we are ready to create a sales ytd report.

Using the CUBE operation, we can easily and rapidly get our summary data as shown below:

select
case when grouping(i.[billing #])=1 then 'All' else [billing #] end [billing #],
case when grouping(i.[customer #])=1 then 'All' else [customer #] end [customer #],
sum(i.gross)gross,
sum(i.tax)tax
from invoicehdr as i with (nolock)
group by i.[billing #], i.[customer #]
with cube
order by [billing #], [customer #]
;

To use the 'with cube' operation, you use the GROUPING() function on the columns you group by. GROUPING() returns 1 on the rows where that column has been aggregated away (the subtotal rows) and 0 on the detail rows. I like to wrap my GROUPING() calls in a CASE expression that returns the word 'All' for the total rows and the actual data for each element in the group.  It just makes more sense to me. You can use whatever you choose; that's one of the points of being pragmatic.

The query returns the following answer set:

billing # customer # gross tax
28549 28549 493.56 24.69
28549 All 493.56 24.69
32568 32568 2653.80 132.72
32568 All 2653.80 132.72
46598 46598 2145.05 107.26
46598 All 2145.05 107.26
51324 51324 580.54 29.02
51324 All 580.54 29.02
All 28549 493.56 24.69
All 32568 2653.80 132.72
All 46598 2145.05 107.26
All 51324 580.54 29.02
All All 5872.95 293.69

Notice that for each billing # in the invoice table I get a detail row plus a row where the customer # value is "All" (the subtotal for that billing #), and likewise an "All" row in the billing # column for each customer #, plus one final row where both values are "All" (the grand total). These are basically the group total rows.  I don't find them of much value, because I usually create our company's reports in SSRS. SSRS provides a grouping function in the tablix control to render group and sub-group totals, so these rows only get in the way.

Here is how to filter them out easily... A good YTD report is going to show more data about the customer than their billing and customer numbers, so we want to perform a join of the CUBE query answer set and the customer table as shown below:


select cb.[billing #], cb.[customer #], c.[name], c.[Address], c.City, c.St, cb.gross, cb.tax
from
(
select
case when grouping(i.[billing #])=1 then 'All' else [billing #] end [billing #],
case when grouping(i.[customer #])=1 then 'All' else [customer #] end [customer #],
sum(i.gross)gross,
sum(i.tax)tax
from invoicehdr as i with (nolock)
group by i.[billing #], i.[customer #]
with cube
) cb
inner join customer as c with (nolock) on cb.[billing #]=c.[billing #] and cb.[customer #]=c.[customer #]
order by gross desc
;


Since I know there is no billing # or customer # data with the value "All", the inner join is going to filter the group and subgroup total rows out of my cube.  Since the billing and customer numbers are identical in this data set, and in the interest of having space on my post page for a tidy report, I'll eliminate the billing # from the answer set:


customer # name Address City St gross tax
32568 ABC Corporation 123 Anywhere Place Irmo SC  $ 2,653.80  $ 132.72
46598 Hannibal Cartons 1235 Courtney Blvd. Walterboro SC  $ 2,145.05  $ 107.26
51324 Simmons Freightlines 456 Anywhere Place Irmo SC  $    580.54  $   29.02
28549 ABC Corporation 123 Anywhere Place Irmo SC  $    493.56  $   24.69

Since no billing # matches the 'All' value in the cube data set, and likewise for the customer #, these rows are eliminated from the report, as I wrote above.  All that remain are the total sales year-to-date for the customers in our customer table.

This is so much simpler than any other method I've ever used.  If you don't know about the CUBE and ROLLUP clauses, you need to check them out.  They're tremendous time savers (and headache savers)!!
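
For completeness, here is a ROLLUP variant of the same query.  ROLLUP produces only the hierarchical subtotals (one per billing #, plus the grand total) rather than every cross combination:

select
case when grouping(i.[billing #])=1 then 'All' else [billing #] end [billing #],
case when grouping(i.[customer #])=1 then 'All' else [customer #] end [customer #],
sum(i.gross)gross,
sum(i.tax)tax
from invoicehdr as i with (nolock)
group by i.[billing #], i.[customer #]
with rollup
order by [billing #], [customer #]
;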




Tuesday, July 24, 2012

Use INFORMATION_SCHEMA.COLUMNS to help build your INSERT, SELECT or other query

Last month I posted on a 'pragmatic' way to use internal tables to help build query statements dealing with large tables.

However, as INFORMATION_SCHEMA is the "new, ANSI standard" way to go, I thought I'd show that it can be used in a similar manner.

Also, I've often read posts alleging that INFORMATION_SCHEMA should be used instead of the system tables such as sysobjects and syscolumns.

I recently read this post from SQLServerCentral.com that shows this assertion is false at this time.
http://www.sqlservercentral.com/Forums/Topic1272370-1526-1.aspx#bm1277421

In the case of listing column names, either way is OK.  However, you should be aware of when and why you choose to use either.  There are some risks involved in always relying on INFORMATION_SCHEMA as Gail Shaw points out.

So, in the interest of preventing this site from going stale, even as I try to get on top of my project pile, I offer this short post to show that you can indeed continue to use this technique with the newer INFORMATION_SCHEMA.COLUMNS view to get a list of table column names in order of their position in the table.

Try this comparison to help you get comfortable with using the new "IS" views.


use dba -- < pick your own database containing large tables

-- pick a large table and substitute its name for 'nat'.

-- THIS IS THE OLD WAY
select name from syscolumns where id in (select id from sysobjects where name = 'nat') order by colorder
-- ANSWER SET
/*EventLog, RecordNumber, TimeGenerated, TimeWritten, EventID, EventType, EventTypeName, EventCategory, EventCategoryName, SourceName, Strings, ComputerName, SID, Message, Data*/

-- THIS IS THE NEW WAY
select column_name from information_schema.columns where table_name = 'nat' order by ordinal_position
-- ANSWER SET
/*EventLog, RecordNumber, TimeGenerated, TimeWritten, EventID, EventType, EventTypeName, EventCategory, EventCategoryName, SourceName, Strings, ComputerName, SID, Message, Data*/

Personally, I'm starting to like the new way better: with the old way I have to query sysobjects to get the table id that syscolumns needs, whereas INFORMATION_SCHEMA.COLUMNS carries the table name itself, so I don't have to touch INFORMATION_SCHEMA.TABLES at all to pull the column list.

You can use the INFORMATION_SCHEMA views in a number of interesting ways to help you reduce the time it takes to work with larger objects in your databases; one example follows.
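
For example, the same trick from last month's syscolumns post works here; this generates a bracketed, comma-terminated select list for a table (substitute your own table name for 'nat'):

SELECT '[' + COLUMN_NAME + '], '
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'nat'
ORDER BY ORDINAL_POSITION
;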


The following views apply ...

INFORMATION_SCHEMA.CHECK_CONSTRAINTS
INFORMATION_SCHEMA.COLUMN_DOMAIN_USAGE
INFORMATION_SCHEMA.COLUMN_PRIVILEGES
INFORMATION_SCHEMA.COLUMNS
INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
INFORMATION_SCHEMA.CONSTRAINT_TABLE_USAGE
INFORMATION_SCHEMA.DOMAIN_CONSTRAINTS
INFORMATION_SCHEMA.DOMAINS
INFORMATION_SCHEMA.KEY_COLUMN_USAGE
INFORMATION_SCHEMA.PARAMETERS
INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
INFORMATION_SCHEMA.ROUTINE_COLUMNS
INFORMATION_SCHEMA.ROUTINES
INFORMATION_SCHEMA.SCHEMATA
INFORMATION_SCHEMA.TABLE_CONSTRAINTS
INFORMATION_SCHEMA.TABLE_PRIVILEGES
INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
INFORMATION_SCHEMA.VIEW_TABLE_USAGE
INFORMATION_SCHEMA.VIEWS

Take time to become familiar with the information these views provide.  It can really help you increase your productivity writing T-SQL.

Monday, June 11, 2012

SQL Middle Earth - putting that hobbit id to pragmatic use


What's a hobbit in SQL Server?


A 'hobbit' is a HOBT (Heap Or B-Tree) value.
There, doesn't that clear things up?


For more about what a HOBT is and why it was named thus, refer to 
http://thehobt.blogspot.com/2009/02/what-heck-is-sql-server-hobt-anyway.html.


I will hold forth on the usefulness I found in understanding what a hobbit is.  If you use a commercial monitoring tool for SQL Server (such as SQL Monitor from Red Gate Software), then when your monitoring encounters a deadlock and you start to drill in, you may notice lines such as the ones below:




SPID Victim Lock details                         Statement type
138                      dbid: 5, object id: 72057594727628800, index id: (0100e420fa5a) UPDATE Line #: 1
169 Victim process dbid: 5, object id: 72057594727628800, index id: (0100e420fa5a) UPDATE Line #: 18


Now that object id, 72057594727628800, is not a real object id; you won't find it in the sysobjects table.  It is a hobbit (HOBT) id.  When I click on the line for SPID 138, I find that the HOBT id is associated with object id 440335512, and when I click on the other line, I get 1644479766.


But I don't want to take a lot of time poking all the links and buttons in SQL Monitor (OK, sometimes I do, because I need to know the details).  And if your company is like the company I'm currently employed at, the applications running on SQL Server are most likely 3rd-party software.  In other words: a) you don't have the source code; b) even if you did, the vendor would take a dim view of you rummaging around in, and worse, changing their code; c) you are too busy doing what your employer pays you to do - why should you work on the vendor's code for them?


So the most urgent thing for me is simply to find out as much as I can, as quickly as I can, about the deadlock, so that I can open an issue in the vendor's trouble ticket system and they can go figure out why their code is prone to deadlocking.


Here's a nifty little bit of code that has helped me out in doing this:


USE [SERVICE] -- <-- your database name goes here


DECLARE @hobbit BIGINT
DECLARE @hobbitobj INT -- holds an object_id, so INT rather than NVARCHAR
DECLARE @hobbitidx INT

DECLARE @victimhobbit BIGINT
DECLARE @victimhobbitobj INT
DECLARE @victimhobbitidx INT


DECLARE @victimobject BIGINT
DECLARE @object BIGINT




SET @victimhobbit = 72057594727628800 
SET @victimobject = 1644479766
SET @hobbit = 72057594727628800 
SET @object = 440335512


SET @hobbitobj = (SELECT [object_id] FROM sys.partitions WHERE hobt_id = @hobbit)
SET @hobbitidx = (SELECT index_id FROM sys.[partitions] WHERE  hobt_id = @hobbit)
SET @victimhobbitobj = (SELECT [object_id] FROM sys.partitions WHERE hobt_id = @victimhobbit)
SET @victimhobbitidx = (SELECT index_id FROM sys.[partitions] WHERE  hobt_id = @victimhobbit)




SELECT 'victim'msg, hobt_id, object_name(p.[object_id]) [object], p.[object_id], index_id FROM sys.partitions p WHERE hobt_id = @victimhobbit
SELECT hobt_id, object_name(p.[object_id]) [object], p.[object_id], index_id FROM sys.partitions p WHERE hobt_id = @hobbit
SELECT 'victim'msg,[object_id], [name], [index_id], [type_desc] FROM sys.[indexes] AS i WHERE [object_id] = @victimhobbitobj AND [index_id]=@victimhobbitidx
SELECT [object_id], [name], [index_id], [type_desc] FROM sys.[indexes] AS i WHERE [object_id] = @hobbitobj AND [index_id]=@hobbitidx
SELECT 'victim'msg, name, id FROM sysobjects WHERE [id] = @victimobject
SELECT name, id FROM sysobjects WHERE [id] = @object


The results tell me the name of the object associated with both 'hobbits'.  In this instance, two SPIDs wanted access to the exact same object at the exact same time; moreover, they were attempting to access the same object via the same index.  And finally, I have the names of the two stored procedures that were running.
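
As an aside, the whole lookup can be collapsed into a single query over the same catalog views; a sketch:

-- resolve a hobt_id to its object and index in one pass
SELECT p.hobt_id, OBJECT_NAME(p.[object_id])[object], p.[object_id], i.[name][index_name], i.[type_desc]
FROM sys.partitions AS p
INNER JOIN sys.[indexes] AS i ON i.[object_id] = p.[object_id] AND i.[index_id] = p.[index_id]
WHERE p.hobt_id = 72057594727628800
;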


So I can quickly copy this information into a ticket; the most important and useful part is that I can provide the vendor with the names of the stored procedures that deadlocked.


This has helped me by equipping the vendor with the information they need to start figuring out what they can do to reduce or eliminate deadlocks in their code.


Have fun!

Thursday, June 7, 2012

Use syscolumns to help build your INSERT, SELECT or other query

One thing I find tedious, especially when needing to write a long ETL script, is creating the INSERT, SELECT or UPDATE statements for production tables with long rows.


Try this to speed things up.


USE <your databasename here>
SELECT '['+[name]+'], ' FROM syscolumns WHERE id IN (SELECT id FROM sysobjects WHERE name = 'order') ORDER BY [colorder];

(Of course you'll need to supply your own tablename to the statement above in place of 'order').

The result set should look something like this:

[Add-charges], [CC-Exp], [CC-Name], [CC-no], [Charge-cust], [Check-no], [Comm-code], 
[Comm-code1], [Comm-rate], [Comm-rate1], [div-code], [Credit-rejet], [Cust-no], 
[Cust-po], [Deposit1], [Extra-Chargeo__1], [Extra-Chargeo__2], [Extra-Chargeo__3], 
[Extra-Chargeo__4], [Extra-Chargeo__5], [Invoice-date], [Invoice-no], [Job-no], 
[Gross-amt], [Printed], [Order-code], [Order-date], [Order-no], [Back-order], [Pay-method], 
[Price-level], [Qty-open-ord], [Qty-orig-ord], [Qty-shipped], [quote-no], [Request-date], 
[Ship-address__1], [Ship-address__2], [ship-charge], [Ship-city], [Ship-country], [Ship-name], 
[Ship-no], [Ship-st], [Ship-zip], [Slspn-code], [Slspn-code1], [Tax-code], [Term-code], 
[Update-stat], [User-id], [Via-code], [Whs-code], [Memo__1], [Memo__2], [Tax-amt__1], 
[Tax-amt__2], [Tax-amt__3], [Taxable-amt__1], [Taxable-amt__2], [Taxable-amt__3], 
[Consig-PO], [ohu-count], [Deposit__1], [Deposit__2], [Deposit__3], [note__1], [note__2], 
[note__3], [note__4], [note__5], [disc], [org-date], [org-time], [pick-hold], [pick-printed], 
[opick-status], [hold-order], [ship-complete], [override-flag], [Order-status], [pick-stage], 
[ops-date], [stage-complete], [nb-labels], [alloc-qty], [alloc-value], [trans-weight], [Qty-on-pps], 
[last-pick], [priority], [field-1], [field-10], [field-2], [field-3], [field-4], [field-5], [field-6], [field-7], 
[field-8], [field-9], [add-taxable__1], [add-taxable__2], [add-taxable__3], [charge-freight-in], 
[charge-handle], [charge-freight-out], [sat-del], [Incent-code], [item-stat], [web-order], 
[hold-reason], [over-reason], [over-userid], [Extra-Charge__1], [Extra-Charge__2], 
[Extra-Charge__3], [Extra-Charge__4], [Extra-Charge__5], [Extra-Charge__6], 
[Extra-Charge__7], [Extra-Charge__8], [Extra-Charge__9], [Extra-Charge__10], 
[bottom-memo], [top-memo], [curr-code], [curr-factor], [cc-address], [cc-cvv], [cc-zip], 
[cc-duty], [cc-freight], [cc-tax], [cc-disc-amt], [cc-ship-cntry], [cc-frzip], [cc-pay-amt], 
[cc-shipzip], [cc-transdate], [cc-acthist], [CheckDate], [EmailId], [Last-Cntct], [last-cntct-phone],
[request-by], [adate], [atime], [QueueSequenceNumber], [cc-city], [cc-state], [cc-approval], 
[cc-approval-date], [RMA], [ApprovedValue], [CCTransactionType], [IsOpened], [OpenedUser], 
[InvoiceEmailIds], [location-no], [Drop-Ship], [HandlingCode], [HoldExtAmt1], [HoldExtAmt2], 
[HoldExtAmt3], [HoldExtAmt4], [HoldExtAmt5], [HoldExtAmt6], [HoldExtAmt7], [HoldExtAmt8], 
[HoldExtAmt9], [HoldExtAmt10], [lead-code], [price-pick], [creditcardreference], 
[PreAuthorization], [cc-cust-num], [lockuserid], [pounlockdate], [userlock], [DropShipVendor], 
[OrderReference], [WebOrderNo], 


Now it is a simple matter to trim the last comma from the answer set and copy it: prefix one copy with 'INSERT INTO <target name>(' and close it with ')', then on the next line type 'SELECT', paste again, and finish with 'FROM <source name>;' - and you're done!
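
The finished statement ends up with this shape (the target name here is hypothetical, and the comments stand in for the rest of the pasted column list):

INSERT INTO [order_staging]([Add-charges], [CC-Exp], /* ...rest of the pasted list... */ [WebOrderNo])
SELECT [Add-charges], [CC-Exp], /* ...rest of the pasted list... */ [WebOrderNo]
FROM [order]
;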

Using OUTPUT with SQL Server 2005 updates

One thing I wish I had on my 2005 servers that I have on my 2008 server is the MERGE command.  Since I don't, I've settled on OUTPUT to come close.


Here is an example:


Run this script to provide the proof.

USE [DBA] -- <supply your own database sandbox name here>

DECLARE @update_id TABLE(id INT)

IF EXISTS(SELECT * FROM sys.[tables] AS t where type = 'u' and name = 'tbl_sales')
DROP TABLE [tbl_sales];

-- populate a table with data
SELECT 1 [id], 250.00[total sale], 'north'[region], 'original'[row operation] INTO [tbl_sales];
INSERT INTO [tbl_sales] SELECT 2, 350.00, 'south', 'original'
INSERT INTO [tbl_sales] SELECT 3, 50.00, 'east', 'original'
INSERT INTO [tbl_sales] SELECT 4, 475.00, 'west', 'original'

-- have a look at the 'data table'
SELECT *, 'initial table population complete'[msg] FROM tbl_sales ORDER BY [id];

IF EXISTS(SELECT * FROM sys.[tables] AS t where type = 'u' and name = 'stg_sales')
DROP TABLE [stg_sales];

-- populate a staging table
SELECT 1 [id], 350.00[total sale], 'north'[region], 'original'[row operation] INTO [stg_sales];
INSERT INTO [stg_sales] SELECT 3, 750.00, 'east', 'original'
INSERT INTO [stg_sales] SELECT 5, 500.00, 'east', 'original'
INSERT INTO [stg_sales] SELECT 6, 650.00, 'west', 'original'

-- have a look at your staging data
SELECT *, 'initial staging population complete'[msg] FROM stg_sales ORDER BY [id];

-- now update the table with staged data rows; some are for inserts (new); 
-- some for updates (changes to table from staged data)
BEGIN TRANSACTION
UPDATE [tbl_sales] WITH (ROWLOCK)
SET [total sale]=s.[total sale], [row operation]='updated'
OUTPUT INSERTED.[id] INTO @update_id
FROM [tbl_sales] AS t  
INNER JOIN [stg_sales] AS s ON t.[id]=s.[id]
;

INSERT INTO [tbl_sales]([id],[total sale],[region],[row operation])
SELECT * FROM [stg_sales] WHERE id NOT IN (SELECT id FROM @update_id)
;
COMMIT TRANSACTION

-- check the results
SELECT *, 'table update complete'[msg] FROM tbl_sales ORDER BY [id];
SELECT *, 'staging table rows'[msg] FROM stg_sales ORDER BY [id];

So what happened?
Using the OUTPUT clause, we captured the ids of the rows that were updated from the staging table by virtue of the INNER JOIN, and saved them in the @update_id table variable.

Then, using the NOT IN clause in the SELECT portion of the INSERT that follows the update, we grabbed any rows in the staging table that did not have a matching id from the UPDATE operation (because they are new and don't exist in the table) and inserted them as new rows.

In practice, 'id' would be anything that uniquely identifies the data row - a customer_id, an order_number, an invoice_number, anything.  The data rows from production systems would of course be longer, but the concept is simple once mastered and is very useful for ETL script loads to data marts/data warehouse tables, or a less structured operational data store.


Back in my Teradata Master days, we had an operation called UPSERT.  This is as close as it gets to UPSERT in SQL Server 2005, and I find it vastly simpler than just about any other approach I've used.
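
For comparison, here is a sketch of the same upsert as a single MERGE statement for when you do have a 2008 server, using the tables from the script above:

-- SQL Server 2008+: update matching rows and insert the rest, in one statement
MERGE [tbl_sales] AS t
USING [stg_sales] AS s ON t.[id] = s.[id]
WHEN MATCHED THEN
UPDATE SET t.[total sale] = s.[total sale], t.[row operation] = 'updated'
WHEN NOT MATCHED BY TARGET THEN
INSERT ([id], [total sale], [region], [row operation])
VALUES (s.[id], s.[total sale], s.[region], s.[row operation])
;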