Pragmatic SQL Server: July 2012

Over the course of using SQL Server for the years, I confess I have come up with myriad ways of producing those pesky aggregate reports that management so loves; such a report being the YTD sales of customers.

Some of the methods I have created have been to UNION multiple tables holding the data for different groups, building derived tables, doing the equivalent with CTE, and probably many others. I've tried using stored procedures as well, but invariably each request is an ad-hoc one, and the stored procedure approach led nowhere because of it.

Then came 'CUBE'. I have settled off on using this because it very easily and readily creates the information I need. (Now if Microsoft would only improve the PIVOT operation so that multiple aggregates could be handled... but I digress).

Here is an example of how to use a CUBE operation to simplify the summation of sales data for a report of customers' sales YTD.

First we need to build a customer table....

-- create some customers
if exists(select * from sysobjects where type = 'U' and name = 'customer')
drop table customer
;

SELECT '28549'[billing #], '28549'[customer #], cast('ABC Corporation' as nvarchar(255))[name], cast('123 Anywhere Place' as nvarchar(255))[Address],
cast('Irmo' as nvarchar(255))[City], 'SC'[St]
into customer;

insert into customer SELECT '32568'[billing #], '32568'[customer #], 'ABC Corporation'[name], '123 Anywhere Place'[Address], 'Irmo'[City], 'SC'[St];
insert into customer SELECT '46598'[billing #], '46598'[customer #], 'Hannibal Cartons'[name], '1235 Courtney Blvd.'[Address], 'Walterboro'[City], 'SC'[St];
insert into customer SELECT '84956'[billing #], '84956'[customer #], 'Quattle Container'[name], '3857 Cedar Road'[Address], 'Lexington'[City], 'SC'[St];
insert into customer SELECT '51324'[billing #], '51324'[customer #], 'Simmons Freightlines'[name], '456 Anywhere Place'[Address], 'Irmo'[City], 'SC'[St];

select * from customer
;

billing #	customer #	name	Address	City	St
28549	28549	ABC Corporation	123 Anywhere Place	Irmo	SC
32568	32568	ABC Corporation	123 Anywhere Place	Irmo	SC
46598	46598	Hannibal Cartons	1235 Courtney Blvd.	Walterboro	SC
84956	84956	Quattle Container	3857 Cedar Road	Lexington	SC
51324	51324	Simmons Freightlines	456 Anywhere Place	Irmo	SC

Next we need to build some invoice data. Being the pragmatic sort, since I don't need to provide breakdowns by items sold, I don't need the line data, so I'll be using data from the invoice header table only.

-- create some invoice headers
if exists(select * from sysobjects where type = 'U' and name = 'invoicehdr')
drop table invoicehdr
;

select '46598'[billing #], '46598'[customer #], cast('05-11-2012' as datetime)[invoice-date], 1256.55[gross], cast(1256.55*.05 as decimal(10,2))[tax] into invoicehdr
insert into invoicehdr select '28549'[billing #], '28549'[customer #], cast('06-30-2012' as datetime)[invoice-date], 125.36[gross], cast(125.36*.05 as decimal(10,2))[tax]
insert into invoicehdr select '28549'[billing #], '28549'[customer #], cast('07-12-2012' as datetime)[invoice-date], 326.10[gross], cast(326.10*.05 as decimal(10,2))[tax]
insert into invoicehdr select '28549'[billing #], '28549'[customer #], cast('07-28-2012' as datetime)[invoice-date], 42.10[gross], cast(42.10*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('01-21-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('03-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('04-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('05-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('06-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '32568'[billing #], '32568'[customer #], cast('07-19-2012' as datetime)[invoice-date], 442.30[gross], cast(442.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '46598'[billing #], '46598'[customer #], cast('01-19-2012' as datetime)[invoice-date], 546.30[gross], cast(546.30*.05 as decimal(10,2))[tax]
insert into invoicehdr select '46598'[billing #], '46598'[customer #], cast('03-10-2012' as datetime)[invoice-date], 342.20[gross], cast(342.20*.05 as decimal(10,2))[tax]
insert into invoicehdr select '51324'[billing #], '51324'[customer #], cast('03-14-2012' as datetime)[invoice-date], 156.89[gross], cast(156.89*.05 as decimal(10,2))[tax]
insert into invoicehdr select '51324'[billing #], '51324'[customer #], cast('04-23-2012' as datetime)[invoice-date], 423.65[gross], cast(423.65*.05 as decimal(10,2))[tax]

select * from invoicehdr
;

billing #	customer #	invoice-date	gross	tax
46598	46598	5/11/2012	1256.55	62.83
28549	28549	6/30/2012	125.36	6.27
28549	28549	7/12/2012	326.1	16.31
28549	28549	7/28/2012	42.1	2.11
32568	32568	1/21/2012	442.3	22.12
32568	32568	3/19/2012	442.3	22.12
32568	32568	4/19/2012	442.3	22.12
32568	32568	5/19/2012	442.3	22.12
32568	32568	6/19/2012	442.3	22.12
32568	32568	7/19/2012	442.3	22.12
46598	46598	1/19/2012	546.3	27.32
46598	46598	3/10/2012	342.2	17.11
51324	51324	3/14/2012	156.89	7.84
51324	51324	4/23/2012	423.65	21.18

Now we have a customer table and we have some invoice data for those customers; we are ready to create a sales ytd report.

Using the CUBE operation, we can easily and rapidly get our summary data as shown below:

select
case when grouping(i.[billing #])=1 then 'All' else [billing #] end [billing #],
case when grouping(i.[customer #])=1 then 'All' else [customer #] end [customer #],
sum(i.gross)gross,
sum(i.tax)tax
from invoicehdr as i with (nolock)
group by i.[billing #], i.[customer #]
with cube
order by [billing #], [customer #]
;

To use the 'with cube' operation, you need to use the GROUPING() statement on the columns you want to group by. The GROUPING statement will return a value of 1 for the group and a 0 for each element in the group. However, I like to case my GROUPING() statements to return the word, 'All' for the total and then the data for each element in the group. It just makes more sense to me. You can use whatever you choose, that's one of the points of being pragmatic.

The query returns the following answer set:

billing #	customer #	gross	tax
28549	28549	493.56	24.69
28549	All	493.56	24.69
32568	32568	2653.8	132.72
32568	All	2653.8	132.72
46598	46598	2145.05	107.26
46598	All	2145.05	107.26
51324	51324	580.54	29.02
51324	All	580.54	29.02
All	28549	493.56	24.69
All	32568	2653.8	132.72
All	46598	2145.05	107.26
All	51324	580.54	29.02
All	All	5872.95	293.69

Notice that for the billing # column I get a row back for each billing # in the invoice table, and I get an "All" row back for each data in the customer # column; plus 1 row where the customer # value is "All". Additionally, I get a row of data for each customer # and an "All" row for each customer #. These rows are basically the group total rows. I don't find them of much value, because I usually create our company's reports in SSRS. SSRS provides a grouping function in the tablix control to render group and sub-group totals so these rows only get in the way.

Here is how to filter them out easily... A good YTD report is going to show more data about the customer than their billing and customer numbers, so we want to perform a join of the CUBE query answer set and the customer table as shown below:

select cb.[billing #], cb.[customer #], c.[name], c.[Address], c.City, c.St, cb.gross, cb.tax
from
(
select
case when grouping(i.[billing #])=1 then 'All' else [billing #] end [billing #],
case when grouping(i.[customer #])=1 then 'All' else [customer #] end [customer #],
sum(i.gross)gross,
sum(i.tax)tax
from invoicehdr as i with (nolock)
group by i.[billing #], i.[customer #]
with cube
) cb
inner join customer as c with (nolock) on cb.[billing #]=c.[billing #] and cb.[customer #]=c.[customer #]
order by gross desc
;

Since I know there is no billing # or customer # data with the value "All" the inner join is going to filter out the group and subgroup total rows out of my cube. Since the billing and customer numbers are identical in this data set, and in the interest of having space on my post page for a tidy report, I'll eliminate the billing # from the answer set:

customer #	name	Address	City	St	gross	tax
32568	ABC Corporation	123 Anywhere Place	Irmo	SC	$ 2,653.80	$ 132.72
46598	Hannibal Cartons	1235 Courtney Blvd.	Walterboro	SC	$ 2,145.05	$ 107.26
51324	Simmons Freightlines	456 Anywhere Place	Irmo	SC	$ 580.54	$ 29.02
28549	ABC Corporation	123 Anywhere Place	Irmo	SC	$ 493.56	$ 24.69

Since no billing # matches the 'All' value in the cube data set, and likewise for the customer #, as I wrote above, these rows are eliminated from the report. All we have are the total sales year-to-date for the customers in our customer table.

This is so much simpler than any other method I've ever used. If you don't know about the CUBE and ROLLUP clauses you need to check them out. They're tremendous time savers (and head ache savers)!!

Last month I posted on a 'pragmatic' way to use internal tables to help build query statements dealing with large tables.

However, as INFORMATION_SCHEMA is the "new, ANSI standard" way to go, I thought I'd show that it can be used in a similar manner.

Also, I've often read posts where it is alluded that INFORMATION_SCHEMA should be used over sysinternals such as sysobjects and syscolumns.

I recently read this post from SQLServerCentral.com that shows this assertion is false at this time.
http://www.sqlservercentral.com/Forums/Topic1272370-1526-1.aspx#bm1277421

In the case of listing column names, either way is OK. However, you should be aware of when and why you choose to use either. There are some risks involved in always relying on INFORMATION_SCHEMA as Gail Shaw points out.

So, in the interest of preventing this site from going stale, even as I try to get on top of my project pile, I offer this short post to show that you can indeed continue to use this technique with the newer INFORMATION_SCHEMA.COLUMNS view to get a list of table column names in order of their position in the table.

Try this comparison to help you get comfortable with using the new "IS" views.

use dba -- < pick your own database containing large tables

-- pick a large table and substitute its name for 'nat'.

-- THIS IS THE OLD WAY
select name from syscolumns where id in (select id from sysobjects where name = 'nat') order by colorder
-- ANSWER SET
/*EventLog, RecordNumber, TimeGenerated, TimeWritten, EventID, EventType, EventTypeName, EventCategory, EventCategoryName, SourceName, Strings, ComputerName, SID, Message, Data*/

-- THIS IS THE NEW WAY
-- ANSWER SET
select column_name from information_schema.columns where table_name = 'nat' order by ordinal_position
/*EventLog, RecordNumber, TimeGenerated, TimeWritten, EventID, EventType, EventTypeName, EventCategory, EventCategoryName, SourceName, Strings, ComputerName, SID, Message, Data*/

Personally, I'm starting to like the new way better as I don't have to reference the INFORMATION_SCHEMA.tables in order to pull the data from INFORMATION_SCHEMA.columns.
The old way I have to use sysobjects to get the table id for the syscolumns to use.

You can use the INFORMATION_SCHEMA views in a number of interesting ways to help you reduce the time it takes to work with larger objects in your databases.

The following views apply ...

INFORMATION_SCHEMA.CHECK_CONSTRAINTS
INFORMATION_SCHEMA.COLUMN_DOMAIN_USAGE
INFORMATION_SCHEMA.COLUMN_PRIVILEGES
INFORMATION_SCHEMA.COLUMNS
INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
INFORMATION_SCHEMA.CONSTRAINT_TABLE_USAGE
INFORMATION_SCHEMA.DOMAIN_CONSTRAINTS
INFORMATION_SCHEMA.DOMAINS
INFORMATION_SCHEMA.KEY_COLUMN_USAGE
INFORMATION_SCHEMA.PARAMETERS
INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS
INFORMATION_SCHEMA.ROUTINE_COLUMNS
INFORMATION_SCHEMA.ROUTINES
INFORMATION_SCHEMA.SCHEMATA
INFORMATION_SCHEMA.TABLE_CONSTRAINTS
INFORMATION_SCHEMA.TABLE_PRIVILEGES
INFORMATION_SCHEMA.TABLES
INFORMATION_SCHEMA.VIEW_COLUMN_USAGE
INFORMATION_SCHEMA.VIEW_TABLE_USAGE
INFORMATION_SCHEMA.VIEWS

Take time to become familiar with the information these views provide. It can really help you increase your productivity writing T-SQL.

Pragmatic SQL Server

Monday, July 30, 2012

Those pesky aggregate reports - USING CUBE

Tuesday, July 24, 2012

Use INFORMATION_SCHEMA.COLUMNS to help build your INSERT, SELECT or other query

Thursday, July 12, 2012

What happened to the posts?