When to use GROUPING SETS, CUBE and ROLLUP

Question

Firstly, for those who haven’t already read up on the subject:

Using GROUP BY with ROLLUP, CUBE, and GROUPING SETS

That being said, don’t think about these grouping options as ways to get a result set. These are performance tools.

Let’s take ROLLUP as a simple example.

I can use the following query to get the count of records for each value of GrpCol.

SELECT   GrpCol, count(*) AS cnt
FROM     dbo.MyTable
GROUP BY GrpCol

And I can use the following query to summarily “roll up” the count of ALL records.

SELECT   NULL, count(*) AS cnt
FROM     dbo.MyTable

And I could UNION ALL the above two queries to get the exact same results I might get if I had written the first query with the ROLLUP clause (that’s why I put the NULL in there).

It might actually be more convenient for me to execute this as two different queries because then I have the grouped results separate from my totals. Why would I want my final total mixed right in to the rest of those results? The answer is that doing both together using the ROLLUP clause is more efficient. SQL Server will use an execution plan that calculates all of the aggregations together in one pass. Compare that to the UNION ALL example which would provide the exact same results but use a less efficient execution plan (two table scans instead of one).

Imagine an extreme example in which you are working on a data set so large that each scan of the data takes one whole hour. You have to provide totals on basically every possible dimension (way to slice) that data every day. Aha! I bet one of these grouping options is exactly what you need. If you save off the results of that one scan into a special schema layout, you will then be able to run reports for the rest of the day off the saved results.

So I’m basically saying that you’re working on a data warehouse project. For the rest of us it mostly falls into the “neat thing to know” category.

Leave a Comment Cancel reply