Measuring the complexity of SQL statements

Question

Common measures of software complexity include Cyclomatic Complexity (a measure of how complicated the control flow is) and Halstead complexity (a measure of complex the arithmetic is).

The “control flow” in a SQL query is best related to “and” and “or” operators in query.

The “computational complexity” is best related to operators such as SUM or implicit JOINS.

Once you’ve decided how to categorize each unit of syntax of a SQL query as to whether it is “control flow” or “computation”, you can straightforwardly compute Cyclomatic or Halstead measures.

What the SQL optimizer does to queries I think is absolutely irrelevant. The purpose of complexity measures is to characterize how hard is to for a person to understand the query, not how how efficiently it can be evaluated.

Similarly, what the DDL says or whether views are involved or not shouldn’t be included in such complexity measures. The assumption behind these metrics is that the complexity of machinery inside a used-abstraction isn’t interesting when you simply invoke it, because presumably that abstraction does something well understood by the coder. This is why Halstead and Cyclomatic measures don’t include called subroutines in their counting, and I think you can make a good case that views and DDL information are those “invoked” abstractractions.

Finally, how perfectly right or how perfectly wrong these complexity numbers are doesn’t matter much, as long they reflect some truth about complexity and you can compare them relative to one another. That way you can choose which SQL fragments are the most complex, thus sort them all, and focus your testing attention on the most complicated ones.

Leave a Comment Cancel reply